From sketch to 2D to 3D in real time! - Stable Diffusion Experimental

Video Statistics and Information

Captions
Eye of a newt, wing of a bat, and... a rabbit's foot. Oh! Hello there, and welcome to another Stable Diffusion Experimental video. Today we'll be looking at a fun new way to create 2D and 3D game-ready, "game-ready" assets, just by starting from raw sketches, all in real time, inside of ComfyUI. And all of that without any external tool needed.

Alright, so here's the workflow we're gonna use today. The core of it is Stable Diffusion XL (SDXL) Lightning, and by using SDXL Lightning we can run things in almost real time. Actually, if we wanna stick to 2D assets, we'll run basically in real time. But the two new nodes that I'm really excited about trying today, and that make all of this not really production-ready but more on the experimental side, are these: the Painter node on the left side here, which will allow us to roughly sketch whatever we need to then create and generate in 2D and 3D, and on the other side, on your right-hand side, TripoSR, which will take care of transforming a 2D image into a 3D mesh. I've also left here, disabled by default, in the bottom left corner, a Photoshop to ComfyUI group, so that in case you wanted to use Photoshop to actually draw your illustrations, you can do that as well by turning it on. There's a switch here that switches from the Painter node to Photoshop. Since I've already covered this twice already, I won't cover today how to set up Photoshop and Stable Diffusion, but you can find two videos in the description below, and over here as well. So if you don't know how to do that, and you still wanna use Photoshop instead of the Painter node, check those videos out before attempting to do this. As always, you'll find the JSON for the workflow in the description below, as well as any other models we might need today. And I will guide you through the creation of this workflow in just a second. On the other hand, if you don't wanna be bothered by how I construct the workflows, and why I use one node instead of another, you can just jump ahead and check out the results.

Alright, so we're gonna need two things in order to work with SDXL Lightning models. I already covered this last week, but I'll cover it again just in case. We're gonna use a checkpoint. Last time it was RealVis, but since today we're not going for realistic, we're going for an illustration kind of look, we're gonna go with Juggernaut XL, Lightning version. We're gonna download that and place it in our models/checkpoints folder inside of ComfyUI. Then we're gonna need an SDXL Lightning LoRA. There are different LoRAs, each trained on a different number of steps. Like last time, I'm gonna use the four-step LoRA, so I'm gonna download the SDXL Lightning four-step LoRA .safetensors file and place it into our models/loras folder inside of ComfyUI. As always, you will find a link to these models in the description below.

So now that we've got all the models that we need, the first thing we're gonna need is a checkpoint. So we're gonna load a checkpoint: search for Load Checkpoint, and then select our SDXL Lightning model, in this case Juggernaut XL Lightning. The next thing we need is a LoRA node, so search for Load LoRA, place it over here, and hook both the model and the CLIP through the Load LoRA node. After that, we're gonna select our SDXL Lightning four-step LoRA .safetensors file.
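(If you'd rather grab the Lightning LoRA from a script than from the browser, here's a minimal sketch using the huggingface_hub package. The repo and file name are the ByteDance SDXL-Lightning ones; the ComfyUI folder path is an assumption, so adjust it to wherever your install lives. The Juggernaut checkpoint is on Civitai, so that one stays a manual download.)

```python
# Minimal sketch: download the 4-step SDXL Lightning LoRA straight into
# ComfyUI's models/loras folder. COMFYUI_DIR is an assumed path.
from huggingface_hub import hf_hub_download

COMFYUI_DIR = "ComfyUI"  # assumption: ComfyUI lives in the current directory

hf_hub_download(
    repo_id="ByteDance/SDXL-Lightning",
    filename="sdxl_lightning_4step_lora.safetensors",
    local_dir=f"{COMFYUI_DIR}/models/loras",
)
# The Juggernaut XL Lightning checkpoint is distributed on Civitai, so download
# it manually and drop it into ComfyUI/models/checkpoints.
```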
I have already explained this in the previous video, but what the Lightning LoRA does is basically allow our checkpoint here to resolve an image in just four steps; in order to do so, it has to go through the four-step LoRA. The next thing we need is gonna be a KSampler. So drag and drop from model, select KSampler, and then we're gonna need two prompt fields, one positive and one negative. So drag from CLIP, then drop and select CLIP Text Encode. Ctrl+C, Ctrl+V, and then connect the CLIP. The top one is gonna be our positive prompt field, and the bottom one is gonna be our negative. So connect the conditioning from the top one into positive, and the one from the bottom one into negative inside of the KSampler. For now, don't worry about the prompt fields, we're gonna take care of that later.

For this workflow, we don't wanna do text to image, we wanna do image to image. So what we wanna do is encode an image. We'll search for a VAE Encode node, drag and drop the VAE from our checkpoint, and drag and drop the latent into the latent image input inside of the KSampler. The next thing we're gonna need is a VAE Decode node, so drag and drop from the latent output of the KSampler and select VAE Decode. Drag and drop the VAE from the checkpoint again into the VAE Decode node, then drag and drop from the image output of the VAE Decode node and select Preview Image. This image will be our 2D game asset.

Now let's take care of the KSampler before moving on to the other parts of this workflow. Since this workflow is using an SDXL Lightning model, we're kind of limited in what we can actually input in the KSampler. The steps must match the LoRA, so it's gonna be four steps. The CFG needs to be between one and two, so we're gonna go with, let's say, 1.3. The sampler is gonna be DPM++ 2M SDE, and the scheduler is gonna be SGM Uniform. It is very important that SGM Uniform is selected here, otherwise SDXL Lightning won't work the way it's supposed to. And by the way, if you're not sure about what you should input over here, you can just go over to the model page, and usually there are some suggested settings. As for denoise, since this is an image-to-image workflow, we can lower that a bit. Let's say 0.75 for now, and then we'll fix that later if needed.
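(For readers who like seeing the same settings outside the node graph, here's a rough equivalent of this image-to-image setup written against the diffusers library rather than ComfyUI. It is not the workflow from the video, just an approximate sketch: the checkpoint path and the "sketch.png" input are placeholders, and the Euler "trailing" scheduler follows the SDXL Lightning model card's recommendation, standing in for the DPM++ 2M SDE / SGM Uniform combo used in the KSampler.)

```python
# Approximate diffusers sketch of the ComfyUI setup above (not the video's workflow).
# Assumptions: a local Juggernaut XL Lightning .safetensors file and a "sketch.png" input.
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline, EulerDiscreteScheduler
from huggingface_hub import hf_hub_download
from PIL import Image

pipe = StableDiffusionXLImg2ImgPipeline.from_single_file(
    "juggernautXL_lightning.safetensors",  # placeholder path to the checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Merge the 4-step Lightning LoRA into the checkpoint.
pipe.load_lora_weights(
    hf_hub_download("ByteDance/SDXL-Lightning", "sdxl_lightning_4step_lora.safetensors")
)
pipe.fuse_lora()

# Lightning models want a "trailing" timestep schedule (model card recommendation).
pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config, timestep_spacing="trailing"
)

sketch = Image.open("sketch.png").convert("RGB").resize((1024, 1024))
image = pipe(
    prompt="game illustration of a mana potion",
    negative_prompt="photography, 3d, render, nsfw, nude",
    image=sketch,
    num_inference_steps=4,   # steps = LoRA steps
    guidance_scale=1.3,      # CFG between 1 and 2
    strength=0.75,           # roughly the KSampler denoise value
).images[0]
image.save("asset_2d.png")
```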
Now you can see that here in the VAE Encode node we are missing something, and that's pixels — that means an image. But where does the image come from in this workflow? Well, it really depends, but in this workflow I'd rather use the Painter node. So we're gonna search for our Painter node and place it right over here. Now, in case you can't find the Painter node inside of ComfyUI, or you can't install it via the Manager, let me show you how you can install it. In order to install the Painter node, if you can't do that via the Manager, you wanna go to its GitHub page. Inside of this GitHub repository, you can find the Painter node. What we wanna do is scroll down to the install section and follow the procedure. In our case, we can either download the contents of the GitHub repository, or we can install using Git. I would suggest using Git if you already have it installed: you can just go to your custom_nodes folder inside of ComfyUI, open a terminal inside the folder, and then type git clone followed by the link to the repository.

So what we wanna do is copy this command here — git clone and then the link — then go over to our custom_nodes folder inside of ComfyUI, right click, open in terminal, Ctrl+V to paste the git clone command, and press Enter. In my case, the destination already exists because I already cloned it, so it won't do anything, but in your case it's gonna install the missing nodes. Then you're just gonna have to restart ComfyUI and refresh the webpage, and you'll see the nodes that were missing before.

Back to our workflow: we've got our Painter node here, and the next thing we wanna do is resize the image coming out of the Painter node to a size that works for SDXL Lightning, and that's 1024 by 1024. So we're just gonna drag and drop from image, search for Image Resize, input 1024 by 1024, and then hook up the image output to the pixels input in the VAE Encode node. In case you don't wanna use the Painter node but you'd rather use Photoshop, you can just download the workflow in the description and use that instead. Inside of that workflow you will find both the Painter node and the Photoshop to ComfyUI node. That workflow has a logic switch inside of the Photoshop to ComfyUI group: if you enable the Photoshop to ComfyUI group, the switch activates as well, the Painter node is no longer active, and the VAE Encode node works with Photoshop only. The opposite is true while the Photoshop to ComfyUI group is disabled: only the Painter node will be acting on the VAE Encode node.

Another thing we wanna add, which is gonna be useful later as well, is a remove background node, so that we can use the resulting image as an alpha. So we're gonna search for a RemBG Session node. We could use a variety of different nodes, like for example the Segment Anything node that we used last week, but I wanna show you a new node, because this node is very versatile: it can work using the CPU only, or CUDA, or DirectML, or OpenVINO, and a lot more options. In my case, since I'm using an NVIDIA GPU, I'm gonna select CUDA. Then we're gonna drag and drop from the RemBG Session node and select Image Remove Background+. This node needs an image, and the image we want is the image that we're generating right now. So we're gonna drag and drop from the VAE Decode image output and input that into the Image Remove Background node. Then, from its outputs, we're gonna drag and drop from image and select a Preview Image node — that's gonna be our segmented image without the background — and we're gonna drag and drop from mask as well, select a Mask To Image node, drag and drop from there too, and select another Preview Image node. By doing this, we can see both the resulting image without the background and the resulting mask, so in case we wanna troubleshoot something, we can do that at a quick glance.
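(As an aside, if you want the same background-removal step outside ComfyUI, here's a minimal sketch assuming the standalone rembg Python package — an assumption-based stand-in for the RemBG Session / Image Remove Background+ nodes, not their exact code. The file names are placeholders carried over from the earlier sketch.)

```python
# Rough standalone sketch of the background-removal step, using the rembg
# Python package (pip install rembg). Not the node's exact code.
from PIL import Image
from rembg import new_session, remove

session = new_session("u2net")           # model choice; GPU use depends on your onnxruntime build
asset = Image.open("asset_2d.png")       # placeholder: the generated 2D asset
cutout = remove(asset, session=session)  # RGBA image with a transparent background
mask = cutout.split()[-1]                # the alpha channel doubles as the mask

cutout.save("asset_2d_nobg.png")
mask.save("asset_2d_mask.png")
```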
And now it's time to think about what we actually need as a game asset. Let's say we want a game illustration of a mana potion, so we're gonna type just that into the positive prompt field. And for the negative, since we don't want any photography, for example, or any 3D or render, we're gonna input just that. And since we're on YouTube, and we'd rather err on the side of caution, I'm just gonna add "not safe for work" and "nude". Never seen a nude mana potion, but you never know.

And now we can finally draw. The Painter node works basically like Paint: here you have a brush tool, here you have an eraser. What I'm gonna do is just work freehand, but you can use different tools and shapes if you want to. So I'm gonna go over here, select a color — let's say white for starters — and then I'm gonna draw the basic shape of a potion, like a glass vial. Then I'm gonna select a brown and pop a cork over there, and then I'm gonna select a light blue, since everybody knows mana is blue — health is red, mana is blue — and then I'm gonna fill in the inside of the glass. Then I can hit Queue Prompt and generate the picture. As you can see, it's very fast.

But the thing we wanna do is keep generating while we're drawing inside of the Painter node. In order to do that, we wanna do two things. The first thing is to go over to our ComfyUI sidebar, enable Extra options, and then hit Auto Queue. What this does is basically auto-queue the Queue Prompt command. And then, since we're using a Preview Image node and not a Save Image node, we wanna keep the image around until we make any substantial changes, so we're gonna set the seed to fixed, instead of randomize, inside of the KSampler. Now, if you wanna swap the Preview Image node for a Save Image node, you can absolutely do that; I just don't wanna keep around a lot of images that I might not even need, so instead of a Save Image node I just prefer a Preview Image node. Also, if you made some changes to an image that was in the Preview Image node and you can't find it anymore, it's in the temp folder inside of your ComfyUI folder — at least up until the point you restart ComfyUI, at which point it gets deleted. But up until then, the temp folder inside of ComfyUI will hold everything you generate in a Preview Image node.

Okay, so now that we've got everything we need for the 2D image generation process, we're gonna hit Queue Prompt, and as you can see, the prompt will continue to auto-generate. And now that the prompt is in our queue, we can just keep adding new stuff to the Painter node. So I'm gonna draw a bit of a table here. Now it's thinking it's a ring, so what I'm gonna do is just add a bit more. It might interpret this as a table, or as sand, or rock — we might never know. Here it thought it was like a green splash, so I'm gonna go with, let's say, a darker brown, and let's see what it does. As you can see, it just keeps generating stuff, so now it's like a rock. And honestly, these are quite good illustrations that are kind of ready to use for 2D games or point and click games — for background assets or clickable assets, they're pretty good. And the generations are almost instant as well. Let's add, say, a bit of a violet sparkle around, and as you can see, it's adding, like, these fairy lights around the bottle. Let's say we want to add a little bit of a green splash inside of the bottle — let's add some green as well. And there you go, you get a slight green hue. Or let's do like a minor poison bottle, and it's got some green fumes coming out of it.
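(Side note for the scripting-inclined: the Auto Queue option is essentially re-submitting the graph over and over. A very rough sketch of the same idea from Python follows, assuming a default local ComfyUI instance on port 8188 and a workflow exported via "Save (API Format)" as workflow_api.json — both assumptions, not part of the video's setup.)

```python
# Minimal sketch: keep re-queuing an exported API-format workflow against a
# local ComfyUI instance, roughly mimicking the UI's Auto Queue option.
import json
import time
import urllib.request

with open("workflow_api.json") as f:   # assumed export via "Save (API Format)"
    workflow = json.load(f)

def queue_prompt(graph: dict) -> dict:
    data = json.dumps({"prompt": graph}).encode("utf-8")
    req = urllib.request.Request(
        "http://127.0.0.1:8188/prompt",  # ComfyUI's default local endpoint
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

while True:
    print(queue_prompt(workflow))  # returns the id of the queued prompt
    time.sleep(5)                  # crude pacing; the UI's Auto Queue is event-driven
```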
But what if we wanted to add 3D capabilities as well and generate meshes in almost real time? Well, what we're gonna need is a ComfyUI TripoSR node. You'll be able to find the link to the GitHub in the description below. In order to install this repository, we're gonna do the same thing we did before for the Painter node: copy the command, go over to our custom_nodes folder inside of ComfyUI, right click, open in terminal, paste the git clone command with the link to the repository, and press Enter. Same as before, I already installed it, so I get an error, but in your case it's gonna be installed inside of your custom_nodes folder. Now, in this case we also need to install some dependencies. So we're gonna go over to our TripoSR folder, right click, open in terminal, then type pip install -r requirements.txt and press Enter. What this does is basically install everything that's needed to make the node work. In my case, since I had already installed it, you can see that the requirements were already satisfied; in your case, it's gonna download everything you don't have that the node is gonna need. So now that we have the node, you're gonna need to restart ComfyUI and then refresh the webpage.

After you've done that — I'm gonna stop the Auto Queue, because I don't wanna keep queuing until I'm done building the workflow — we're gonna search for TripoSR. As you can see, there are three nodes here, and we're gonna need all of them. We're gonna start with the model loader. Now, the model loader is gonna need a model. You can find it linked in the description below or on the GitHub: this link on the GitHub points to the model we're gonna need. So we're gonna click on that, and then we're on Hugging Face. We're gonna go over to Files and versions, select model.ckpt, download that, and place it in the checkpoints folder of our ComfyUI folder. Once the model has been downloaded, we're gonna refresh ComfyUI and then find it over here. In my case, it's gonna be model.ckpt, but if I were you, I'd rather change its name, because if I don't, at the end of the day I'm gonna have a lot of model.ckpt files. So rename it something like TripoSR.ckpt — that's gonna do the trick, and you're gonna remember what it does when the need arises.

Then I'm gonna drag from the TripoSR model output and select the TripoSR sampler. Now, the sampler needs a few things: a reference image, a reference mask, a geometry resolution, and a threshold — and that's where our Image Remove Background node is gonna come in handy, because we already have both of those. The reference image is either the one without the background or the original one from the VAE Decode node; at the end of the day it's gonna be the same, because we're gonna feed it the mask from the remove background node as well. So I'm gonna drag and drop from the VAE Decode node into our reference image input, and then from the mask output of the Image Remove Background node into the reference mask input of the TripoSR sampler node. Then, for our geometry resolution: that's how complex the resulting mesh is. Now, we have to make a choice between good resolution and fast generations. Since I wanna generate fast, in real time, and then decide at a later stage whether I like the results or not, I'm gonna input a very low resolution like 128.
But once I find a result that I like, I will increase the resolution to, let's say, 512 or 640, which is the maximum resolution I can reach on my 3080 Ti, and generate the same mesh at a better resolution — the mesh is gonna stay the same because the seed is fixed in my KSampler. If you don't care about working in real time, you can just input 640 or 512, or whatever the maximum resolution you can output is, and then queue every single prompt once you've made changes that you like. As for threshold, it really depends on the generated images, so for now I'm just gonna keep 25. Then we're gonna drag and drop from mesh and select the TripoSR viewer, and since this is a lot easier to show than to explain, I'm just gonna hit Queue Prompt — and there we go. This is our glorious result in 120p. As you can see, it's a 3D approximation of the object we have in our illustration, based on one single image, and as of right now, both the mesh and the textures are not great quality. But what we can do is just go over to our geometry resolution field, increase it to the maximum we can actually manage, and queue the prompt again. And as we can see, we get a much better mesh now. I had to turn the geometry resolution down to 512 because, while recording this, it couldn't handle 640, so you could probably squeeze out a bit more resolution, both for the mesh and the texture.

Now, is this good enough to be considered game ready? Oh no. Oh no no no, absolutely not. But yeah, I've seen way worse assets in indie games out there, and at least you've got a nice mesh you can work on in, say, Blender or whatever software you might be using for 3D stuff. The topology is not gonna be great, so you're gonna have to retopologize it, and you can work on the texture as well. So as a starting point, or a very cheap game asset, that can work quite well.

But we might still not be done with generating assets, so let's turn the resolution down to 128, hit Auto Queue here, and start anew. Now you're gonna see that both the 2D assets and the 3D assets can be generated in close to real time. It's gonna be a bit slower than normal here, because I am recording my screen and I'm using my camera, but in my testing it's quite fast — it usually takes around four or five seconds with my 3080 Ti, so that's quite good. So we're gonna clear our canvas and change our positive prompt to, let's say, a poison vial. Then we're gonna go back to our Painter node, hit Queue Prompt, and I'm gonna draw a vial. So here we go: we've got the vial, we've got the green for poison — everybody knows that poison is green. Then we're gonna add like a cork over here, and then some toxic fumes around it, and as you can see, we're getting the asset in close to real time, although it's quite scuffed. We can keep improving it in our Painter node, or we can just change the subject. So let's say we wanna do a cauldron — we're gonna go for a sort of witch theme. We're gonna clear our canvas, select a grey of sorts, start drawing, then put like a skull here and add some liquid inside. It's not picking up the skull, so we're gonna add, in white, a skull in front of it. Yep, and as we can see, now it's picking up that this one is a skull. And there we go: we've got our asset both in 3D and in 2D, with and without a background.
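(For reference, here's what the TripoSR part looks like as a rough standalone Python sketch, following the pattern of the upstream TripoSR repository's example script rather than the ComfyUI nodes — treat the exact names and signatures as assumptions and check them against your installed version. It reuses the hypothetical file names from the earlier sketches, and the geometry resolution is the same knob discussed above: low for previews, higher for the final mesh.)

```python
# Rough standalone sketch of the image-to-mesh step, loosely following the
# TripoSR repository's example script (tsr.system.TSR and extract_mesh come from
# that repo; exact signatures may differ between versions).
import numpy as np
import torch
from PIL import Image
from tsr.system import TSR  # provided by the TripoSR repository

device = "cuda" if torch.cuda.is_available() else "cpu"

# Same model.ckpt the ComfyUI loader uses, pulled from Hugging Face here.
model = TSR.from_pretrained(
    "stabilityai/TripoSR", config_name="config.yaml", weight_name="model.ckpt"
)
model.to(device)

# Composite the background-removed RGBA cutout onto neutral gray, as the
# reference script does before feeding the image to the model.
rgba = np.array(Image.open("asset_2d_nobg.png").convert("RGBA")).astype(np.float32) / 255.0
rgb = rgba[:, :, :3] * rgba[:, :, 3:4] + 0.5 * (1.0 - rgba[:, :, 3:4])
image = Image.fromarray((rgb * 255.0).astype(np.uint8))

with torch.no_grad():
    scene_codes = model([image], device=device)

# Low marching-cubes resolution (128) for near-real-time previews; raise it
# (e.g. 512) for the final export, as discussed above.
# (Newer TripoSR versions add a vertex-color argument to extract_mesh.)
meshes = model.extract_mesh(scene_codes, resolution=128)
meshes[0].export("asset_3d.obj")
```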
So let's say I wanted to develop a point and click game, or I need some assets for my backgrounds, or for my D&D games, or any other game I might play — or anything else, really, that you might need simple 2D illustrations or 3D meshes for. Done. You can just instantly create and generate them with this real-time workflow.

Now, this is a very linear and simple workflow; if you wanted to add stuff to it, you could absolutely do that. Let's say you wanted to add an IPAdapter group in order to source from reference images that you have and like, so they can influence the final designs of your illustrations — you can do that as well. Just implement a group for IPAdapter over here: search for IPAdapter, and we're going to select the V2, so that's "Advanced". We're going to search for its Unified Loader, hook the model through the Unified Loader, then the IPAdapter, and then onto the KSampler, and finally load a reference image — search for a Load Image node and plug it into the IPAdapter Advanced node. Or let's say you wanted to generate different assets and merge them all together before creating a 3D mesh: you can absolutely do that, and then use an Image Blend by Mask node like we did last week. The possibilities are actually endless, same as with all the workflows I'm showing you.

All in all, though, I don't think this workflow is quite production-ready yet. TripoSR is not quite at a place where I would feel confident assuring clients that I could do something with it, at least not on a consistent level, and that's what I'm looking for in my production-ready tutorials. That's why I'm presenting this tutorial to you in the Stable Diffusion Experimental series instead of the professional creatives series.

So that's gonna be all for today, and I hope you like this workflow and can implement it in your own ways. As always, this is a new channel, so any like and subscribe I get helps a lot. I am Andrea Baioni; you can find me on Instagram @risunobushi or on the web at andreabaioni.com. If you want to know more about how you can experiment with Stable Diffusion, or you want to use Stable Diffusion for your professional creative needs, check out these videos. Thank you for watching, and I'll be seeing you next week.
Info
Channel: Andrea Baioni
Views: 4,183
Keywords: generative ai, stable diffusion, comfyui, civitai, text2image, txt2img, img2img, image2image, image generation, artificial intelligence, ai, generative artificial intelligence, sd, tutorial, risunobushi, risunobushi_ai, risunobushi ai, stable diffusion for professional creatives, professional creatives, install comfyui, installing comfyui, comfy-ui, andrea baioni, From sketch to 2D to 3D in real time! - Stable Diffusion Experimental, stable diffusion experimental, triposr, sdxl, sdxl lightning
Id: DO8CeE5c0Mc
Length: 24min 59sec (1499 seconds)
Published: Sat Apr 06 2024