Hi guys, today I will show you
all about Stable Diffusion. It is a deep learning, text-to-image
model released in 2022 based on diffusion techniques. It is primarily used to generate
detailed images based on text descriptions. Many AI tools are not really usable in real work
yet, but Stable Diffusion is a different story. The guys from Vivid-Vision showed us how they use it in their workflow during our studio tour, and it was really inspiring. If you haven’t watched it yet,
the link is in the description. ### All the calculations are done by the GPU. You need a computer with a **discrete Nvidia video card** with at least 4 GB of VRAM. An integrated GPU will not work. Working with AI requires a lot of trial and error, so a good GPU will speed up your process dramatically. Luckily, I was sent the NVIDIA GeForce RTX 4090 by the sponsor of this video - Nvidia Studio. Here are some benchmarks. More iterations per second mean faster results. As you can see, this card is the top GPU right now, and NVIDIA is currently the dominant hardware supplier for AI work. Now is a great time to get started, as demand is high and growing, and the results speak for themselves. ### Next, let me show you how to install it. It’s not as easy as installing standard
software, which is why I’ve included this part. I have also created a blog post with detailed explanations, links, and things to copy and paste, so here I will just quickly go through it. Don’t worry if it’s too fast; the link to the blog is in the description. Go to the link provided in the blog
post and download the Windows installer. Make sure to download the exact version I’ve
linked to, as newer versions will not work. It’s important to check this
option, and then click Install. Download Git using the provided link and install it with the default settings. Next, we have to download Stable Diffusion Automatic1111. It’s not the kind of download you are used to. First, we have to open the Command Prompt. If you want to install it in the default location, that’s fine. If you want a specific location, navigate to the chosen folder and type cmd here. Here you go. Now copy and paste the code from the blog post and press Enter. And that’s it - as you can see, everything is downloaded here. Next, we have to download a
checkpoint model to this folder. I will explain what it is later on. Here is the website where you can download it, but I have provided a direct download link in the blog post. This step is quite long; it will take about 15 minutes. Run this file and it will download and set up everything. Once it’s done, you will see the URL. Copy it and paste it into your browser. And here it is - the Stable Diffusion Automatic1111 interface. If you want to have it in dark mode, add this to your URL; by default, it follows your browser’s theme. You can type something in the prompt section to see if it works. Great. The last step is to modify the WebUI file. Copy the code from the blog. Right-click on the file and choose ‘Edit’; it will open in Notepad. Replace the content with our code, save, and close. What this does is enable auto-update, make it faster, and allow access to the API.
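With the API enabled, you can also talk to Stable Diffusion from a script. As a quick sanity check, here is a minimal Python sketch - it assumes the WebUI is running locally on the default port 7860 and that your edited WebUI file really does include the API flag; the endpoint and field names are the ones the Automatic1111 API uses as far as I know, so double-check them against your install:

```python
# Minimal sketch: query the local Automatic1111 API.
# Assumes the WebUI runs at http://127.0.0.1:7860 and was started with the API enabled.
import requests

BASE_URL = "http://127.0.0.1:7860"  # default local address of the WebUI

# /sdapi/v1/sd-models lists the checkpoint models the WebUI can see
response = requests.get(f"{BASE_URL}/sdapi/v1/sd-models", timeout=30)
response.raise_for_status()

for model in response.json():
    # Each entry describes one checkpoint file found in the models folder
    print(model["model_name"])
```

If that prints a list, the API is working. We will stick to the browser interface in this video, but it’s good to know the option is there.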
Now that everything is set up, let me show you how to open Stable Diffusion the next time you start your computer. If you just use the URL, it will not work. First, you have to run the WebUI file. I like to create a shortcut and place it on my desktop for quick access. Run the file and now the URL will work. It’s the same URL every time, so you might want to bookmark it in your browser. ### There are a lot of model types available. I will cover just checkpoint models, which are the most popular and the ones you
really need; the others are optional. Checkpoint model files are pre-trained Stable Diffusion weights that can create a
general or specific type of image. The images a model can create are
based on the data it was trained on. For example, a model will not be able to create
a cat image if there are no cats in the training data. Similarly, if you only train a model
with cat images, it will only create cats. These files are heavy, usually between 2 and 7 GB. Here are 4 versions of an image
generated using exactly the same prompt. Each uses a different model. As you can see, they are extremely different. The default model is not really
good, and you shouldn’t use it. Here we have a very realistic result. This one is a bit more moody. And finally, here we have
something totally different. I think this proves that choosing
the right model is essential. I have included a few links to popular websites where you can download models in the blog post. Here is the Reliberate model. You can see it’s based on Stable Diffusion 1.5. This website is great because you can see example images generated with each model, with the prompts included, which is perfect for learning and testing. You can download the model here; make sure to place it in this folder, the same one we used in installation step 4. When you restart Stable Diffusion, you can choose the model here. You can also mix models. I will show you an extreme scenario: let’s merge the realistic model with this cartoony model. Here we can choose a multiplier; if we choose 0.5, both models contribute equally. We can also add a custom name. Click here to merge the models. You can now choose the new model from the list. Let’s generate the same images to see the difference. Here is the result - basically, we get something in between. It’s a really useful feature.
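Under the hood, the merge with a 0.5 multiplier is (roughly) just a weighted average of the two sets of weights: result = A × (1 − m) + B × m. Here is a minimal Python sketch of that idea - the file names are made up, and a real merge like the one in the WebUI also handles mismatched keys, VAE weights, and precision, so treat this as an illustration rather than a drop-in tool:

```python
# Rough sketch of a "weighted sum" checkpoint merge: result = A*(1-m) + B*m.
# File names are hypothetical; real checkpoints may also need key filtering
# and dtype handling, which the WebUI's merger takes care of for you.
from safetensors.torch import load_file, save_file

MULTIPLIER = 0.5  # 0.5 means both models contribute equally

model_a = load_file("realistic_model.safetensors")
model_b = load_file("cartoony_model.safetensors")

merged = {}
for key, tensor_a in model_a.items():
    tensor_b = model_b.get(key)
    if tensor_b is None or tensor_b.shape != tensor_a.shape:
        # Keys only one model has (or mismatched shapes) are copied from A
        merged[key] = tensor_a
    else:
        merged[key] = (1.0 - MULTIPLIER) * tensor_a + MULTIPLIER * tensor_b

save_file(merged, "realistic_cartoony_mix.safetensors")
```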
### Let’s move on to the interface. I will start with the prompts. Write your prompt here and click ‘Generate’ to create an image. Each time we get a different result, because the seed is set to -1, which picks a random seed for every generation. If we change it to 1, for example, we will get the same result every time. We also have a negative prompt section: whatever we write here will not appear in the image. Let’s exclude grass. And a collar. You get the idea.
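If you ever want to reproduce an image from a script, this prompt / negative prompt / seed trio is all you need. A small sketch using the local API again (assuming the WebUI was started with the API enabled; the payload field names are the Automatic1111 ones to the best of my knowledge, so verify them on your install):

```python
# Sketch: fixed seed + negative prompt through the local Automatic1111 API.
# Assumes the WebUI runs at 127.0.0.1:7860 with the API enabled.
import base64
import requests

payload = {
    "prompt": "a dog sitting in a garden",
    "negative_prompt": "grass, collar",  # things we do NOT want in the image
    "seed": 1,        # fixed seed -> same result every time; -1 = random
    "steps": 20,
    "width": 512,
    "height": 512,
}

r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=300)
r.raise_for_status()

# The API returns images as base64-encoded PNGs
with open("seed_1.png", "wb") as f:
    f.write(base64.b64decode(r.json()["images"][0]))
```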
### By the way, this is real time - I haven’t sped this screen recording up. As you can see, the images generate extremely quickly thanks to my new RTX 4090 card. By the way, V-Ray GPU is also crazy fast. You may remember the V-Ray GPU tutorial I posted a while ago; if you want to know how to use it, you will find the link in the description. Here are the render times for the 3 graphics cards I used for testing, plus 2 render times using CPUs. As you can see, GPU rendering is way faster. NVIDIA Studio is not only hardware. Such great results are possible because NVIDIA Studio cooperates with software developers like Autodesk and Chaos to optimize and speed up their software. On top of that, there is the NVIDIA Studio Driver, available to download in the GeForce Experience app. It’s more stable, which is super important. Recently I had issues with my video editing software, which was crashing all the time; the Studio driver fixed the issue immediately. ### Let’s go back to the interface. Here you can open the folder with all the
generated images; they are saved automatically. There are also text files that contain the prompts and all the settings, which is super useful. This option is turned off by default, and I highly recommend enabling it. To do so, go to the settings, scroll down, and check this option. Here are other options to save the files or send them to other tabs. If we click this icon, we can clear the prompts, and with this one we can bring the last used prompt back. Now, let’s say you use some part of the prompt very often - you can save it as a style. Give it a name, and you can reuse it later by clicking this icon; then just add a specific prompt and generate. Next, let’s cover the sampling steps. Basically, this setting controls the quality of the image: more steps mean better quality. It doesn’t make sense to move the slider to the max, though, because the render time gets longer. If we increase the steps by 15 - from 5 to 20 - the difference in quality is huge. If we increase them by another 15, the difference is barely noticeable. The sweet spot is usually between 20 and 40; it doesn’t make sense to go higher, as the difference will be tiny but you will have to wait much longer. Next, the sampling method. We have a lot to choose from. It’s quite complicated, and I haven’t really dived deep into it, as it’s not necessary. I did a test and generated an image using the same prompt and settings with each sampling method. From what I found, most people use this one. Let’s change it and generate - I also think it gives a nicer result.
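To see the diminishing returns for yourself, you can sweep the step count from a script. A sketch using the same local API as before - the field names follow the Automatic1111 API as I understand it, and the sampler name below is just a commonly used example, not necessarily the one from the video:

```python
# Sketch: generate the same seed/prompt at different step counts and time them,
# to see how quality gains flatten out while render time keeps growing.
# Assumes a local Automatic1111 WebUI started with the API enabled.
import base64
import time
import requests

for steps in (5, 20, 35):
    payload = {
        "prompt": "a cozy living room, photorealistic",
        "seed": 1,                          # fixed seed so only the step count changes
        "steps": steps,
        "sampler_name": "DPM++ 2M Karras",  # example sampler; pick any name from your list
        "cfg_scale": 7,
        "width": 512,
        "height": 512,
    }
    start = time.time()
    r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=600)
    r.raise_for_status()
    with open(f"steps_{steps}.png", "wb") as f:
        f.write(base64.b64decode(r.json()["images"][0]))
    print(f"{steps} steps took {time.time() - start:.1f} s")
```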
Unfortunately, we cannot simply generate high-resolution images. If we increase the size here, we will get a messed-up result. That’s because most models have a maximum resolution of 512 or 768 pixels; in this case, Stable Diffusion essentially generates 16 images and tries to stitch them together. With this model, we can do a maximum of 768 pixels. Let me show you how to create a larger image: we have to enable ‘hires fix’. You keep the resolution at 512 and use the ‘upscale by’ option - with a value of 2 we get 1024 px, with a value of 4, 2048 px. Denoising strength controls how similar the larger image is to the original: the lower the value, the more similar the image. We also have to choose the upscaler. I recommend this one; you have to download it though, and the details are in the blog post. For now, let’s use this upscaler. First, let’s use high denoising - you can see the image is very different. Let’s decrease the denoising - now we have a very similar result.
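The same hires-fix settings are exposed through the API, which makes the arithmetic obvious: base resolution times ‘upscale by’ gives the final size (512 × 2 = 1024). A hedged sketch - the field names are my reading of the Automatic1111 API, and the upscaler here is a built-in placeholder, not the one from the blog post:

```python
# Sketch: txt2img with hires fix - 512 px base, upscaled 2x to 1024 px.
# Assumes a local Automatic1111 WebUI started with the API enabled.
import base64
import requests

payload = {
    "prompt": "a modern kitchen interior, photorealistic",
    "seed": 1,
    "steps": 25,
    "width": 512,
    "height": 512,
    "enable_hr": True,            # turn on 'hires fix'
    "hr_scale": 2,                # 'upscale by' 2 -> 512 * 2 = 1024 px
    "hr_upscaler": "Latent",      # a built-in upscaler, used here as a placeholder
    "denoising_strength": 0.4,    # lower = closer to the original 512 px image
}

r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=600)
r.raise_for_status()
with open("hires_1024.png", "wb") as f:
    f.write(base64.b64decode(r.json()["images"][0]))
```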
Next, let’s cover the batch count and size. If we increase the batch count, you can generate 8 images at once; they will be generated one after another. It’s great because you can queue up a lot of images, go grab a coffee, and when it’s done, pick the best result. If you like an image, you can check its seed here to be able to recreate it. Here it is. Batch size does the same thing, but the images are generated at the same time; in my case, this way is quicker.
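In API terms these are two separate fields, which makes the difference easy to see. A short sketch (again, field names per the Automatic1111 API as I know it):

```python
# Sketch: batch count vs batch size through the local API.
#   n_iter     -> 'batch count': how many batches, generated one after another
#   batch_size -> 'batch size' : how many images per batch, generated together
# Assumes a local Automatic1111 WebUI started with the API enabled.
import base64
import requests

payload = {
    "prompt": "a small wooden cabin in a forest",
    "seed": -1,        # random seed so the 8 images differ
    "steps": 20,
    "n_iter": 4,       # 4 batches...
    "batch_size": 2,   # ...of 2 images each = 8 images total
}

r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=900)
r.raise_for_status()
for i, image in enumerate(r.json()["images"]):
    with open(f"batch_{i}.png", "wb") as f:
        f.write(base64.b64decode(image))
```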
Lastly, let’s cover the CFG scale. Higher values make your prompt more important but give you worse quality; lower values give you better quality, but the results will be more random. I think the sweet spot is between 4 and 10.
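If you are curious what the slider actually does: it is the scale in classifier-free guidance. At every sampling step the model predicts the noise twice - once without your prompt and once with it - and the CFG scale decides how hard to push toward the prompted prediction. A conceptual sketch with toy numbers, not the exact code inside the WebUI:

```python
# Conceptual sketch of classifier-free guidance, which is what the CFG scale controls.
def apply_cfg(noise_uncond: float, noise_cond: float, cfg_scale: float) -> float:
    # Push the prediction from the unconditional guess toward the prompted one
    return noise_uncond + cfg_scale * (noise_cond - noise_uncond)

# Toy numbers: a higher scale exaggerates the prompt's influence on the prediction.
print(apply_cfg(0.25, 0.75, 1.0))   # 0.75  -> prompt followed "as is"
print(apply_cfg(0.25, 0.75, 7.0))   # 3.75  -> prompt pushed much harder
print(apply_cfg(0.25, 0.75, 15.0))  # 7.75  -> extreme values start causing artifacts
```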
Now that we have the basics covered, let’s move on to the fun stuff: image to image. We can start in Photoshop. Here I want to improve the 3D people, as you can clearly see they are not real. As you remember, the maximum resolution we can generate is 768 px, so I will crop my image to this size and save it. In Stable Diffusion, let’s go to the image-to-image tab and choose the ‘inpaint’ option. Drag and drop the image into the editor. Here we can turn on the brush and paint over the areas we want to regenerate. Let’s write a prompt. I will set the size to the maximum of 768 by 768. Also, I will set this option to ‘mask only’; this way the quality will be better, as the masked area alone will be generated at 768 px and then shrunk down to fit the image. Let’s generate the image. The result doesn’t match our scene because the denoising value is quite high, so let’s lower it. I will generate a few more images and then choose the best one. Let’s place the best image in Photoshop. It’s way better, isn’t it? Especially the hair and towels. I will mask the people to make sure the transition is seamless, and then we can uncrop the image. Now we have the best of both worlds: the ease of use of 3D people and realistic results.
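This inpainting workflow can also be scripted, which is handy if you want to batch-fix many renders. A hedged sketch of the img2img endpoint - the field names, especially the ‘mask only’ flag, are my reading of the Automatic1111 API, so verify them; the image and mask files are placeholders:

```python
# Sketch: inpaint a cropped 768x768 render through the local Automatic1111 API.
# 'render_crop.png' and 'people_mask.png' are hypothetical files: the mask is
# white where new content should be generated and black elsewhere.
# Assumes the WebUI runs locally with the API enabled.
import base64
import requests

def to_b64(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

payload = {
    "init_images": [to_b64("render_crop.png")],
    "mask": to_b64("people_mask.png"),
    "prompt": "two people relaxing by a pool, photorealistic",
    "denoising_strength": 0.4,   # low enough to keep poses and lighting
    "inpaint_full_res": True,    # 'mask only': generate just the masked area at full size
    "width": 768,
    "height": 768,
    "steps": 25,
}

r = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload, timeout=600)
r.raise_for_status()
with open("render_inpainted.png", "wb") as f:
    f.write(base64.b64decode(r.json()["images"][0]))
```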
Let me show you another example, with greenery. We can also place a large render directly here, without cropping. This is an image I did around 4-5 years ago; let’s improve the greenery. Let’s paint over this tree, and I will add a prompt. I will also increase the size here. By the way, this image is 5K pixels horizontally. Keep in mind that the generated area is only that small; if you paint a larger area, the quality will be lower. Let’s decrease the denoising. I will show you why you don’t want to use the ‘whole picture’ option here: with it, the whole image is treated as 768 pixels. Let’s change it to ‘mask only’. Now we have a 5K image, and the generated part alone is 768 px. I found that the default sampling method works better here. With this one, the tree is too similar - there is almost no difference - so I will change it back. Now the best part: we can clear the mask and drag and drop the generated image back into the inpaint editor. Let’s paint another area and generate again. You can repeat this process. Here is my result after spending around 10 minutes on this image. I love it. The result is way more natural, soft, and photorealistic, while the shapes are almost exactly the same. The difference in the foreground trees is huge; the model wasn’t really good there.
I hope you found this video useful and that I saved you some time researching all of this stuff. If you want to learn all about architectural visualizations in 3ds Max, check out my courses. Also, here are some videos on the same topic that you might find interesting. Bye, bye.