Explaining Image-2-Image In 15 Minutes – Stable Diffusion (Automatic 1111 | Tutorial)

Video Statistics and Information

Reddit Comments

Thinking of removing the background music so people can listen to their own.

πŸ‘οΈŽ︎ 6 πŸ‘€οΈŽ︎ u/BitesizedGen πŸ“…οΈŽ︎ Jun 19 2023 πŸ—«︎ replies

Great work on this

πŸ‘οΈŽ︎ 1 πŸ‘€οΈŽ︎ u/Educational_Ice151 πŸ“…οΈŽ︎ Jun 19 2023 πŸ—«︎ replies

You using a vae? Outputs looked a little washed out and I didn't see the vae selector at the top of your UI.

πŸ‘οΈŽ︎ 1 πŸ‘€οΈŽ︎ u/VancityGaming πŸ“…οΈŽ︎ Jun 19 2023 πŸ—«︎ replies
Captions
Image to image is one of the most powerful tools in Stable Diffusion, allowing you to generate images from both reference photos and prompts, alongside image editing through inpainting. Does that all sound very confusing? Well, don't worry, because I'll be breaking down everything within image to image so you can spend less time reading and more time creating. Some of the concepts were covered in my text to image video, so do check that out, but like this video first and then I'll give it to you bite-sized.

I'm going to start at the image section, because this feature is the biggest difference between text to image and image to image. This section lets you upload a reference image, which you then use in combination with optional prompts to make new images. We'll cover the sliders that give you more control over what influences the final image later, but for now, if you just drag and drop your desired image into the image section, make the resolution portrait, set the denoising strength to 0.6 and hit Generate, you can see that it tries to use the reference photo to generate something similar.

Prompt, in this instance, is where you enter the text that guides how the reference photo you provided changes to match your desired final image. It's where we provide additional context for Stable Diffusion to understand what we want, by combining our text prompt with our visual image. For example, if I take our model and type in "portrait shot, beautiful woman, diamond crown, purple robe, insanely detailed", then Stable Diffusion will combine that prompt with our image to generate something that merges both.

Negative prompt tells Stable Diffusion what to avoid in our generated image and works with our reference photo to make something original. This can be hit or miss, but it's good for improving anatomy or quality issues, while the prompt section can emphasize the things we actually want with different degrees of weighting. For example, if I use my previous prompt and seed but add in "(worst quality, low quality:1.4)", then we will get a different result which attempts to avoid a low-quality image, reinforced by our "insanely detailed" prompt.

Interrogate CLIP is a powerful tool which takes your given image and suggests a series of prompts that attempt to replicate the image you've provided. So let's say we're not sure how to describe this model: we can just click Interrogate CLIP. This might take a long time to run (for me it took about 10 minutes, so go run some errands in the meantime), and once the hamsters are finished running you'll get a series of prompts which, used in practice, should give a similar result to the image we provided. Of course, this won't be an exact match all the time, as other factors, like the checkpoint you're using and whether the context of the image was understood, will all factor into the final result.

Interrogate DeepBooru is similar to Interrogate CLIP, but it attempts to generate a series of prompts from an image based on Danbooru tags. Danbooru is an image board for art, mostly leaning towards anime, gaming and some not-safe-for-work material, but this image database is often used to train checkpoints, so Stable Diffusion can check whether an image you provide can be broken down using the tags from that website. You can see that it tries to break the image down into what it thinks the correct prompts are, but as we're using a photo and Danbooru is 2D, this works better on fictional pictures and paintings.
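If you prefer driving this from code rather than the UI, here's a minimal sketch of the same reference-image-plus-prompt workflow through Automatic1111's built-in HTTP API. It assumes a local instance launched with the --api flag on the default port; the endpoint and field names below match the API as I understand it, so verify them against your install's /docs page.

```python
import base64
import requests

URL = "http://127.0.0.1:7860"  # local Automatic1111 instance started with --api

def encode_image(path: str) -> str:
    """Read an image file and return it base64-encoded, as the API expects."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

payload = {
    "init_images": [encode_image("reference.png")],  # the drag-and-drop reference image
    "prompt": "portrait shot, beautiful woman, diamond crown, purple robe, insanely detailed",
    "negative_prompt": "(worst quality, low quality:1.4)",
    "denoising_strength": 0.6,   # how far the result may drift from the reference
    "width": 512,
    "height": 768,               # portrait resolution
}
result = requests.post(f"{URL}/sdapi/v1/img2img", json=payload).json()
with open("output.png", "wb") as f:
    f.write(base64.b64decode(result["images"][0]))

# Interrogate CLIP and DeepBooru are exposed the same way; the model value is
# "clip" or "deepdanbooru" (names as I understand them; confirm on your install).
caption = requests.post(
    f"{URL}/sdapi/v1/interrogate",
    json={"image": encode_image("reference.png"), "model": "clip"},
).json()
print(caption)
```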
Copy image to provides us with a list of locations we can send our reference image to, such as Sketch, Inpaint and even Inpaint sketch. Let's take a moment to look at these tabs, as they have some very useful features which you will want to know about.

Sketch allows you to draw on your provided reference image and then use those adjustments to influence the generated image. This may be useful for changing the colors of items or roughly adding in different items, which you can then describe in your prompt. As an example, if I draw an apple into this photo, we should see that apple added in, with some variation depending on our denoising strength. But if we now add "apple" to our prompt, we get a much stronger and more consistent result, driven by our image and then reinforced by our prompt.

Inpaint is one of the most powerful tools in Stable Diffusion, as it allows you to mask a portion of your image and use your prompts to generate new content in that specific masked location. This can be useful for adding details to eyes, fixing fingers, or even replacing whole objects without affecting the rest of the image. Something I need to make clear, as I learned it the hard way, is that when inpainting your prompt needs to describe what you want in the masked section, rather than the content of the entire image, and I'll demonstrate why this is important when we look at the mask modes later on.

Under Inpaint you will find some additional options unique to this section of Stable Diffusion, so let's quickly cover these while we're here.

Mask blur applies a Gaussian blur to the edges of the mask, allowing Stable Diffusion to blend the masked image with the unmasked portions. In this image, where I've prompted a clown nose, I've used a variety of blur values so we can see the impact, from a sharp addition of the nose to a blurry fade that helps it blend in; at super-high values the nose is barely visible at all.

Inpaint masked ensures that only the masked portion of the image is filled with the newly generated content. For example, if I type in "Michael Jackson face", we'll get his face within our masked section. Inpaint not masked ensures that only the unmasked portions of your image are filled with the newly generated content, so if I use the same "Michael Jackson face" prompt but select inpaint not masked, then everything outside of our mask will be filled with, well, whatever this is supposed to be.

Fill is probably the option you're looking for when it comes to inpainting: it fills the masked portion of a blurred version of your image with your desired prompt, which leaves some room for Stable Diffusion to interpret the details. Original fills in the masked portion with content based on the original content of the section being altered; this is another good option to use, but the outputs will depend on your setup. Latent noise fills the masked space with random noise, which is super confusing even for me, and probably isn't something you would use unless you know it's what you want; in my opinion the results are worse than either fill or original. Latent nothing fills the masked space without noise, and this is another sketchy option: despite its decent results, you're most likely going to want fill or original. For each of these modes you can see the difference between using the full prompt for the whole image versus a specific prompt for what we want in the masked area.
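The inpainting options above map onto the same img2img endpoint. Here's a rough sketch, reusing URL and encode_image from the earlier snippet; mask.png is a hypothetical mask image, and the option-to-field mapping is as I understand it, so check /docs.

```python
# Inpainting through the API: same endpoint as img2img, plus a mask and the
# inpaint-specific options covered above. In the mask, white marks the region
# to regenerate (as with inpaint upload).
payload = {
    "init_images": [encode_image("reference.png")],
    "mask": encode_image("mask.png"),
    "prompt": "red clown nose",      # describe the masked region, not the whole image
    "denoising_strength": 0.75,
    "mask_blur": 8,                  # Gaussian blur on the mask edge for blending
    "inpainting_mask_invert": 0,     # 0 = inpaint masked, 1 = inpaint not masked
    "inpainting_fill": 1,            # 0 = fill, 1 = original, 2 = latent noise, 3 = latent nothing
    "inpaint_full_res": True,        # True = "only masked", False = "whole picture"
    "inpaint_full_res_padding": 32,  # "only masked padding, pixels"
}
result = requests.post(f"{URL}/sdapi/v1/img2img", json=payload).json()
```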
Only masked padding, pixels determines how much of the area around the masked region should be used when generating the new image, to help blend the surrounding area with the content inside the mask.

Inpaint area lets you specify the resolution of content that goes into the masked portions of your image. This matters because when you fill a masked area with content, the resolution of the new content is based on the width and height you chose for the entire image. By selecting Whole picture you ensure that the resolution for the masked area is in line with the rest of the image, while choosing Only masked ensures that the resolution you chose applies only to that masked portion, which can help you get more detailed results on low-detail portions of your image.

Inpaint sketch gives you access to all of the features of inpaint, but with the added ability to include colors, which can help Stable Diffusion understand an object much better by recognizing the color, shape and description of the mask. Here we can see a few examples of this in action at different levels of denoising strength, so you can see the impact.

Inpaint upload allows you to use your own custom mask for your reference image. This is more powerful than inpaint, as you get access to the same features but can create a custom mask in an image editor that's much more accurate, sharp and non-destructive. In these examples you can see how our prompt generates a different image within the confines of the white masked area, which is great for adjusting specific areas of an image, or the image as a whole.

Batch is used for generating multiple images at once by specifying an input directory, where your reference photos are located, and an output directory, where the generated images will go; you can even reference the location of a batch mask. The ControlNet input directory is a way to batch-generate images with a ControlNet mask by supplying the location of the mask you want to use, ensuring ControlNet is enabled, and selecting the appropriate option that corresponds to the mask you are using. Here are some example images which demonstrate the results of both the masking and the normal map from ControlNet. Also make sure related files share the same name, so Stable Diffusion knows which files belong together when running the batch (there's a rough script equivalent sketched after the resize options below).

Just resize ensures that your generated image fits within the specified resolution. Crop and resize ensures that your generated image fits within the specified resolution even if it has to crop certain portions. Resize and fill ensures that your entire generated image fits within the specified resolution even if it has to fill in empty space. Just resize (latent upscale) is the same as just resize, except it doesn't use one of Stable Diffusion's upscale models and instead works with encoders and latent space, which is all too complicated considering we just want to generate some art.
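The Batch tab can be approximated in a few lines of the same API code. A sketch, assuming hypothetical batch_in and batch_out directories and reusing URL and encode_image from above:

```python
import base64
from pathlib import Path
import requests

in_dir, out_dir = Path("batch_in"), Path("batch_out")  # placeholder directory names
out_dir.mkdir(exist_ok=True)

for img_path in sorted(in_dir.glob("*.png")):
    payload = {
        "init_images": [encode_image(str(img_path))],
        "prompt": "oil painting, thick brush strokes",
        "denoising_strength": 0.5,
        # 0 = just resize, 1 = crop and resize, 2 = resize and fill,
        # 3 = just resize (latent upscale) -- mapping as I understand it
        "resize_mode": 1,
        "width": 512,
        "height": 512,
    }
    result = requests.post(f"{URL}/sdapi/v1/img2img", json=payload).json()
    # Keep the same file name so inputs and outputs stay matched up
    (out_dir / img_path.name).write_bytes(base64.b64decode(result["images"][0]))
```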
Sampling method is the algorithm used for denoising an image, or in plain terms, Stable Diffusion correcting its mistakes by generating new samples of the image in steps, giving you a cleaner image the more steps you have. There are a lot of sampling methods to go through, and most of them come down to preference, but DPM++ SDE Karras, Euler a and DDIM are popular choices. Sampling steps is the number of times the sampling method produces an improved image: the higher this value, the cleaner the final result, and the lower the value, the noisier the final result.

Restore faces can improve faces in generated images that have been distorted or messed up by Stable Diffusion. It's controversial because it can produce worse results depending on the checkpoint you are using, being better suited to realistic ones, so you may want to consider inpainting as a more reliable method for fixing faces, eyes, fingers and other artifacts.

Tiling is used to ensure that images can be tiled, or in plain English, that generated images can be placed next to one another seamlessly. You can use this in image to image to turn a picture of a brick wall into a seamless brick-wall texture, and if you get strange artifacts when using it, try a different checkpoint.

Resize to lets us choose the size we want our generated image to be by specifying width and height values, while Resize by lets us choose our resolution via a multiplier, which multiplies the original size of the image by our specified number.

Batch count determines the number of batches of images you will generate, and batch size determines how many images are generated per batch, so the total is batch count multiplied by batch size: a batch count of two and a batch size of three will give you six really horrific images.

CFG scale and denoising strength are probably the two sliders you will need to focus on when trying to get the perfect image. CFG scale determines how closely your newly generated image follows your prompt, or how much influence your prompt has over the final result, while denoising strength determines how much influence your reference image has over the final generated image. By combining these two sliders you can decide whether your final image sits closer to your original or closer to your prompt.

The seed is the unique identifier for an image you generate based on the settings you use, and using the same seed with the same settings will result in the same image. If you change certain settings, like the resolution, that will change the image you get with the same seed, but it's a useful way to drive consistency in the images you make once you decide on the settings you want. On the topic of seeds, the dice icon sets your seed to -1, a value that generates a random seed each time you generate an image, and the green recycle icon reuses the seed from the last generated image, which is super useful when you want to generate a similar image to the one you just made and need to grab the seed without delving into the file's name.

There are also some extra options under the seed, which I'll explain. Variation seed attempts to combine the image generated from that seed with the currently selected seed for a blended image, and variation strength controls how strong the blending is, with a low value giving you the image from your selected seed and a higher value mixing in more of the variation seed. Changing an image's size will change the variation of the image generated even if the seed is the same, so Resize seed from allows you to use an image's height and width alongside the seed, letting you maintain a similar composition to the one you like even if you change the overall resolution of the image.
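All of the sliders and seed options just described have corresponding API fields. A sketch of how they slot into the same payload from the earlier snippets (field names as I understand them, so check /docs):

```python
payload.update({
    "sampler_name": "DPM++ SDE Karras",
    "steps": 25,                # sampling steps
    "cfg_scale": 7,             # prompt influence
    "denoising_strength": 0.6,  # reference-image influence
    "restore_faces": False,
    "tiling": False,
    "seed": 1234567890,         # -1 = random, i.e. the dice icon
    "subseed": 987654321,       # variation seed
    "subseed_strength": 0.3,    # variation strength: 0 = main seed, 1 = mostly variation seed
    "seed_resize_from_w": 512,  # "resize seed from" width/height
    "seed_resize_from_h": 768,
    "n_iter": 2,                # batch count
    "batch_size": 3,            # images per batch: 2 x 3 = 6 images total
})
```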
Stable Diffusion also allows you to utilize custom scripts, and comes with a few installed right off the bat. If you want me to explore these, let me know, but one I recently started using is the XYZ plot, which lets you see the impact of various settings on your generated images and make the kind of comparisons I've been making in this video.

The Generate button does exactly what it says. The blue arrow icon reads and applies the prompts and settings from your last generated image, provided the prompt box is clear of any text; it can also apply generation data copied from another image into the appropriate fields. The bin icon deletes both your prompts and negative prompts from the text boxes.

The red picture icon shows or hides your extra networks: textual inversions, hypernetworks, checkpoints and LoRAs, so you can easily access them within the same window. Think of these as add-ons which you can install and then use from this window.

The floppy disk icon allows you to save your prompts to a styles section which can be loaded later on, the notepad icon loads prompts from your saved styles and applies them to both the prompt and negative prompt sections, and the blue rotation icon refreshes the styles list and updates any that are missing.

The folder icon opens the file directory where the generated image is saved, and this save location differs depending on whether you're using text to image, image to image, et cetera. The Save button saves both your generated image and a CSV file containing the generation data, so you can quickly have a record of your image and its associated details archived for future use. The Zip button does the same thing as Save, except your images are also stored in a zip file, and you will of course still get a CSV file containing the generation data.

Send to image to image, inpaint or extras takes your selected generated image and sends it to the relevant tab for further editing. You can also manually drag and drop the image into those tabs, but this is just a quicker option for sending the image directly for processing.

I hope you found that helpful, and if I've missed anything then please do let me know and I'll look to improve the video in future. Also consider subscribing to the channel and the Patreon in the description, and of course leave a like. This is Bitesized Genius, and I hope you enjoyed.
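As a closing footnote to the button tour above: the generation data that Save records and the blue arrow icon reads back can also be pulled from a finished PNG programmatically. A minimal sketch, assuming the same local --api instance from the earlier snippets and the /sdapi/v1/png-info endpoint as I understand it (confirm on your install's /docs page):

```python
# Read the embedded generation parameters back out of a saved image,
# roughly what the blue arrow icon does in the UI.
info = requests.post(
    f"{URL}/sdapi/v1/png-info",
    json={"image": encode_image("output.png")},
).json()
print(info["info"])  # prompt, negative prompt, seed, sampler, CFG scale, etc.
```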
Info
Channel: Bitesized Genius
Views: 12,725
Keywords: Stable Diffusion, stable diffusion prompt guide, stable diffusion controlnet, stable diffusion prompts, BitesizedGenius, Automatic1111, stable diffusion lora, stable diffusion extensions, stable diffusion embeddings, stable diffusion checkpoints, stable diffusion anime, stable diffusion scripts, stable diffusion video, stable diffusion img2img, Stable diffusion realistic, stable diffusion models, stable diffusion install, Stable diffusion tutorial install
Id: ltTjpW0t2BI
Length: 15min 14sec (914 seconds)
Published: Sun Jun 18 2023