Mastering Stable Diffusion: Crafting Perfect Prompts for Automatic 1111

Captions
[Music] Hey everyone, welcome back to AIchemy with Xerophayze. This is Eric, and in this video I wanted to take some time to discuss how I do prompting and how I structure my prompts when it comes to Stable Diffusion in Automatic 1111. I've been working with ChatGPT for a long time, and I've been working with Stable Diffusion for a long time. There are other AI programs too, whether it's Bard or Claude or whatever, and each one has almost a different way of understanding the prompts that you give it, but our focus today will be on how to prompt in Stable Diffusion.

I know there are a lot of people out there who are very confused or just not sure what makes a good prompt. I'm not saying that the way I'm going to show you here is the only way, but it works for me, it's the pattern that I use to develop my prompt generator as well, and it allows me to create some very good images.

What I want to start off with: a lot of you are just using very basic descriptions to generate your prompts. Let's say we put in "a beautiful woman sitting in a nice restaurant with candles". We're going to be using Juggernaut XL version 5 to do these. We are going to use a negative prompt, because we want to at least give it a chance of being a really good image. We're going to scale the negative prompt down a little bit, though. My negative prompts tend to be a little heavy-handed, and that's on purpose for various other reasons, but in this instance we're going to bring the negative prompt's overall weight down to 0.3 using the plugin Negative Prompt Weight, or NPW if you want to search for that. Let's go ahead and change this down to a 5:4 ratio as well. I want it to be slightly wide format but have enough height to fit stuff in the picture. We're going to leave everything else the same. I'm not even going to touch ADetailer or anything. Hit generate on that, give it a second to spool up here and throw that
image out. OK, so you can already see that we're getting an image of what appears to be a beautiful woman. There are lots of candles all over the place, and she seems to be sitting at a table here; there's a plate and some cups. Give it a second to finalize the image here. I'm not sure why, but it seems like ever since the 1.6.0 update, maybe even with the SDXL models, that last little couple of percent... I'm not sure exactly what it's doing there.

So let's bring this image up. OK, here we have an image. The fingers turned out OK, they're a little weird, but we've got candles everywhere, so the AI interpreted that prompt as best it could. Now, I'm not saying that you couldn't go with this. A lot of people might take this and then they'd try to inpaint a bunch of stuff. My philosophy is: if you can get it right the first time, get it right the first time.

We told it we wanted a beautiful woman sitting in a nice restaurant with candles. We didn't give it any kind of art medium. We said "with candles", so it's probably assuming soft lighting; we get some chandeliers over here, and we've got lights all over the place. The AI didn't get enough guidance, not enough context.

So let me show you how I structure my prompts. I'm going to paste this in here; this is not an actual prompt. The first thing we focus on is declaring the art medium and possible styling that we want for the image. The art medium could be anything: it could be watercolor, it could be photography, any style you want. This gives the AI the strongest impression of what kind of artistic medium you want it to generate in. A lot of people put the artistic medium at the end of the prompt, and I don't feel that gives it a strong enough impression. Let's say you specify a charcoal drawing at the end of the prompt, but you end up getting something that just looks like a digital image. It's
because as the prompt gets longer, the AI pays less and less attention to the things further into the prompt. So what people end up doing is using the focus formatting, or that's what I call it: putting parentheses around stuff, with numbers, which helps amplify it. So I usually declare the art medium at the beginning, which gives the AI the best possible chance of making sure the image is in that art medium.

The second thing I do is give it what I want it to primarily focus on, and the details around that. In this particular instance, let's say I wanted this to be a photograph. I would say maybe "professional photography of" and then put in something like "a beautiful woman wearing a white nightgown". Then we'd specify some other details, like maybe "long wavy hair", giving details around the primary subject, because you want that to be the focus of the image.

Then you do the secondary focus and details. This could be another person, or people in the background. It could be what's on the table, because that is kind of part of the focus; maybe there's food on the plate. Then you can throw in details about the background. We specified the restaurant at the beginning, but at this point we can start specifying things like tables, patrons, even waiters if you wanted to, and details about the environment; we could say "candlelit".

Then we move into what I call the production and lighting details. This is where you're going to use terms or even equipment details. When you're specifying photography, it's always good to include the details of a specific camera that could have taken this image. Most of these AIs, the images they were trained on, they were trained on the metadata as well, and
a lot of that metadata includes camera information. When you associate the camera, your image is going to look a lot better; it's going to be structured better and balanced better. This image does not look balanced. I don't know what's going on back here, and things on the table just look kind of squirrelly. Somebody who's just getting into this, maybe they would love this, it would look great, but I've been doing this for a while, so I can see all the different flaws. Even her eyes... her face looks kind of artificial.

So we want to specify details about what was used to produce the image, or we could even specify, how do I put this, information that a camera might need. We could say "high dynamic range, sharp details, natural colors" or "deep vivid colors", that kind of thing.

What I'm going to do now is bring over a prompt that I generated using my prompt generator. I'm going to put it below this one so we can go through it together. It's not a long prompt. My request to the prompt generator was a little different; I wanted two prompts, just so I'd get a good one: "professional photography of a beautiful woman dressed in an evening gown sitting in a high-end restaurant with candle lighting surrounded by other tables and people soft lighting dim but vivid deep colors". You have to understand the way my prompt generator works: I don't put in commas or anything. I'm relying on the AI's knowledge and its ability to parse out a sentence and understand it.

It gave me two prompts that were pretty well structured. I went with this one here, and I did change a couple of things in it when I finally got it into the program. The only thing I changed: it had "glistening jewelry" near the end and, where was it, "exquisite makeup" somewhere in the middle
and the way I've got this prompt generator structured, it's supposed to keep the details of individual subjects or subject matter together. Most of the time it does well; this time those two were kind of out of place and I just moved them. Not a big deal.

So what I'm declaring here: I want "professional photography". Then I'm specifying and emphasizing, using the emphasis format here, "beautiful Scottish woman" with a 1.3, and "opulent evening gown". In this we could actually specify the color too. Here's the thing with colors that I've found works really well: give it a descriptive term beforehand. So let's say "ruby red evening gown", with "glistening jewelry" and "exquisite makeup". My prompt generator tends to emphasize certain characteristics to make sure they get included. It is specifying "soft candlelight" here; honestly, I'd rather it be down near the end, but it's OK right here. Then "elegant diner", so it's now going into the surrounding details: "dim but inviting", "vivid color palette".

Now you'll notice that it uses the word BREAK here. Normally it would have used BREAK up here too, depending on the prompt. This is actually a function that's built into Automatic 1111 and how it interprets prompts. If your prompt goes above, I don't know, something like 79 or 75 tokens, it automatically includes a break as it renders, to help it refocus on the rest of your prompt if it's longer than that. I tend to include it because it helps the AI refocus on the rest of the prompt, bringing out those features. We're saying "mysterious ambience, glowing candles, chic surroundings", and another BREAK, and again it'll help it focus on those. Then we're telling it we're capturing it with this camera; again, that would be metadata that it was trained on. Then we're saying high
dynamic range, sharp details, natural colors. Now I want you to take a really good look at this. In fact, we'll go back to it; I'll pull it up a little later and we'll look at them side by side once we do this. The negative prompt we've got toned down, and I'm going to leave it the same. We'll leave everything else the same and rerun this so I can show you the comparison. Not sure what it's doing with her arm there, but that's OK. Let it finish up, and let me see if I can pull up the other image here.

OK, I made a huge mistake here. I was looking at this image thinking something's not right, and I totally forgot to get rid of the guide prompt up here, where I showed you how I structure this. So let's rerender that one more time. I've got the other image pulled up, so we'll be able to look at both of these side by side once we get this up. OK, it's done rendering. Let me pull these over here so you can see both of them. [Music]

As we look at these, what we're seeing is that the new prompt introduced a lot of other terms dealing with the restaurant itself, the elegance, the lighting, and everything. With this one here, we're getting kind of an array of things: this table back here has a white tablecloth, but hers is just a bare wood table. She looks like she's got orange juice. There's no jewelry on her; obviously we didn't specify that. This one here is a lot more structured when it comes to the interior of the restaurant. You can tell there's a table here, and you can tell there's a table way back in the background. We have structured lighting now, whether it's from chandeliers or from wall-hanging candles. There's another table right here. It did mess up a few things; you can see the wine glass here has a candle flame in it, but it did include the
jewelry, the dress color, and all of that. You have a lot of control over what you can do or not do inside of a prompt, and you can have really long prompts. Here's the thing: if you make long prompts, you might want to look at using the BREAK command to help it focus on a lot of the important aspects. Use the focus formatting too. This can be used in a lot of different ways to help it focus on specific things. I think you can use it within itself too; you could actually take a word inside and do focus formatting on it with a different number altogether. A lot of this is just going to be experimentation.

What if we wanted to add more detail to this, maybe expand the scene? That's another thing that I would love for people to understand. A lot of people get frustrated when they type in a prompt that just says "a picture of a beautiful woman" and it's not centered, or it's cut off at the wrong area, and they can't figure out how to get it centered. Usually using terms like "professional portrait photography", or just introducing "portrait", means that the character will be centered. But if that's all you do, that's all you're going to get in the image. What happens if you start detailing the surroundings and emphasizing various aspects? It's almost like telling the camera it needs to pull back so you can see what it is I'm describing.

Let's see if we can do this: add five more details about the restaurant. Something I realized here: I modified the prompt that I originally generated, so what I actually want to do is grab that modified one and bring it back over here. I want to add some details regarding the restaurant, so we're going to do this: "I want you to take the following prompt and add five more specific physical details regarding the restaurant". Just give it a
second to kick in. OK, there we go. It extended the prompt and added more details. Let's see what it did: "mysterious, glowing candles, chic surroundings, fine china, polished silverware, lush floral centerpiece, velvet drapery, mahogany dining table". OK, great. So let's grab that, paste it back in here, and render it out.

That definitely added some more detail. We've got some floral in there, the dark mahogany wood, and the fine china looks like fine china; it's got the gold rim around it. So overall it added a little more aesthetic. I'd like it to pull the camera view back a little bit, so let's see if we can have it add some details regarding some people in the restaurant. I'd like to see more people in the image. Let's see what it did: "high-end restaurant, soft candlelight, beautiful Scottish woman, stylish men in tuxedos". There we go. So let's grab that again. I know this is all about structure; I'm just having fun with it at this point. I've kind of shown you the basic structure of what works for me in a prompt. Wow, clearing my throat again, sorry about that, people.

Yeah, I was a little afraid of that. It's interesting: when you do pull it back and you're talking about restaurants and stuff like that, it has a hard time adding the people in. A lot of it could be because I'm emphasizing other things, so the people just don't come out. Another thing to keep in mind is that it could also be the aspect ratio I'm using. That actually helps when doing things, whether it's portrait or other things like that, so we're going to do 9 by 16. Let's see what that does. I know sometimes you have to rerender to get what you want. Me, I like to render things one time; I don't like to sit there and keep rendering stuff. I know SDXL has an issue with multiple people when you're trying to describe multiple specific people. Sorry, I've got my little girl with
me and she's clearing her throat too. You better? OK. So typically in a situation like this, and I know I'm getting a little off course here, you'd just say "a group of people" or "people at a restaurant". SDXL works better when you generalize the term, like "group of people" or "large gathering of people". It knows those terms better. But when you describe a woman and then try to describe a man, for some reason it just wants to display the woman, or only one person or the other.

I'm going to try flipping this one more time. We're going to increase the CFG scale to see if we can get it to go outside its normal thought process. The CFG scale is something that you really should play around with. It can drastically change an image and give you completely different results, say if you're going between 5.5 and 8.5. Sometimes I go as high as 9, but it really depends on what I'm doing. Yeah, that prompt's not working; I'd have to rearrange it. Anyway, you get the idea.

I appreciate you guys listening, and I hope you don't mind if I have my daughter with me sometimes. She's absolutely adorable and loves seeing what Daddy's doing. Like and subscribe, and let me know in the comments if you have any questions; I love answering questions. Join our Discord; I'll put a link up on this video if you want to join and see what we're doing, or if you have deeper questions that require something more than just a comment section. We'll talk to you later.
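
The ordering described in the captions (art medium first, then the primary subject, then secondary and background details, then production and lighting details, with emphasis formatting and BREAK between groups) can be sketched in a few lines of Python. This is only an illustration, not the prompt generator used in the video: the function names `emphasize` and `build_prompt`, the way the sections are grouped into three chunks, and the example weight are all assumptions. The `(text:1.3)` attention syntax and the uppercase `BREAK` keyword are the Automatic 1111 prompt features the video refers to.

```python
def emphasize(text, weight):
    """Wrap text in Automatic 1111 attention syntax, e.g. (text:1.3)."""
    return f"({text}:{weight:g})"


def build_prompt(medium, primary, secondary=(), background=(),
                 production=(), use_break=True):
    """Assemble a prompt in the order suggested in the video.

    Grouping (an assumption made for this sketch):
      1. art medium + primary subject details
      2. secondary focus + background details
      3. production and lighting details
    Empty groups are skipped.
    """
    groups = [
        [medium, *primary],
        [*secondary, *background],
        list(production),
    ]
    chunks = [", ".join(group) for group in groups if group]
    # BREAK must be uppercase for Automatic 1111 to treat it as a keyword.
    separator = " BREAK " if use_break else ", "
    return separator.join(chunks)


prompt = build_prompt(
    medium="professional photography",
    primary=[
        emphasize("beautiful Scottish woman", 1.3),  # weight is illustrative
        "ruby red evening gown",
        "glistening jewelry",
    ],
    background=["elegant restaurant", "glowing candles"],
    production=["high dynamic range", "sharp details", "natural colors"],
)
print(prompt)
```

Passing `use_break=False` joins every group with commas instead, which makes it easy to compare renders with and without BREAK for the same seed.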
Info
Channel: AIchemy with Xerophayze
Views: 2,427
Keywords: Stable Diffusion, Automatic 1111, AI Art, Prompt Structure, Creative Prompts, Artistic AI, Craft Perfect Prompts, AI Artistry, Stable Diffusion Tips, AI Art Tutorials, Creative Process, AI Art Techniques, AI Assistance, Artistic Expression, Digital Art, AI Prompts, Xerophayze, Creative AI, Art Generation, Digital Creativity
Id: svavYdmnCzk
Length: 21min 34sec (1294 seconds)
Published: Tue Oct 10 2023