Mastering Text Prompts and Embeddings in Your Image Creation Workflow | Studio Sessions

Captions
One of the things that I often hear when we start getting into the weeds of using the tool, and as I've been talking to more and more creatives and professionals over the last few weeks, is that there is, we'll say, a limited understanding of what a prompt actually is and of what we're asking for when we prompt the model. Some examples I've seen where people clearly don't understand how it all works: people will prompt like they're talking to ChatGPT, basically "I'm going to talk conversationally to you and hope you understand what I'm doing." So, "remove the background," or "update X to Y." There are technologies that do that, and you can ask for that type of thing in ChatGPT, for example, but when you're using a tool like Invoke we are passing the prompt directly to the model. We take the words that you pass in, hit the model with effectively that raw text string, and go through a process called diffusion to generate the resulting image. Over time we're going to see models get better and better at this. There's a technical, jargony term used with these tools called prompt adherence, and prompt adherence basically means: if I tell you the cat is on the fire hydrant, the fire hydrant is blue, and the cat is a tabby cat, then everything is where it should be and the result is aligned with the prompt I put in. The current state-of-the-art open models, like SDXL, do okay at adhering to prompts, but they're not as good as what's coming around the corner. And all of that still rests on this underlying question of what we're actually getting when we prompt the model. So today we're going to focus on prompt design and prompt structure. I'll be looking for feedback from folks in the audience on what people are struggling with and where we can help, but we'll talk about not just prompts but also embeddings, and embeddings are
probably one of the more underutilized tools in a creative's toolkit in the current day and age. We'll explain a little bit about what an embedding is, and then we'll play around with designing a prompt. The first thing I'm going to do is put together a basic prompt. If any of you are ChatGPT users, we've got a GPT that we created and published called Tag Weaver; this is my shortcut for generating a quick idea from a couple of interesting words. In this case, let's have the audience give us some feedback here: what should we ask Tag Weaver for a prompt of? This can be thematic, this can be a specific subject, we can focus on concept art or pretty much anything. I see some typing happening, so I'll wait for that. Okay, a prop. What type of prop, what item? We'll do a prompt for a magical potion. Tag Weaver has a structure where it gives a creative aim and then gives you your tag prompt, and sometimes it's a very long prompt that we pick and choose from. I see what's happening here; people are asking, "do you want to share your screen?" The answer is yes. We haven't really started doing anything in Invoke yet, but now is a good time to share the screen. Basically, I've loaded up Tag Weaver in ChatGPT. You can use really any GPT model, and you can run a lot of these locally as well, so there are plenty of tools out there for getting prompt ideas, playing around with language, and understanding what we can add. In this case I just asked for a magical potion and I've got this long prompt here, all of which I'm not going to use; we'll walk through what it all does and maybe explore that a little bit. I'm going to take it, paste it in, and add my generic negative prompt. Just to get some context on positive and negative prompts: positive prompts bias the image toward the words you use in the positive prompt, and if you use prompt syntax like the plus or minus sign on certain words, you're creating more bias toward that concept. You can almost think of it like this: we're taking a bunch of ingredients, those ingredients are the words, we're putting them in a stew, and the AI is figuring out how to use all of that to make an image. Negative prompts, in technical terminology, are called unconditioning, and what that means is we are trying to bias away from those concepts. It is not so perfect that we won't see those concepts at all; sometimes people find that a negative prompt does not perfectly remove a concept. The way to think about it is that we're pulling the image away from those concepts, steering it away from anything in the negative prompt. So I often just use generic categories of bad-quality images, and it's useful to think about the terminology that was passed into the training data set. So, a pretty basic one; we'll look at the full prompt here. I like this basic idea: "enchanted elixir in a crystal vial," so we keep that; "nestled within an ancient spellbook," we're going to take that out; "alchemical symbols glowing with an ethereal light, intricate glasswork with swirling iridescent liquid," and we'll leave it at that for now. For the purposes of testing we're going to use a fixed seed, and I'll manually type it in: 1234567, the magic seed. What that means is we're going to be able to test every change we make and see what it does to this
image, and we'll be able to play around with it a little bit. Maybe I'll first show you what this generates without the negative prompt, so we can see what the negative prompt does; I often like to see how that either matches or does not match my expectations. So right now we're generating without a negative prompt, and then we'll add it in and generate with one, just to compare the two. Okay, so we've got this cool-looking thing. I'm going to turn off the progress image so I can look at this without being distracted by the generation coming after it; it's going through the VAE. All right, so we've got this almost photorealistic-looking glass bottle. It's not really an elixir; it's definitely been crystallized by the "alchemical symbol" term, it's gotten a little too crystallized. Now compare: when I use the negative prompt versus the positive prompt alone, it certainly impacts the image. I would say it probably improves the quality of some of this piece and adds a little bit of depth, but it's not too critical in this case. One of the main reasons why is that we're not really pulling it toward any specific style. So in this case I want to do "bold ink watercolor concept art"; we'll add that in, and now we're iteratively seeing what each of these prompt terms adds to our image. We'll generate that, and it should get pushed a lot more toward that style. Let me turn the prompt preview on. Yeah, so now we've actually got an anchor to move toward, since we're moving away from "sketch" and "photo." I don't want to get too mathematical here, but a lot of what is happening under the hood is just math. So if you think about it like a map, we're going on a journey, we're on an adventure, and it is useful to say both where we're going and where we're not trying to go. If we're trying to get to the XYZ Kingdom, we don't want to go through the Forbidden Forest, right? We're charting our course using the prompt. If we don't have a style, if we don't have a medium, if it's not a photograph and it's not a sketch, then we're just relying on the model guessing correctly. And it will guess; it will find something to generate. Even if we've told it where not to go, it doesn't know where it's going. So we want both of those things: an anchor for the style and medium we're generating in, and a statement of what we're not trying to do. That's really the purpose of the positive and negative conditioning in this context. And now we've got this much more watercolory-looking thing. Maybe we like it, maybe we don't. I'll put in "painterly concept art" and see what we get; maybe that's a little bit better of a style for what we're going for. We did get more of the elixir here. Okay, I think we're getting the right style, or at least my preferred style; obviously we could do a lot of photorealism if we wanted. It's still a little bit photorealistic, so we might upweight "painterly concept art," add "brush strokes," and maybe even "digital oil painting." Again, this is all on a fixed seed, so we are iterating and toying around with these words to push the image in the right direction, and I think we're getting closer and closer to where we want to be. In this case, I find that this "alchemical symbol" term is definitely changing the elixir into something I don't want it to be, so we're going to take that out and regenerate, and that should push it in the right direction. We might need to say
"filled with a liquid," or "filled with a glowing liquid, ethereal light." This is cool, it's nice, we just want to keep pushing it in the right direction. So we're iteratively focusing on getting that, and once we have the prompt and style ironed out we can reuse them. In this case we're fighting with this one element in the middle, although it is cool. We might try increasing some of these concepts; oh, we do have "liquid" in there, good that you called that out. So we'll increase "elixir" and we'll increase "liquid." Again, prompt upweighting and downweighting is plus signs on the end or minus signs on the end, any number of pluses or minuses, though you don't want to go too hard; you can also put terms in parentheses and use that as another way of controlling it. Somebody is asking about the shape and why it stays the same: is it doing this because the seed is fixed? The answer is yes, partially. There's a dynamic here where, if you think about the process that's happening, we're taking the noise and progressively iterating over it toward the final output. Every step of that process is the machine learning model saying, "okay, here's the picture of noise, here's the prompt, or conditioning, I have; how do I make this one step closer to matching the conditioning?" If the noise has structure in it, which is obviously happening here (this shape in the middle is definitely in the noise), shuffling the seed can help. So we'll shuffle the seed to explore: same prompt, different seed, and we'll see what we get. It could be that this solves the problem entirely, but I think something in our prompt is very much pushing it toward this crystallized structure, which looks cool but isn't really what we're going for.
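The upweighting and downweighting syntax just described can be sketched as a tiny parser. This is an illustrative toy, not Invoke's actual implementation: the real parser handles nesting, blends, and explicit numeric weights, and the 1.1 multiplier per plus sign used here is an assumption for illustration, not a documented value.

```python
import re

def parse_weights(prompt, step=1.1):
    """Toy parser for Invoke-style prompt weighting.

    Each trailing '+' multiplies a term's weight by `step`;
    each trailing '-' divides it. Parentheses group a multi-word term.
    Returns (term, weight) pairs for comma-separated terms.
    """
    terms = []
    for raw in prompt.split(","):
        m = re.match(r"^\(?([^()+-]+?)\)?([+-]*)$", raw.strip())
        if not m:
            continue
        term, marks = m.group(1).strip(), m.group(2)
        weight = 1.0
        for ch in marks:
            weight = weight * step if ch == "+" else weight / step
        terms.append((term, round(weight, 3)))
    return terms

# "elixir++" is biased up twice, "(swirling liquid)-" down once:
print(parse_weights("elixir++, (swirling liquid)-, glass vial"))
# → [('elixir', 1.21), ('swirling liquid', 0.909), ('glass vial', 1.0)]
```

The resulting per-term weights scale how strongly each concept's conditioning contributes, which is exactly the "pull harder toward elixir, ease off the liquid" adjustment being made here.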
We'll do "enchanted potion in a crystal vial," and maybe we'll even take away the word "crystal," because "crystal" might be crystallizing this. So it wasn't just the seed, obviously, because we're still getting this crystallized structure, and that tells us it's something in the prompt. The "swirling" piece might be doing it as well, since we're still seeing this swirling element; we'll take that out, along with anything related to the glasswork or anything that gives it a hard structure, because what we're really looking for is just the potion. It is cool, I know people are saying it looks cool, and it does, but again, this goes back to how we get what we are really looking for out of this. Obviously we could take this into the canvas and draw the hard lines, but that's not really what we're going for. So let's see what else we have: "glowing liquid, ethereal light." Let's just try "enchanted potion"; we won't even do the vial. "Glowing liquid, ethereal light." Oh, somebody said maybe it's the ink; could be, we'll try it without the "bold ink." Again, this goes back to the fact that you're biasing the image toward certain words. This is one of the reasons why using ControlNets is so powerful: you don't have to fight structure, you can just say, "look, this is what I want." We could certainly take one of these images and pass it in as a ControlNet, but that's not the purpose of today's video, so we won't do that. We're definitely getting this splashy thing, so we might take out "liquid," because "liquid" might be the splashing piece, and I know we've got "liquid" over here too. So we'll just do "enchanted potion, glowing ethereal light, iridescent," clearing the vial here and trying it without any of the liquids at all. Again, it's part of the fun of crafting a prompt: figuring out how to get it exactly the way you want it. All right, let's try this, because I'm taking on this hard challenge and I really want to do it: we'll do just "enchanted potion" and see what we've got. Somebody's calling out that this could just be part of the training data, that the model might need to see more potions. Yeah, looking at what we're getting with just "enchanted potion," there's this element here that's just sitting inside of it. It could even be "enchanted" doing it, so we'll do "potion," or let's say "oozy green potion," and see what we get. This one still has the enchanted piece, I think, but we're starting to move toward our focus area, and we're seeing which words actually pushed it to have that structure we didn't want. The ooze is out of the bottle in that one, which is funny. I think this one was actually pretty good: just "potion," no "enchanted." So if we look back through at the words that drove that, we took out a lot of the structure and eventually just got to our potion. This again goes back to the point that when you have intent behind a word, you have to recognize that sometimes the system is not going to understand your intent. And this is, in a lot of ways, why training a model so that your language actually matches what you're intending is so powerful. I was talking to someone on the team earlier today: all of what is happening in this space is creating an average interpretation of a word. It doesn't matter how much data you have, the model is still learning the average across all of those instances, and that average may be different from your intent. Very often when we have creative intent, a word like "oozy" has, in my mind, a very specific definition: this is what I mean by "oozy." So we'll talk about embeddings as the second part of today, because I think this is a good demonstration of why
embeddings are useful. When I train an embedding, what I am doing is codifying a word to mean something. The technique is commonly referred to as textual inversion in the image generation space, and the process for training a textual inversion embedding is basically: I take some images, thirty, twenty, even four images, pass them in, and create a code word for that concept. I'm trying to teach the model, "when I tell you X, generate this." I've got an example here that I've trained; I'll use it to demonstrate ways you can use this. I've trained an embedding called Pro Photo, and I've used it in the past, I think in an example or two, to generate a photograph. So I'm going to prompt "a green potion, pro photo 2." I'm not using any of the prompt terminology that describes a photo; I've already trained the concept of what a pro photo should look like, and I'm going to generate with that. We'll talk about how I can use embeddings in both the positive prompt and the negative prompt, and we'll talk about a technique called pivotal tuning, which is super cool as well and uses both embeddings and LoRAs. Okay, so "a green potion, pro photo 2": this is my pro photo of a green potion. It works. Let's try our previous prompt, or actually our original prompt from back here, and give it a shot. I know it's not going to be exactly our potion, but we can still see what it looks like when we add pro photo 2. I'll also call out that I'm on the pre-release version of 4.0, which will be coming out soon for you to try; you'll see a couple of things in the UI that are a little different around embeddings, plus the ability to customize models with trigger phrases, which I'll show in a bit. So we've got our pro photo of the potion we had before. When we used the painterly style we got the earlier image; when we used the pro photo style we got this one, and that's a cool, interesting comparison. Now, I can also use the embedding in the negative prompt, and that will push the image away from the concept of pro photo 2. There are negative embeddings as well; you can create negative embeddings of anything, and a lot of what you'll find helps increase quality is, again, pushing away from lower-quality images you don't want to see and pushing more toward things you do want to see. So you can create positive and negative embeddings. In this case I'm going to use pro photo 2 as a negative, and maybe what I'll do is go back to one of the images we generated before. I'll take this one, because it's particularly painterly, and I'm going to use all of its settings, so if I hit the generate button it would generate this exact same image. But instead, I'm going to add pro photo 2 as a negative embedding and upweight it. The only thing I've done is added my pro photo embedding as a negative and increased its weight. It's not going to dramatically change the image, it won't be a completely different image, but it will push it pretty hard away from the concept of a photo and maybe help increase the quality. In this case you can see the difference: it's a little bit simpler, and we've pushed it more toward this complex, interesting thing inside of the image, but a lot of the core elements are the same; it really just refines things a little. Now, this is where you can go further: I trained this as a pivotally tuned embedding, and you all are going to have to keep me honest here; if I'm starting to get a little too technical, bring me back down to earth. I want this to be interactive and educational.
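Since we're getting technical anyway, here is how positive and negative conditioning (whether plain words or embeddings) actually combine at each denoising step: the standard mechanism is classifier-free guidance, where the model predicts noise once under the negative (or empty) conditioning and once under the positive, and the sampler extrapolates from the negative prediction toward the positive one. A minimal numpy sketch of that combination step; the scale values are common illustrative defaults, not anything specific to Invoke:

```python
import numpy as np

def guided_noise(noise_neg, noise_pos, guidance_scale=7.5):
    """Classifier-free guidance combination step.

    noise_neg: the model's noise prediction under the negative
               (or empty) conditioning.
    noise_pos: the prediction under the positive conditioning.
    With guidance_scale > 1 the result is pushed past the positive
    prediction, i.e. actively away from the negative concepts.
    """
    return noise_neg + guidance_scale * (noise_pos - noise_neg)

# Toy 1-D "predictions" to show the push-away effect:
neg = np.array([0.0, 1.0])
pos = np.array([1.0, 0.0])
print(guided_noise(neg, pos, guidance_scale=2.0))  # → [ 2. -1.]
```

This is why a negative prompt steers away from a concept rather than guaranteeing its absence: it only shifts the direction of each denoising step, it doesn't forbid anything.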
I don't want to go way above your heads. When we train models, we have multiple options. We can train the actual underlying model so it learns something new; that's what a LoRA is for. We're training new concepts into the core understanding of the model, which is useful when we have something the model has never seen anything like before and we need to teach it something completely new. That's where a LoRA is really helpful, because you can inject that into the AI model. An embedding is useful for referencing stuff the model probably has seen; it probably knows what the thing is, it just may not have a really concise way for us to articulate it. In this case, the model has seen photographs, it's seen professional photography, so it knows what that is, and the embedding gives us a handle for it. What pivotal tuning does differently is train a LoRA on content and, at the same time, train an embedding to reference that new content. So we don't have to use existing words; we get something very tightly coupled: I've created a bunch of new content, and I'm giving you a very articulated way of referencing it. Now, in this case pro photo works on its own because it's a generic concept, but I also trained a pro photo LoRA: I took a bunch of Creative Commons images of professional photography and trained that both as a LoRA and as a textual inversion. The invoke-training script has a pivotal tuning mode that basically does this for you, if you're interested. So now I have a LoRA and an embedding that matches that LoRA, and we can see what happens when I use the LoRA on top of this. We're going to take the same image we generated: this one is without any embedding, this one is with the embedding in the negative, and now I'm going to add the LoRA, which amps up the model's understanding of what pro photo 2 means. I'll also push it up into the positive prompt so we can see what that looks like too. You start to see that there's a lot of articulation here in how we're creating this, and once we have a really good style and a powerful way of creating something, we can reuse it: we can save it into a workflow and have it baked into a process. So this is our pro photo LoRA amped up, this is just the embedding, and this is without the negative at all; we're increasingly pushing the image toward a certain style as we move from one to the next. Now let's take pro photo 2 and move it up into the positive prompt. We're taking out all the style terms, and for the sake of cleanliness we'll take out the negative prompt, and we'll see what happens when we generate an enchanted potion in pro photo style. Now that we've added it to the positive prompt, we can really compare, almost mathematically, what happens when we move this one word, this one embedding, around so the model understands it in different contexts: where is the subject matter the same, and what's different? So now we've got the same thing we were generating before, a lot of the same concepts: the vial of the potion, and this green thing coming in from "enchanted." We found that "enchanted" kept adding this glowing object, and the probable reason is that when you look at an enchanted forest, or an enchanted anything, there are these glowing things around it. If we take away "enchanted" and just do "potion, pro photo 2," we'll probably get rid of this object in the middle; we'll see where we land overall. So it becomes a lot
skinnier. We actually still have this subject, this thing in the middle, which is interesting, but we're able to see what the different words change in that process. Questions from people: does that all track, does it all make sense? I think this is most useful when we check whether that understanding, that way of describing it, makes sense to the people who are creating with this stuff. So I'll open the floor for confirmation that it all makes sense, or for questions on anything that's still a little unclear, and then we'll talk about some of the upcoming features that make all of this really usable and reusable. Someone's commenting that their understanding is that an embedding is a subtle change to the generation. Not necessarily. An embedding is basically the same thing as any other token or phrase: it conditions the generation and invokes something different from the model. Think about what's happening under the hood; I described it as math earlier. Any time we put a prompt in, that prompt is getting turned into a mathematical representation; that is the conditioning, and the tensors are called the conditioning tensors. That math is not something we understand, but the machine learning system does, and that math can be composed, or created, through words. That is effectively how all of the words we use in a prompt are translated: they're translated to math. Using the embedding process, we train the system so that when we use a specific phrase, like this "pro photo 2" with the angle brackets, it generates a very specific mathematical output, one that has been honed and trained to mean what we want it to mean. We're tuning that embedding to mean something we want it to mean. That is what an embedding is: it's just like any other prompt term, it's getting turned into math, but we have created our own word. That's the way to think about what an embedding is. Someone asked about the concept of trigger phrases, and maybe this is a good moment to showcase some of the 4.0 updates that are coming down the pipeline. We'll take the prompt from this image: "bold ink, painterly concept art, brush strokes, digital oil painting." What I mean by trigger phrases is that in 4.0 we have a new model manager; we've talked about this and shared a lot of the upcoming stuff. This model manager is for the local Community Edition. We have more features for Enterprise if you're sharing models with teams and managing all of this at scale; there's another level of model management that sits above this in the Enterprise version, but this is the Community Edition, and the Community Edition has new features called default settings and trigger phrases. Default settings let you set defaults for each model you're using: the VAE, the scheduler, the steps, basically all the advanced stuff you typically need to set on a per-model basis. You can set those ahead of time and then just hit one button to apply them any time you switch models. But we can also tie certain trigger phrases to a model. If I've trained a LoRA to use a specific phrase, or trained a model to use a specific phrase, I can put that in as a trigger phrase, and then I don't have to remember what the model was trained on; that is all handled here. We can also use it to save prompts we like to use with that model. In this case I'm just going to show an example: I can save this "bold ink, painterly concept art" prompt as a trigger phrase and then reuse it really easily from the embeddings menu inside of the app. So let's say
I'm starting from scratch, and I've saved that style as my go-to style with this model. Now, what's another prop? Somebody's typing, so I'll let them give me the prop we're going to use: a sniper rifle. Okay, this will be fun. "Sniper rifle." Oh wait, I don't want to copy and paste that; I want to show the trigger phrase. I use my embeddings hotkey, which is the left angle bracket, and now inside of this list I can see my trigger phrases as well. I've basically added a shortcut for certain elements of my prompt, prompt fragments that I can reuse, and these can be long or short. So I'm just going to reuse that one. I'm still using my Pro Photo LoRA, which I don't need for this, and we'll generate our sniper rifle in bold ink painterly concept art. Now, with sniper rifles, or any kind of video game props or content, what you'll typically find (although this is doing a pretty good job) is that if you're a studio working with any type of equipment, drones, sniper rifles, helicopters, and so on, you're going to want to train a LoRA on each model, each specific type of weapon. Take Halo, for example: you've got plasma swords, and you don't want to prompt for just any plasma sword, you want that plasma sword. I'm probably butchering the name, it's probably not "plasma sword," I haven't played Halo in a long time, but you know the one: the big, very specifically shaped thing that everyone knows what it's supposed to look like. That is the case where you train a LoRA on it, maybe a pivotally tuned LoRA so you've got the embedding for it, and then you can really prompt for that one thing you want to use. This is where the technology gets really interesting, because as a creative you're very often looking for that type of control. I don't just want any sniper rifle, I want this model of sniper rifle. And while this one is okay, I think anyone who's familiar with the process of making stuff for games would look at it and say there's a lot that's wrong with it, at least for what they want: maybe this strange barrel thing isn't right, or the scope on the sniper rifle is way too big and should look a little different. There are a lot of details that, when you train a LoRA or an embedding, the model can understand really tightly, and that helps you improve them. The purpose here was just to show that we've got a really reusable style that transfers well, and I think that worked. But obviously we can play around and try, say, "a mid-century modern chair." We're doing all kinds of prompts for props; maybe that's the catchy clickbait title for this one: "Prompts for Props." Now, this is where I think things get really interesting. We have a mid-century modern chair in this style, but why does it look so realistic, so much more realistic than the last image? Again, it's all math: "mid-century modern chair" is very heavily skewed, biased, toward photos of furniture. In the data, mid-century modern chairs look like photos. So while we do have the concepts of artwork and bold ink translating here, the result is heavily skewed toward photographs, and we've got that photographic quality in it. Now we can do a couple of things: we can increase the weight of our negatives, we can add in things like pro photo 2, we can add our pro photo LoRA (boosting the understanding of "professional photo" and negatively weighting it), and we can group our prompt together and upweight "digital oil painting." All of
those things, with our fixed seed, should push it further and further away from that core association of "mid-century modern chairs are photographs" and more toward the painterly style we're really looking for. Still, maybe this is the struggle bus of prompt adherence: it's closer, but I still think it's a little too realistic for my taste. You can look at it and see some brush strokes, but it's not really the style we're after. We've got some time, so we'll keep hacking on this, because I think this is the interesting piece: why do things turn out the way they do? Why do they look a little less like what I prompted for and more like something else? It all comes back to understanding how to craft the prompt in a reusable, generalizable way. Somebody asked whether there are any plans to add internal or external tools to train textual inversions and LoRAs in Invoke. There already are: we have open-source scripts for training and a training UI in the invoke-training repo on GitHub, which somebody has just linked in the chat and which we'll try to reference in the description for the video. It's an early beta, so we're definitely taking feedback on the UI and on anything you run into. There's a lot of code in there for training: we do everything from textual inversions to LoRAs to pivotally-tuned LoRAs, and there's some future work planned for DPO LoRA training. All of that is in the invoke-training repo. It's going to be a little more technical to pull down, and it's a more barebones beta UI, but it's definitely usable. We'll be covering some of the training materials and content in a video in the next week or so. We've
been wrapping up our 4.0 release, and then Ryan, our machine learning engineer, will sit on a call with me and we'll go through the training repo and explain how to use it and what it's for. This session is a good precursor to that video, because now we understand a little more about what an embedding is, what a LoRA is, and a lot of the fun stuff there. But let's get back to it, because while I'm sure everyone loves hearing me talk, it's a lot cooler to see pictures get generated. We've got a mid-century modern chair; if I take "mid-century modern" out, I'll probably get more of the style. Again, this goes back to what is biasing it toward "realistic," and I have to imagine it's "mid-century modern," because if we just prompt "a chair," that's another interesting case: what types of chairs are often painted? What do we have more examples of when it comes to painted chairs? Typically older chairs, especially if we're talking about oil paintings. This all goes back to the mathematical undertones of it all: we are pushing things more or less toward things that exist in the world, things that are out there in culture. So if we showed the model more examples of mid-century modern chairs that have been painted, and trained that into a LoRA, we'd actually have a better understanding of what a painted mid-century modern chair might look like. Somebody is calling out one of our folks who works in architecture, and if you're in architecture, you definitely understand mid-century modern, because there's a lot of mid-century modern architecture that's really profoundly beautiful, I think. But mid-century modern
furniture, he says, came from a time period when we became fascinated with the camera and photography was more accessible. So again, this all comes back to how these concepts are linked together and how we articulate what we're looking for. In this case, what we might do is take this into image-to-image: we've got the shape of the mid-century modern chair, and now we just want to push it more toward this style. We could throw in a ControlNet if we wanted, but let's see what we get with just image-to-image: "chair, bold ink, painterly concept art." Now we've got the structure of the mid-century modern chair, but we're pushing it more toward the art style we want. It's moving a little; we might increase our denoising strength, and if the structure gets too lost, we'll have to use a ControlNet. I think we're keeping the structure mostly, but even in the shape and structure it's still biased toward that photographic feel. It still looks a little like photography to me, and maybe that will always be the case. Although this plant is starting to look a little painted, and this is definitely looking more painted than photographic, so maybe I take that back. When you scale it up, there are some elements, the shadows and the corners of the room, that look a little too real, a little too well-structured, for me to fully buy the perception that this is a painting. But I can see a lot of the telltale signs of a painting, like brush strokes, so that might work. We might also try this a different way: I'll do a Canny edge map of this, and with no image-to-image at all, we'll just use the Canny map, and we'll see how all of these things work together.
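Denoising strength is worth pinning down, since it decides how much of the init image's structure survives: the image is noised to an intermediate timestep, and only the remaining steps are denoised. The helper below is a simplified sketch of that step arithmetic (the exact scheduling varies by sampler; this mirrors the common convention in diffusers-style img2img pipelines and is an illustration, not Invoke's code).

```python
# Hedged sketch of image-to-image denoising strength: higher strength
# means starting closer to pure noise (more steps run, less structure
# preserved); lower strength keeps more of the init image intact.
def img2img_steps(num_inference_steps: int, strength: float):
    """Return (start_step, steps_actually_run) for a given strength."""
    # strength = 1.0 -> start from pure noise (full denoising run);
    # strength = 0.0 -> keep the init image essentially untouched.
    start = int(num_inference_steps * (1 - strength))
    return start, num_inference_steps - start

print(img2img_steps(30, 0.75))  # (7, 23): most of the run is redone
print(img2img_steps(30, 0.3))   # (21, 9): most structure preserved
```

This is why bumping the strength pushes the chair further toward the painterly style at the cost of its shape, and why a ControlNet becomes the fallback once the structure starts to dissolve.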
This is a really powerful way of exploring how all of this stuff is connected, and how you can tease it apart to get the things that you want. We've got a couple of comments here. One: "The connection between photography and mid-century modern exposes one of the subtle biases in our visual culture." I think that's super fascinating, and I agree. One very fascinating thing about machine learning is that you're effectively speaking to culture in some ways, especially with these general trained models and the openly licensed stuff; you're speaking to the collective culture, and it exposes a lot of biases. You've seen this in the news recently with Gemini: they actively tried to fight against those biases and ended up showing that it's really hard to disconnect the biases from the underlying history and culture of everything. Someone else asked whether training a painting-style LoRA would help here. Yes, fundamentally. Think about what a LoRA does: it takes the base model and injects more recent training into it, and the most recent training a model undergoes is very often the most impactful; it has the biggest effect on the model's general understanding. If you took a general model and trained it only on paintings, but never used the term "painting," just "this is a cat, this is a dog, this is a chair," where all of them happened to be paintings, its world model, the way it understands reality, would be entirely modified toward painting: "I live in a painting world." Maybe I'm getting too philosophical here, but I find it
super fascinating, and I love to talk about this stuff. When you think about a machine learning model, it's like a child: it is only exposed to what you train it on. If you only train it on paintings, it thinks it lives in a world of paintings: "this is my world, I've only ever seen paintings, and that's all I can understand." So when we think about using these models, the most powerful and probably most ethical way of using these general open models is with LoRAs trained on your own work, your openly licensed stuff, Creative Commons things, things that are generally usable. That injects a lot of the most recent training into the core style, while you keep the broader understanding of the world: what a chair is, what animals are, what dragons are, with your more style-oriented material overlaid on top via the LoRA. Somebody asked about user interfaces for the web. I definitely think a more UI/UX-focused LoRA would be better there, optimizing the image generation model to generate images of UIs specifically. There are some that have been trained, basically by people going around the internet, taking pictures of websites, and labeling them "UI/UX." But if you were a UI/UX designer and you had a portfolio, you could train it on your UI/UX style and give it the language: "this is my button component, this is my XYZ component," and it would start to understand how things are laid out and composed. You could almost even use atomic design terminology for that piece. That is a really cool way of using this stuff. It's probably worth calling out as well that there are models that have been trained on completely different problem spaces.
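The "injecting recent training" idea has a concrete mathematical shape. A LoRA doesn't replace the base weights; it adds a low-rank update on top of them, W' = W + scale * (B @ A), which is why a style LoRA can overlay new behavior while the base model's world knowledge stays intact. Here is a minimal NumPy sketch of that update, with tiny illustrative dimensions rather than real model layers:

```python
# Minimal sketch of a LoRA update: the base weight matrix is frozen and
# a low-rank delta (B @ A) is added on top, scaled by a strength factor.
import numpy as np

d_out, d_in, rank = 8, 8, 2   # rank << d_in is what makes LoRA cheap

rng = np.random.default_rng(0)
W_base = rng.normal(size=(d_out, d_in))  # frozen base-model weights
A = rng.normal(size=(rank, d_in))        # trained LoRA factor
B = np.zeros((d_out, rank))              # B starts at zero, so the LoRA
                                         # has no effect until trained

def effective_weight(scale: float) -> np.ndarray:
    """Base weights plus the scaled low-rank LoRA delta."""
    return W_base + scale * (B @ A)

# With B still zero, the LoRA is a no-op regardless of scale:
assert np.allclose(effective_weight(1.0), W_base)

# After training nudges B (simulated here), the delta is still rank-2:
B = rng.normal(size=(d_out, rank))
delta = effective_weight(0.75) - W_base
print(np.linalg.matrix_rank(delta))
```

The `scale` factor is what the LoRA weight slider in a UI corresponds to conceptually: it dials the injected style up or down without touching the base model at all.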
This is one of the powers of the image generation space, and why Invoke is such a powerful tool: it's very generalizable. Back in early 2023, or maybe late 2022, there was a team called Riffusion, and what they did was take the Stable Diffusion model and train it on pictures of sound, pictures of music. With enough of those pictures, which basically visualize the sound waves, they were able to generate pictures of sound, play them back, and get music out. These models can generate any visual medium, anything you want to tune them toward; it all comes back down to training it into the model, training it to understand what you want. That is the power of this piece. I'll get back to generating some pictures here, because I know I've been talking for a while and we've got a couple minutes left. Somebody said they have a couple of ideas they want to try using negative photography prompts, so I'll copy out some of the terms they used. Let's do "mid-century modern chair" and add negatives like "depth of field," "soft light," "exposure," even something like a "taken on a camera" phrase, stuff that'll push it away from photography. He's saying don't blame him if it doesn't work; no, I think it's a good idea. We've got our fixed seed and all that. Maybe to do a direct comparison, back to our original chair: I think this was the original, so we'll use that prompt, add this negative one in, and generate. People are being super kind in the chat; love that. I think we're still going to see this pushed much more toward the photograph style, even with those negative prompts. Now, what we could do, thinking about some ways to play around with this: so
we've got "mid-century modern" pushing it more toward photography, so we can try down-weighting it. I'm going to down-weight it by two, and this is maybe a good moment to explain down-weighting. I like to think about it like ping-pong, tennis, or pickleball: you see the pros hit the ball with just a little bit of spin, giving it a curve. That's my mental model of down-weighting. The term is still in your positive prompt, it's still a positive influence, but you're giving it a lower curve, if you will: you're influencing the result, just not too much. The negative prompt, by contrast, means "remove this"; a down-weighted positive term is still added in, just more gently. So we'll try it with the minuses and see what we get. Someone said "one teaspoon instead of one tablespoon," which is another good ingredient analogy. It's not going to fix everything; as you can see, it's still mid-century modern, but it is a lot more painterly. You see much more of the painterly look in this image than in the photographic version, so it's a good way to finesse things toward what you want. The other thing we can do is adjust the CFG scale. We had a video on YouTube about prompting tools, and CFG scale is roughly how strictly the model adheres to your prompt. Right now it's at 5, which is not on the super low end, but it's definitely lower than we can go. We could bump it up to 8.5; I'll leave the down-weighting minuses in, and we'll see what we get. So I'm increasing the strictness of my prompt: you must adhere to my positive prompt.
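The teaspoon-versus-tablespoon analogy maps to simple arithmetic: a down-weighted term's conditioning is scaled toward a neutral baseline, so it still contributes, just less. The toy below illustrates that idea only; real implementations (such as the compel library that Invoke-style weighting syntax resembles) scale conditioning tensors in more careful ways, and the vectors here are stand-ins, not real embeddings.

```python
# Toy illustration of prompt term weighting: interpolate a term's
# embedding between a neutral "empty prompt" embedding (weight 0) and
# its full embedding (weight 1). Weights > 1 up-weight, < 1 down-weight.
import numpy as np

def apply_weight(term_emb: np.ndarray,
                 empty_emb: np.ndarray,
                 weight: float) -> np.ndarray:
    """Blend toward the neutral embedding as weight drops below 1."""
    return empty_emb + weight * (term_emb - empty_emb)

empty = np.zeros(4)                    # stand-in empty-prompt embedding
mcm = np.array([1.0, 2.0, -1.0, 0.5])  # stand-in "mid-century modern"

full = apply_weight(mcm, empty, 1.0)   # unchanged: full influence
down = apply_weight(mcm, empty, 0.5)   # half influence, still positive
up = apply_weight(mcm, empty, 1.5)     # up-weighted: extrapolated

print(full, down, up)
```

Note how `down` keeps the same direction as `mcm` at half the magnitude: the term still pulls the image toward mid-century modern, just with less force, which is exactly the "spin on the ball" behavior, as opposed to a negative prompt that pushes away entirely.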
Now we can compare the before and after of increasing it. There are a lot of tools here. See, it's very strict now: that's a painting, that ain't a photo. So we can tune it that way as well. We've got all of these ways of tailoring the prompt and controlling its strength to get where we want to go, and this is why it's so helpful once you have found the right mix. Somebody said this looks a little overcooked; I agree, which is why I typically go a little lower, so let's do 7, for example, and see what that looks like. Once you've found the right mix, you can bake it into a workflow for a specific job to be done, and then you don't have to worry about it; you don't have to fiddle with all these knobs and levers, and you've got a much more controlled tool. The lower we go on CFG scale, the more liberally the model will interpret our prompt. If we go down to 3.5, it kind of wanders off on its own; we're starting to lose the shape a little at 3.5, but it's a chair. Somebody called out that the oil painting has some bleeding into the wall in some of these, and there are a lot of comments in chat that are funny. Yes: we may release this with 4.0 as a beta backend feature in workflows, or it might come out in 4.1, but we are working on regional prompting, and what that'll allow is very targeted control: "this is where I want this in my image." You'll be able to use it through an interface and have very specific control. Regional prompting is very, very exciting for compositional control, and I know a lot of people are excited; it's going to be super cool. Overall I think people have had a lot of good thoughts and feedback on the prompting stuff.
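CFG scale has a precise meaning in the sampling math: at each step the model predicts the noise twice, once with the positive conditioning and once with the negative/unconditioned one, and the scale sets how far the result is pushed along the difference between them. The sketch below is a schematic version of that guidance step with toy two-element vectors standing in for real noise predictions:

```python
# Schematic classifier-free guidance step: the final noise prediction is
# the negative/unconditional prediction pushed toward the conditional one
# by cfg_scale. Higher scale = stricter prompt adherence (and eventually
# the "overcooked" look); lower scale lets the model wander.
import numpy as np

def cfg_step(pred_cond: np.ndarray,
             pred_uncond: np.ndarray,
             cfg_scale: float) -> np.ndarray:
    """One guidance combination: uncond + scale * (cond - uncond)."""
    return pred_uncond + cfg_scale * (pred_cond - pred_uncond)

pred_cond = np.array([1.0, 0.0])    # toy prediction given the prompt
pred_uncond = np.array([0.0, 1.0])  # toy prediction given the negative

# A scale of 1.0 simply returns the conditional prediction:
assert np.allclose(cfg_step(pred_cond, pred_uncond, 1.0), pred_cond)

# Higher scales extrapolate past it, amplifying the prompt direction:
print(cfg_step(pred_cond, pred_uncond, 8.5))  # values [8.5, -7.5]
print(cfg_step(pred_cond, pred_uncond, 3.5))  # values [3.5, -2.5]
```

Because the scale multiplies the gap between "with prompt" and "without prompt" at every step, 8.5 literally pushes each step 8.5 times further along the prompt direction than 1.0 would, which is why high values read as strict but overcooked.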
We're nearing the end of the hour, so we'll probably call it for today, but hopefully this was an educational exploration of how prompt terms influence the math behind the scenes, and of some of the techniques you can use, including training your own model and using some of the more advanced syntax and controls. This is intended to be an educational session in a lot of ways, so hopefully it was; I certainly feel like there were a lot of good examples of the pros and the cons. "Prompts for Props" is, again, the title we've decided to go with for the video, which will be great. Appreciate everyone joining; we'll see you next week for another one. It's been fun. Bye.
Info
Channel: Invoke
Views: 1,761
Id: ZDYM8ftVGlM
Length: 59min 5sec (3545 seconds)
Published: Fri Mar 15 2024