Stable Diffusion Create Facial Expressions For AI Images And Videos

Video Statistics and Information

Captions
Hello everyone. In this tutorial we are going to discuss how to use Stable Diffusion and ComfyUI to generate AI images and videos with more facial expressions for your characters.

As you can see, this default text-to-image workflow creates a fairly standard image of a female character. The main issue with general text-to-image results is that they often lack facial expressions, or have very minimal ones. Even though the eyes, nose, and other features look beautiful, there is no expression, and that can fail to convey emotion to the audience when you are doing storytelling or AI videos. As you can test here, I am using LCM models to quickly observe how plain text-to-image generation looks: it comes out like a passport or ID photo, without any dramatic style or facial expression. It's just a stone-faced photo. Let's try one more time. This is another generation using the same text prompt; the AI image is a slightly better close-up shot of the character, but the face still lacks any expression.

We can try adding to the text prompt and see whether we get any facial expression in the results. In this prompt I will add "screaming" to make the face more aggressive and the mouth slightly open, and hopefully that will produce something. Look at this result: it doesn't do much. Although I bracketed the "screaming" keyword, the text prompt still doesn't significantly affect the generated image. Even if I set the weight to 1.55, as in (screaming:1.55), it does amplify the expression, but it doesn't always do exactly what we want in terms of facial expressions. When we see an image, we want to generate something with a very similar expression. Take screaming, for example: there are many kinds of screaming facial expressions, and if you rely only on text prompts to generate one, you don't really get what you expect.

So I have come up with this workflow to experiment with facial expressions, using the image-to-image method. I load an image, in this example a stock footage shot of a lady screaming with her hand up, and pass it to the FaceID adapter. I am using 0.8 for the LoRA strength in the FaceID adapter rather than the full 1.0, leaving a little room at 0.2 for the AI to be more creative. We prep the image for CLIP Vision and pass it into the FaceID adapter node, where I set the weight to 1.5 because I want the facial expression we take from the reference input image to be more dramatic. The generated face will then be similar to the reference image, including its emotional expression.

The next group is the general IPAdapter. I am using it only to imbue the style of a second reference image, instead of the less exciting stock footage, which has a plain solid yellow background. Hopefully we can make something more interesting and incorporate elements like the hands and other details.

Above is a test lab I built: I split out multiple ControlNet models to run against the sampler once the adapter data has been passed in. The first ControlNet group I am testing is SoftEdge. SoftEdge is quite nice, very similar to Lineart: it traces the lines of the source reference image, capturing almost the whole face and the character's body movement, feeds those lines into the ControlNet, and lets the ControlNet replicate a similar pose and position for the character in our AI image result.
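For readers who prefer code to node graphs, here is a minimal sketch of the two-adapter idea in Python with Hugging Face diffusers. It is a hedged approximation, not the author's ComfyUI graph: the ip-adapter-plus-face weights stand in for the FaceID adapter (the true FaceID variant additionally needs insightface face embeddings), and the checkpoint and image paths are placeholders.

```python
# Rough diffusers analogue of the face-adapter + style-adapter setup.
# Assumptions: diffusers >= 0.26 (multi-IP-Adapter support); ip-adapter-plus-face
# is a stand-in for FaceID, which would additionally require insightface embeddings.
import torch
from diffusers import StableDiffusionPipeline
from diffusers.utils import load_image

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder; the video uses an LCM checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Load two IP-Adapters at once: one for the face/expression, one for overall style.
pipe.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="models",
    weight_name=["ip-adapter-plus-face_sd15.bin", "ip-adapter_sd15.bin"],
)
# Mirror the video's weighting: a strong face weight (1.5) and a lighter style weight.
pipe.set_ip_adapter_scale([1.5, 0.6])

face_ref = load_image("screaming_lady.jpg")    # expression/identity reference (placeholder)
style_ref = load_image("style_reference.jpg")  # style reference for the general IPAdapter

image = pipe(
    prompt="emotion face, screaming",
    negative_prompt="deformed, blurry",
    ip_adapter_image=[face_ref, style_ref],  # one image per adapter, in load order
    num_inference_steps=25,
).images[0]
image.save("expressive.png")
```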
In the CLIP text encode for the positive prompt, I just wrote "emotion face" and "screaming". Since the reference image shows the character's face screaming, we have to include this text prompt to maintain consistency.

The second ControlNet group is DWPose. Why am I including this one? Because I have seen numerous YouTube videos and other tutorials teaching people to use DWPose or OpenPose to generate facial expressions, back in the early days of Stable Diffusion and Automatic1111. I want to try it here as well and see the result. It doesn't have to be DWPose or OpenPose that achieves the desired facial expressions; by lining them up side by side we can compare how the different ControlNets produce distinct AI image generations.

The last group above is Lineart. Essentially, it retraces the lines of the character in the reference input image, and we export that into the sampler to generate an AI image. For the ControlNet strength here I always start at 0.5; starting in the middle is my habit. If Lineart has too much influence on our AI image we can dial it down, or dial it up if necessary.

For the sampler, all three ControlNet comparisons use the same settings: the LCM model, 12 sampling steps, and a CFG of 1. We use the same settings in each sampler because we are comparing the ControlNets to see which one performs best at capturing facial expressions and reproducing them in our AI image. Once again, this is our image, and we are going to use it for FaceID, so the face of the generated character will be similar to the source reference, in this case the screaming woman. The generated image will differ in color and background, taking its style from the IPAdapter reference image. We are using an LCM checkpoint, specifically the RealDream Turbo LCM checkpoint, so we can run with a very low sampling step count. As per your preference, you can use other styles: if you don't want LCM, you can use Realistic Vision or any other non-LCM checkpoint to generate more detailed images.

Let's check the results. First is the SoftEdge ControlNet group. We see lots of traced lines capturing the character's full face, mouth, and emotion, and the facial expression in this AI image does capture the screaming; there are even some wrinkles on the character's forehead. However, many details are lost around the side of the face, and some facial muscles are missing.

The second is the DWPose ControlNet, which many YouTube videos teach people to use for capturing facial expressions. Yes, the face can be captured by the DWPose skeleton lines and face-landmark tracing, but the result is quite different from the facial expression in our input reference image. I would say it is about 70% similar, but it's not fully what we want. In the input image there are many more expression wrinkles, and even the eyes are half closed, conveying an aggressive feeling; with DWPose there is a lot of loss on the cheeks and the side of the face, and the forehead doesn't have any wrinkles.

Lastly, we have Lineart. This one is the most accurate, I would say: if you are using a single ControlNet alone to generate facial expressions, Lineart is the most accurate at capturing the expression of the face.
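If you want to reproduce this side-by-side test outside ComfyUI, the sketch below runs the same seed and the video's sampler settings (12 steps, CFG 1) over the three ControlNets. It is again an approximation: the public lllyasviel SD1.5 ControlNets and the LCM-LoRA on a generic base stand in for the RealDream Turbo LCM checkpoint, OpenPose with face landmarks stands in for DWPose (the DWPose detector needs extra onnxruntime dependencies), and file names are placeholders.

```python
# Side-by-side ControlNet comparison: SoftEdge vs. pose vs. Lineart, same settings.
# Assumes diffusers and controlnet_aux are installed; paths/models are placeholders.
import torch
from controlnet_aux import HEDdetector, LineartDetector, OpenposeDetector
from diffusers import (ControlNetModel, LCMScheduler,
                       StableDiffusionControlNetPipeline)
from diffusers.utils import load_image

source = load_image("screaming_lady.jpg")
hed = HEDdetector.from_pretrained("lllyasviel/Annotators")        # soft-edge map
lineart = LineartDetector.from_pretrained("lllyasviel/Annotators")
openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")

tests = {
    "softedge": ("lllyasviel/control_v11p_sd15_softedge", hed(source)),
    # include_face=True approximates DWPose's face tracing with OpenPose landmarks
    "openpose": ("lllyasviel/control_v11p_sd15_openpose",
                 openpose(source, include_face=True)),
    "lineart":  ("lllyasviel/control_v11p_sd15_lineart", lineart(source)),
}

for name, (repo, cond_image) in tests.items():
    controlnet = ControlNetModel.from_pretrained(repo, torch_dtype=torch.float16)
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")
    # LCM-LoRA stands in for the LCM checkpoint used in the video.
    pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

    image = pipe(
        prompt="emotion face, screaming",
        image=cond_image,
        controlnet_conditioning_scale=0.5,  # start in the middle, as in the video
        num_inference_steps=12,
        guidance_scale=1.0,
        generator=torch.Generator("cuda").manual_seed(42),  # same seed per test
    ).images[0]
    image.save(f"result_{name}.png")
```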
It captures the expression of the face at that moment and how the face is reacting; this one has pretty much captured the whole thing. You can set the strength to 1.0 and test it: the Lineart ControlNet will depict the facial expression even more aggressively at full strength. As you can see, Lineart outperforms DWPose for facial expressions here. It got all the wrinkles, the eyebrows, and the forehead, and there is some muscle tension on the side of the face as well, captured from the reference image. The open angle of the mouth is almost exactly what I expected, and the hands turned out well too; it didn't produce any deformed hands or extra fingers like we sometimes see in text-to-image workflows. So Lineart is arguably the most recommended ControlNet model for capturing facial expressions, and the second one I would recommend is SoftEdge; both perform well. Lastly, OpenPose and DWPose did not perform ideally for me. From my experience this is not the right choice for facial expressions, though I don't know why so many YouTube videos back then recommended DWPose for capturing facial expressions in AI images.

However, let's see if we can do even better by combining two ControlNets in one group. Let's set up another group on top, where a single sampler is connected to two ControlNets: SoftEdge first, then chained into Lineart. We keep the same settings in both ControlNet custom nodes, connect the rest of the data to the group above, hook up the positive and negative conditioning, and hopefully everything runs without errors so we can test one more time.

So we have lined up all the groups, both the ControlNets and the samplers, and we got this result. At the top, where we combined the SoftEdge and Lineart ControlNets together, we got what I would call the best-performing facial expression capture. Let's put it side by side with the other ControlNet test groups underneath. In this image we get much more detail in the eyebrows and around the temple side of the character's head; there is more tension in the skin and the facial muscles. The expression is even better than the Lineart result alone, and obviously better than DWPose, where a lot of expression is lost. The SoftEdge-only image is quite good, but it shows fewer expression wrinkles, less tension in the cheeks and forehead, and the eyes are not half closed; on its own it does not fully express the scream and the tension of the character.

So far, the conclusion is that combining the SoftEdge and Lineart ControlNets is the most effective way to capture the facial expression of the character in the source image and transform it into whatever style you want to create. In my case I used realistic checkpoint styles; you can try other styles like anime, 3D, or cartoon, or something else of your own design choice.
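Diffusers can express the same "two ControlNets into one sampler" idea by passing a list of ControlNets with one conditioning image per ControlNet. A minimal sketch, again with placeholder file names and the LCM-LoRA standing in for the video's checkpoint:

```python
# Combining SoftEdge + Lineart in a single sampling pass, mirroring the best result.
# Assumes the conditioning maps were saved by the previous sketch; paths are placeholders.
import torch
from diffusers import ControlNetModel, LCMScheduler, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnets = [
    ControlNetModel.from_pretrained(
        "lllyasviel/control_v11p_sd15_softedge", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained(
        "lllyasviel/control_v11p_sd15_lineart", torch_dtype=torch.float16),
]
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnets,
    torch_dtype=torch.float16,
).to("cuda")
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

image = pipe(
    prompt="emotion face, screaming",
    # One conditioning image per ControlNet, in the same order as `controlnets`.
    image=[load_image("result_softedge_map.png"), load_image("result_lineart_map.png")],
    controlnet_conditioning_scale=[0.5, 0.5],  # same strength for both, as in the video
    num_inference_steps=12,
    guidance_scale=1.0,
).images[0]
image.save("combined_controlnets.png")
```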
This is the AI image workflow using Stable Diffusion and ComfyUI to generate AI characters with expressive faces. Let's move this group to the top to keep things organized; I will call it the "ControlNet SoftEdge + Lineart" group, for the set of custom nodes used on that last result. I will lay the workflow out like this so you can see the side-by-side comparison of the different ControlNets' performance on facial expressions, with the one on top, combining the two ControlNets, being the one that generated the result I wanted.

So how do you do this facial expression capture, or facial expression generation, in AI videos? Let's check out another workflow that I created. This is the workflow for generating facial expressions in AnimateDiff, using a video source to capture the expressions you want, whether you need your character to smile, cry, scream, or anything else. For this kind of animation from video it's mostly close-up shots, so you need close-up source videos as well. Similar to the image generation workflow, we mostly use a very close-up or medium shot of a character, like this smiling girl, and it generates something very similar to her face. We can change the background and the artistic or drawing style, using not only realistic styles but also 3D or anime styles to create AI videos with this feature.

Lastly, we have one sampling group. Recently I like to use the Efficient KSampler to replace the regular sampler nodes and refine more detail, followed by the Video Combine node at the end; let's set its frame rate to 30 for the output video. I will be using some stock video footage from Artlist, and we will try some facial expression videos with AnimateDiff. What I have here is a girl screaming and crying, which is a good example for a demonstration, and we can also try to animate this into a cartoon or anime style, which would be more interesting.

Let's load this one up. We have the video loaded in the Load Video group, downloaded from my Artlist subscription. In this workflow I set the width and height to a smaller size just for demo purposes, and I also set the frame count to 60, maybe one or two seconds, again for demo purposes. In the model loader group I am using LCM models, mostly to save time during generation; we could use other styles here, for example anime checkpoint models, but this is the LCM checkpoint, so we will use that.

Scrolling down, I will explain the conditioning of this workflow and how it captures the face like the image workflow we just did, except now there are multiple image frames: every single frame has to be handled and processed, and the resulting list of images generated in one pass. In the conditioning group, as you can see, we take all the image frames processed by the loaders group, then apply the CLIP layer and the positive and negative text prompts. For the positive prompt I don't really describe the character in detail; it's just a very general style text prompt.
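The Load Video settings in this part (a reduced resolution, a 60-frame cap, and a 30 fps output) are easy to mirror in plain Python. A small sketch with OpenCV, where the file name and target size are placeholder assumptions:

```python
# Mirror the Load Video group's demo settings: cap at 60 frames, downscale each
# frame, and write the result back out at 30 fps. Paths and sizes are placeholders.
import cv2

cap = cv2.VideoCapture("artlist_screaming_girl.mp4")
width, height, frame_cap = 512, 288, 60  # small demo size, ~2 seconds at 30 fps

frames = []
while len(frames) < frame_cap:
    ok, frame = cap.read()
    if not ok:
        break  # ran out of source frames before hitting the cap
    frames.append(cv2.resize(frame, (width, height)))
cap.release()

out = cv2.VideoWriter("demo_input.mp4", cv2.VideoWriter_fourcc(*"mp4v"),
                      30, (width, height))
for frame in frames:
    out.write(frame)
out.release()
print(f"wrote {len(frames)} frames at 30 fps")
```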
For example, if we are doing an anime style, we don't need "photorealism" or "ultra realistic"; we just type "anime" in here, which is very easy. I also want to change the color and style of the character's hair a little, so I put "brown hair", and maybe change the skin, for example to "black skin". Those are things we can change in the output AI video while the character's overall shape and form stay mostly the same. Let's set up the background as well: for the anime style we can put "neon light" to make the background more interesting and colorful, and I set it to 1.4 to amplify the neon light a bit more. From here we also have to change the expression keywords: something like "screaming lips" and "crying" to amplify the emotions the character shows in the video, so I have changed those as well. The negative prompts stay mostly the same, because the action in these videos is just a single movement.

Then we come to the ControlNet part, which is the SoftEdge and Lineart combination. I am using only these to capture this close-up shot and generate the facial expressions of the source footage, with the same ControlNet settings we had in the image generation workflow. The AnimateDiff group here also does the same thing we usually do; essentially all these nodes and settings come from my previous AnimateDiff workflow. Right now I'm using the V3 motion model; I have fine-tuned the V3 motion model in other files, which I will talk about in other videos. And finally the sampling group, with the Efficient KSampler nodes that process the sampling steps. I like this node because it includes the hires fix for latent upscaling within one process, so I can save a lot of space without a lot of nodes connecting together: a very clean, very modular style of workflow design.

Then let's click Queue Prompt and see what happens. It looks like everything is going fine: the two ControlNet models are loaded, the data has started flowing into the sampler, and it starts processing the animation. Let's wait and see what comes out.

Okay, so we got the result; here it is. Because we are using anime-style checkpoint models we won't have realism, but this is quite cool: we captured the screaming and crying motions of the character, taken from this part of the original source video. Once we generate the full length it will become a longer video, but it already looks quite good. We got some anime styles, the character is getting more emotional, and we see the tears coming out of the eyes and some wrinkles while she is crying and screaming. So there you go: this is how we can do facial expressions in AnimateDiff for AI videos, and also how we use Stable Diffusion to create facial expressions in AI images. We got a very cool result here. Let me generate the full frames, and we will see the result at the end of this video.
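For reference, the motion-module side of this can also be sketched in diffusers. The sketch below is only the AnimateDiff V3 text-to-video core with prompts adapted from the video; the author's actual ComfyUI graph additionally feeds the SoftEdge/Lineart conditioning and the source frames through a vid2vid path, which is not shown here. The base checkpoint is a placeholder, and note that diffusers does not natively parse ComfyUI-style prompt weights like (neon light:1.4).

```python
# Minimal AnimateDiff sketch with the V3 motion adapter (text-to-video only).
# The video's workflow also runs ControlNet conditioning per frame over the
# source footage; this shows just the motion-module setup.
import torch
from diffusers import AnimateDiffPipeline, DDIMScheduler, MotionAdapter
from diffusers.utils import export_to_gif

adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-3", torch_dtype=torch.float16
)
pipe = AnimateDiffPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V5.1_noVAE",  # placeholder; swap in an anime checkpoint for anime styles
    motion_adapter=adapter,
    torch_dtype=torch.float16,
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(
    pipe.scheduler.config, beta_schedule="linear", clip_sample=False,
    timestep_spacing="linspace", steps_offset=1,
)

output = pipe(
    # Prompt weighting like (neon light:1.4) would need a helper such as compel here.
    prompt="anime, brown hair, neon light background, screaming lips, crying",
    negative_prompt="low quality, deformed",
    num_frames=16,
    num_inference_steps=25,
    guidance_scale=7.5,
    generator=torch.Generator("cuda").manual_seed(42),
)
export_to_gif(output.frames[0], "expressive_animation.gif")
```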
I hope you got some inspiration on how you can use ControlNet in a more flexible way; OpenPose is not the only thing you have to go for. We have finished generating the full frames here, and this is the result: a pretty smooth facial expression animation. There you go. I will see you in the next videos. Have a nice day. Bye.
Info
Channel: Future Thinker @Benji
Views: 3,503
Keywords: AI images, AI videos, facial expressions, stable diffusion, ComfyUI, text-to-image workflow, image-to-image method, Face ID adapter, ControlNet models, soft edge, DW pose, line art, AI-generated content, storytelling, AI enthusiasts, content creation, tutorial, digital art for beginners, stable diffusion tutorial, Stable Diffusion Create Facial Expressions
Id: B9yC-qnVDd8
Length: 20min 39sec (1239 seconds)
Published: Mon May 06 2024