Your personal Avatar from a single photo (2D/3D/Images/Animations/LipSync)

Video Statistics and Information

Captions
Hi, welcome to my new tutorial. In this video I will show you how to create your own personal avatar from a single photo. We will then turn our avatar into a 3D model and make it walk just like a real person, and of course we will make our avatar talk, as you can see here. So let's get started.

For this tutorial I went deep down the rabbit hole to give you a comprehensive and practical guide to the whole process of creating an avatar of yourself or of any other person you like, even a totally fictional one. Please act responsibly when creating an avatar of a celebrity or a public figure. The guide is divided into chapters, so you can skip any steps you're not interested in.

In the first chapter I will show you how to build a ComfyUI workflow that creates high-quality, stable and consistent images of your avatar using only a single photo as input. I'm going to use several IP-Adapters and an SDXL Lightning model, which needs only four steps to render. By the way, you can download all my workflows for free, link down below. The second chapter is about creating an animated 3D model from your avatar images using the free Avaturn website. For that we need to create three avatar images from different angles, which will be covered in my second workflow. Then we will import the 3D avatar into Blender and build a little scene that will serve as the input for ComfyUI. In the next chapter we will extend our ComfyUI workflow and use AnimateDiff to create a flicker-free walking animation, and then use FaceFusion to improve the quality and coherence of the animation even further. After that I will show you the DreamTalk AI script to make our avatar talk, and we will use FaceFusion again to improve the quality. In the last chapter I will show you how to use my little Python script to automatically download all the models and checkpoint files used in this tutorial. You see, there's a lot to show, so let's get right into it.

We start with the standard ComfyUI workflow. First, select the SDXL Lightning checkpoint; in my case it's the Juggernaut SDXL Lightning four-step model. We use a fixed seed to get reproducible results; any seed will do. We reduce the steps to 4 and the CFG scale to 1. Now this is important: the sampler needs to be Euler and the scheduler must be set to sgm_uniform, or it won't work with only four render steps. The SDXL Lightning model is very fast, so we can afford a high image resolution; let's go with 1344 x 768 pixels, which is one of the standard SDXL resolutions.

Now we need a Load Image node for the photo that we are going to turn into our new avatar. The photo should be a close-up facial shot with the whole face and some of the background visible. It's best to use a square image, but we will crop it to the sizes required by the IP-Adapters in the next step.

Next, let's add an Apply IPAdapter FaceID node for our first IP-Adapter. It's quite a powerful node and it will bring us one big step closer to our avatar. Connect the model, then add a Load IPAdapter Model node and select the FaceID Plus V2 SDXL model so it works with our SDXL checkpoint. Connect the two and set faceid_v2 to true; the rest of the settings stay at their defaults. Now we need a CLIP Vision loader with the b79K CLIP Vision model and a Load InsightFace node set to CPU (GPU doesn't seem to work yet). Finally, we need a Prep Image For InsightFace node to crop our input image to the right size; connect it with our image. We can also add a Preview Image node to see what's happening. Looks good.
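As a side note, a workflow like this doesn't have to be run from the browser: ComfyUI exposes a local HTTP API, and a workflow exported with "Save (API Format)" can be queued from a short Python script. Below is a minimal sketch, not part of the tutorial's workflow files; the file name workflow_api.json, the node id "3" and the seed are placeholders you would replace with the values from your own export.

```python
import json
import urllib.request

# Load a workflow exported via "Save (API Format)" in ComfyUI.
# "workflow_api.json" is an assumed file name for this sketch.
with open("workflow_api.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)

# The KSampler settings described above: 4 steps, CFG 1,
# euler sampler, sgm_uniform scheduler, fixed seed.
# "3" is a hypothetical node id; look up the real id in your export.
workflow["3"]["inputs"].update({
    "steps": 4,
    "cfg": 1.0,
    "sampler_name": "euler",
    "scheduler": "sgm_uniform",
    "seed": 123456789,
})

# Queue the prompt on the default local ComfyUI server.
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))
```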
Now we need an Apply IPAdapter node; connect it to the CLIP Vision loader. Let's also duplicate the Load IPAdapter Model node by holding the Alt key and dragging it down, and select the full-face SD1.5 model; it's a 1.5 model, but it works just fine with SDXL. Connect it. We also need an Image Crop node to crop our input image to the right size for this IP-Adapter: the face should be centered, fill the whole image without showing any background, and the crop should be square (a small Pillow sketch of this crop follows at the end of this chapter). Tweak the settings until you get the best result, then connect the cropped image to the IP-Adapter and the FaceID model output to the IP-Adapter model input. Let's reduce the weight to 0.6 and set the start_at value to 0.3, so the influence of this IP-Adapter on the rendered images is reduced; you can play around with these settings and see what difference they make.

I would also recommend adding a LoRA node with the FaceID LoRA to further strengthen the FaceID IP-Adapter. Set its weight to about 0.6 to tone its influence down a bit, then connect the IP-Adapter model output to the LoRA input and the LoRA output to the KSampler model input.

Now we just need a positive and a negative prompt to describe our scene; keep them as simple as possible, it doesn't need much. Let's render our first image. The first run might take a bit longer because the models need to be loaded into GPU memory, but any further renders should be blazing fast with only four steps needed. And here's what we've got: it looks remarkably similar to our input photo, doesn't it?

Finally, I want to add an image upscaler and refiner to improve the size and quality even further, but first let me tidy up the workflow a bit to make it less messy. Add a HiRes-Fix Scale node and connect it to the VAE image output; I set the rescale method to width and height and the size to 1920 x 1080 pixels. Now duplicate our KSampler with Alt-drag, enter a different seed value, and connect the IP-Adapter model to its model input. Also connect the positive and negative prompts, and connect the HiRes-Fix Scale latent output to the new KSampler's latent input. Finally, duplicate the VAE Decode node, connect it to the second KSampler and the VAE, and connect its image output to the Save Image node. That's it, let's render again.

Okay, that's not what we expected. To get a consistent image we need to turn down the denoise strength of the second KSampler to a lower value, say 0.4. Let's render again. That's much, much better, isn't it?

Now we're done with the first part. We can use different prompts, and you will see that the generated images stay very consistent with our single input photo while the scenes nicely follow our prompts. We've created our personal avatar and can generate any scene we like.
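To make the cropping step above a bit more tangible: outside ComfyUI, the same centered square crop can be sketched with Pillow. This is only an illustration of what the Image Crop node does; the file names, face-center fractions and crop size below are assumptions.

```python
from PIL import Image

def square_face_crop(path: str, cx: float, cy: float, size: int) -> Image.Image:
    """Cut a square region centered on the face, like the Image Crop node.

    cx, cy give the face center as fractions of the image width/height,
    size is the edge length of the square crop in pixels - values you
    would otherwise dial in on the node by hand.
    """
    img = Image.open(path)
    x = int(cx * img.width) - size // 2
    y = int(cy * img.height) - size // 2
    # Clamp so the crop box stays inside the image.
    x = max(0, min(x, img.width - size))
    y = max(0, min(y, img.height - size))
    return img.crop((x, y, x + size, y + size))

# Example: face roughly centered, slightly above the middle of the photo.
cropped = square_face_crop("photo.jpg", cx=0.5, cy=0.4, size=512)
cropped.save("face_square.png")
```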
But that's just the first step; let's go to the next chapter and take it much further. Wouldn't it be nice if we could turn our newly created avatar into an animated 3D model? That would open up a great variety of new opportunities: we could place our avatar in a 3D scene, move the camera around and render it from any angle we like. Well, that's possible, and it's actually not even that hard, so let's give it a try.

There's a free website called Avaturn where you can create a quite realistic avatar from only three photos, taken with a webcam or uploaded; link down below. First sign up with your email address, or simply sign in with Google if you have a Gmail account. Once that's done, you will be asked to scan a QR code with your smartphone and take some photos in order to create a 3D avatar. That's not what we want; in our case we would rather use the avatar that we've just created with ComfyUI. Still, I guess we have to go through that process once before we get access to the upload-images function, so bear with me for a moment. Pick up your smartphone, scan the QR code with the camera, open the link presented and follow the instructions for taking the avatar images. Once that's done, go back to your PC and you can see that an avatar of yourself has been created. Still not what we want, but we are nearly there; I hope they will streamline that process in the future, but overall it's not that difficult and it's worth it. Okay, here we have the avatar, which honestly doesn't look that good, but we won't need it anyway.

Hit the back button, and on the next screen you will see a button called New Avatar; click on it, and on the following screen choose the option Upload from device. As you can see, we need three photos of our avatar: one from the front, one from the left and one from the right, and they should have an aspect ratio of 3:4 or 9:16. So let's do this in ComfyUI. First, let's calculate our new image size: 1344 divided by 4, times 3, so the width will be 1008 and the height 1344 pixels. Change the size of the empty latent image accordingly. For the upscaler it's 1920 divided by 4, times 3, which gives us a width of 1440 and a height of 1920 pixels. We could also do this with a math node, but I'm too lazy for that (there's a tiny Python version of this arithmetic below).

Now we need to generate the three facial shots of our avatar from different angles, and for that we're going to use an OpenPose ControlNet. Add an Apply ControlNet node, connect it with the positive prompt and connect its conditioning output to the positive input of both KSamplers. Then connect it with a Load ControlNet Model node and select the OpenPose XL model. Now we need a Load Image node for the OpenPose image. I've already prepared the required OpenPose images, so we won't need a preprocessor to generate them; I will provide a download link for these images so you don't have to bother creating them yourself, but if you're interested in how to do it, I'll leave a link to one of my previous tutorials where everything is explained in detail. We also need to change the positive prompt a bit, because we need a neutral background as shown on the Avaturn website. Then render the first image and save it to a location where you can find it again. Next, select the OpenPose images for the two other poses, render them and save them as well.
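For the record, here is the divide-by-four-times-three arithmetic from above as a tiny Python helper; the function name is made up, the numbers are the ones used in the workflow.

```python
def width_for_3x4(height: int) -> int:
    """Width of a 3:4 portrait frame for a given height.

    1344 -> 1008 (empty latent image), 1920 -> 1440 (upscaler),
    matching the values entered by hand in the workflow.
    """
    return height // 4 * 3

assert width_for_3x4(1344) == 1008
assert width_for_3x4(1920) == 1440
```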
Now back to Avaturn. Drag the three images of our avatar that we've just created onto the respective input boxes and select the gender, in our case female. Then you can choose which type of avatar you want to create; we will just go with the regular one without face animations, because we're going to use a different method for animating the face and letting our avatar talk. Hit the Submit button and wait for your 3D avatar to be created. Once it's done, select the right body type, hairstyle and hair color, then choose which kind of clothes and shoes your avatar should wear, and we're almost done. We could now export the avatar as a GLB file and animate it manually or with the help of Mixamo, but there's an even better option if you don't need very sophisticated or complex animations: just click on the Animate button at the top right and select one of the pre-built animations and poses. I will simply make our avatar walk. Now click on the download button at the bottom left of the main window and export it with animation onto your local drive. We're finished with Avaturn, and in the next step I will show you how to use our 3D avatar in Blender and then bring it back into ComfyUI.

So let's head over to Blender and open a new General project. If you're not familiar with Blender, it can be a bit intimidating at first sight, but just follow my steps and you will get along just fine. I've activated Screencast Keys, because much of the work in Blender is done with shortcuts; at the bottom left of the screen you can always see what I'm doing. First delete the default cube by clicking on it and pressing the X key. Then hit File > Import > glTF 2.0 and find the GLB file of our avatar that we exported from Avaturn. Zoom in, get the view into the right position and switch to viewport shading mode so you can see the fully shaded model. When you hit the play button down in the timeline, you can see that the animation has also been imported, but it stops after a few frames. Let's fix that: drag up the bottom window a bit and head over to the Nonlinear Animation editor. Click on the Push Down button to convert the animation into an NLA strip, then hit N to show the panel on the right side, expand the Action Clip section and enter a value of 10 into the Repeat field. Now the animation will repeat ten times. Switch back to the timeline.

Now bring the view into the position you want to render, then from the menu bar select View > Align View > Align Active Camera to View. The box appearing on the screen is the camera view. Select the camera in the outliner on the right, hit G and move the mouse to get the camera into the exact position you want to render from.

That's it for the avatar, but I also want to add a little background scene to make the shot more interesting. So let's head over to Sketchfab, where you can download thousands of 3D models for free. Search for an alley, select one you like, click on Download 3D Model and download it in the glTF format. Once it's downloaded, unzip it and go back to Blender. Import the model into our scene; it's too small for our avatar, so hit S and drag the mouse to scale it up. Switch to wireframe mode so we can still see our avatar even though it's hidden by the alley, then move the model around and rescale it until position and size are right. In case this is all a bit too complicated for you, I will attach the final Blender file to this tutorial, link down below. Once you're satisfied, switch back to viewport shading mode and click on the little camera icon to activate the camera view.

Now when we hit the play button in the timeline again, you can see that the avatar is walking correctly, but it always stays in the same spot, which looks a bit unnatural. We could tweak the walking animation to fix that, but I'm going to choose an easier way and just move the alley while our avatar keeps walking. Select the alley again, which is the Sketchfab_model object in the outliner, switch back to wireframe mode and drag the alley to the desired starting position; also make sure you are on the first frame in the timeline. Then, in the main window, hit N to show the Transform panel, hover your mouse over the location values and hit I to insert a keyframe. Move to the last frame in the timeline, drag the alley to its desired end position and keyframe it again. Now the animation looks much more realistic.
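The same two location keyframes can also be set from Blender's Python console instead of the Transform panel. Here's a minimal sketch of that idea, assuming the imported alley really is named Sketchfab_model in your scene; the start and end positions are made-up values.

```python
import bpy

# The imported alley appears as "Sketchfab_model" in the outliner;
# adjust the name if your import is called something else.
alley = bpy.data.objects["Sketchfab_model"]
scene = bpy.context.scene

# Start position on the first frame...
scene.frame_set(scene.frame_start)
alley.location = (0.0, 0.0, 0.0)          # assumed start position
alley.keyframe_insert(data_path="location", frame=scene.frame_start)

# ...end position on the last frame, so the alley slides past
# the avatar while the walk cycle itself stays in place.
scene.frame_set(scene.frame_end)
alley.location = (0.0, -8.0, 0.0)         # assumed end position
alley.keyframe_insert(data_path="location", frame=scene.frame_end)
```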
Before we export the animation, let's add a point light to the scene by pressing Shift+A and move it to a position where the avatar is well lit. Finally, after a few last tweaks to the camera, let's render the animation. Click on the Output properties in the right window, check the resolution and frame rate, select the output folder and the file format, in my case MP4. Then, in the main menu, select Render > Render Animation and wait until it's finished. So here we have our rendered animation, which we will now feed into ComfyUI as a video input.

Back to our ComfyUI workflow. Add a Load Video node and load our Blender animation. We will downscale the video a bit, so add an Upscale Image node and set the size to 1344 x 768 pixels. Next we bring the frames into latent space, so we need a VAE Encode node connected to our VAE. Since we're building a video-to-video workflow, we connect the VAE Encode latent output to the KSampler latent input. We also need to reduce the denoise a bit, say to 0.8, so our input video has some influence on the rendering.

To make the result smoother and more stable we need an Apply AnimateDiff Model node, plus a Load AnimateDiff Model node with an SDXL motion model. Next we need a Use Evolved Sampling node, connected via its m_models input, and because our animation has more than 16 frames we also need a Context Options (Looped Uniform) node. I will also add a FreeU node, which improves quality and colors at no extra cost. Connect the model output of our IP-Adapter LoRA to the FreeU model input and the FreeU model output to the Use Evolved Sampling model input. Let's also adjust the FreeU values; you may play with them a bit, but overall I think my settings work well for our case. Now connect the Use Evolved Sampling output to the model input of our sampler; I'm doing this via a Reroute node. Unfortunately we need to get rid of the upscaler and the second KSampler, because video generation at a high resolution requires a lot of hardware resources, but I will show you an easy way to improve the video quality in the next chapter of this tutorial. We also need a Video Combine node, because the output will be a video, not a single image, and I will add a RIFE VFI frame interpolation node to make the animation smoother; just put it between the VAE Decode and the Video Combine node and select the rife49 checkpoint.

Now to the ControlNets. This time I'm not going to use OpenPose but a Canny ControlNet, and since we need to extract the edges from our video frames, we also need a Canny Edge preprocessor to generate the ControlNet images; connect it with the video frames output (a tiny OpenCV illustration of this step follows below). Then I want to add a depth ControlNet, so duplicate the ControlNet nodes and add a Depth Anything preprocessor. I'm also going to reduce the ControlNet weights to 0.6 to give more leeway to the other components. Connect everything properly and we're ready to render our animation.

First I'm going to do a test run with just 16 frames. Okay, not too bad; I think we're nearly good for the final render. Now let's render the full animation. Depending on your hardware it might take some time, and maybe you will also need to reduce the video size via the Upscale Image node. Okay, we ran into an error, so let's solve it: if the animation has more than 16 frames, we need a Load Advanced ControlNet Model node; the standard one doesn't work, so let's change that and render again. Okay, here's the final animation. It looks smooth and stable, but it's still not perfect, because we were limited by our hardware resources.
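To get a feel for what the Canny Edge preprocessor produces per frame, here is a tiny OpenCV sketch that runs a Canny pass over the first frame of the rendered video; the file names and thresholds are assumptions, not the node's actual defaults.

```python
import cv2

# Grab the first frame of the Blender render and run a Canny edge pass,
# roughly what the Canny Edge preprocessor node does for every frame.
cap = cv2.VideoCapture("walk_animation.mp4")   # assumed file name
ok, frame = cap.read()
cap.release()
if not ok:
    raise RuntimeError("could not read a frame from the video")

gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, threshold1=100, threshold2=200)  # assumed thresholds
cv2.imwrite("frame_0000_canny.png", edges)
```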
So how can we improve our video? There's a very sophisticated AI tool called FaceFusion, which to my knowledge is not yet part of ComfyUI, but there's an easy way to install and run it on your local machine. Have you ever heard of Pinokio? It's a free tool for running various AI applications in isolated virtual Python environments. I think it's extremely useful, and there are lots of interesting free AI apps included, even ComfyUI. So let's download and install it if you haven't done so yet; everything is explained very well on their website, so it won't be a real challenge, and I'll leave a link down below. Once you've installed Pinokio, search for FaceFusion and download it. You will then be asked to install it; just do it and be patient, as setting up the virtual Python environment can take a while. Once that's finished, launch the app. The first run can also take a while, as some large models and checkpoints need to be downloaded; I've already done that, so it's going to be fast here. When it's done, click on the link to open the app in your browser.

It's very simple to use: select the options you need, in our case Face Swapper, Face Enhancer and Frame Enhancer. Load a facial image of your avatar into the source box and our ComfyUI animation into the target box. If you have an Nvidia GPU, you should also select CUDA, as it speeds up the rendering considerably. You can play around with the other settings, but I think the default values are just fine. Now hit Start and the animation will render. The face swapper replaces the face with the original avatar photo, which takes it even closer to what we want; the face enhancer improves the quality of the face further, and the frame enhancer improves the overall quality. Here's the final video: you can see that the quality and the likeness of our avatar have improved considerably, and it wasn't really hard to achieve.

There's one thing left: I want our avatar to talk. There are several options. I've already posted a tutorial about SadTalker some time ago, and there's even a lip-sync option in FaceFusion, but I wanted to try something different, so here we go. We're going to install another app in Pinokio called DreamTalk. Go to the Discover section, find it in the list, then download and install it; again, that can take some time. Once that's done, launch it and wait until the models are downloaded, then click on the link to open the app in your browser. It's very simple: upload your avatar image into the image box and an audio file into the audio input box; any MP3 will do. I've just created a short talking sequence with a standard voice in ElevenLabs, which is free up to a certain amount. Then hit Run and wait until it's finished; it shouldn't take long. Okay, here's the result: the lip sync is great, but the quality isn't. We already know how to improve the quality, don't we? Just download the animation and head over to FaceFusion again: bring the Pinokio app to the front, stop DreamTalk, hit the home button and start FaceFusion. Again, load our avatar image into the source box and the DreamTalk animation into the target box, select Face Swapper, Face Enhancer and Frame Enhancer, and hit Start. And here's the final video: "Hi, welcome to my new tutorial. In this video I will show you how to create your own personal avatar out of a single photo."

Now we're nearly finished, but I've got another goodie for you which might save you some time if you want to try out my workflows. Down in the description of this tutorial you will find some download links: the workflows, a zip file with the input images and videos, and the Blender file in which I built the 3D animation. There is also a zip file called download models, which includes a little Python script that downloads all the missing models and checkpoints used in this video, along with a JSON file where these models are listed.
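The script itself isn't quoted in the video, but to give you an idea of what a models.json-driven downloader can look like, here is a minimal sketch; the JSON field names ("url", "folder") and the folder layout are assumptions, and the real get_models.py and models.json may well be structured differently.

```python
import json
import os
import urllib.request

def download_models(json_path: str, models_root: str = "models") -> None:
    """Read a JSON list of {"url": ..., "folder": ...} entries and download
    anything missing into the matching subfolder under ComfyUI/models.
    The field names are assumptions for this sketch."""
    with open(json_path, "r", encoding="utf-8") as f:
        entries = json.load(f)
    for entry in entries:
        target_dir = os.path.join(models_root, entry["folder"])
        os.makedirs(target_dir, exist_ok=True)
        filename = entry["url"].rsplit("/", 1)[-1]
        target = os.path.join(target_dir, filename)
        if os.path.exists(target):
            print(f"skipping {filename}, already present")
            continue
        print(f"downloading {filename} -> {target_dir}")
        urllib.request.urlretrieve(entry["url"], target)

if __name__ == "__main__":
    download_models(input("JSON file name: ").strip() or "models.json")
```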
I've made a clean ComfyUI installation to show you how to get everything running from scratch, because I know it can be a pain when you want to run a workflow written by someone else and some things are missing or just don't work. Drag the first workflow into ComfyUI and you will see that there are some missing nodes; install them with the ComfyUI Manager's "Install Missing Custom Nodes" as usual. After installing and restarting, all nodes are there, but there may still be files missing: I'm using an SDXL Lightning model, some LoRAs and other things, and normally you would have to download them manually from various places and sites. That's why I wrote the little Python script that does the hard work for you. Unzip it and copy the get_models.py and models.json files into the ComfyUI subfolder of your ComfyUI installation, one level above the models folder. Then open a command window there and run the script by typing python get_models.py. When asked for the file name, enter models.json and hit Enter, then confirm the downloads, and all the missing models will be downloaded into the right folders. That will take a while, so I'll be back when it's finished.

Okay, it's done. Now you can check whether all files have been placed into the right folders. Let's take a look at the JSON file: all download information is stored here, and if you want to skip some files, just remove the corresponding section before running the script. Now unzip the images file; you will need the ControlNet images for the second workflow, but you can also use my input photo and the walking animation if you like.

Now let's upload the starting image and run the workflow. If you get an InsightFace error, there's something more we need to do: get the InsightFace wheel for Python 3.11 from its GitHub page, linked down below, put it into the ComfyUI update folder, then open a command window there and run the command that you can also find in the description. That will take a while; then restart ComfyUI and we're good. Let's now test the second workflow: upload the ControlNet image and run. Good. Now the third one: install the missing nodes. I ran into an error there, so don't install the Inference Core nodes as recommended, as they didn't work for me; install the auxiliary preprocessors instead, then restart ComfyUI. It will take some time to install these nodes, so please be patient. Okay, now upload the walking animation and run. I've restricted the workflow to 16 frames in the Load Video node so testing won't take that long; still, it may take some time because a few more models may need to be downloaded. Okay, it seems to work.

So I've covered everything I wanted. If this tutorial was helpful to you, please leave a like and maybe consider subscribing. Thanks for watching, and I'll see you in the next one.
Info
Channel: Render Realm
Views: 1,115
Id: ROVNlsh5B7U
Length: 30min 43sec (1843 seconds)
Published: Sat Mar 16 2024