Turn an Image into a Talking Video! DreamTalk (Follow Along Step by Step, ComfyUI)

Captions
Hello, this is Neural Ninja. In this video I'll show you how to use DreamTalk in ComfyUI. DreamTalk is a lip-sync model released by Alibaba. It's somewhat less capable than the paid D-ID service, but used well it can produce quite good videos. I'll cover the basic usage first, then show how to get larger output images. If you're using the Colab setup, make sure the ReActor model and the DreamTalk custom node are checked.

Clear the workflow and start by adding a DreamTalk node. It's a simple node that takes an image and an audio file and converts them. Add a Load Image node and load an image; a frontal face works best. Add a Load Audio node as well and select an audio file — here I'm using a short greeting MP3. After refreshing, you can select the uploaded audio file in the node. Connect the nodes. You can choose an emotional expression here; there appears to be a pose option too, but only one type is available. Leave image cropping checked. Add a Video Combine node to create the video file — the frame rate is fixed at 25 — and select MP4 as the format. Now let's convert.

It came out well, but since DreamTalk generates at 256×256, the image quality is a bit disappointing; I'll improve it shortly. First, let's attach the audio to the video: add an audio node, select the file, and convert again. Let me play it.

Now let's improve the quality with Restore Face. Add a Restore Face node and connect it, then choose a model — this one has given me the best results so far. Adjust the visibility value as needed. That looks better. Let's also make it bigger: add an Image Resize node, increase the size to 512, then connect the nodes and convert again. That covers basic DreamTalk usage. I'll also try converting with a different audio file.
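As a rough sanity check on timing (my own sketch, not part of the workflow): since the frame rate in Video Combine is fixed at 25, the number of frames DreamTalk produces follows directly from the audio length.

```python
FPS = 25  # fixed output frame rate in this workflow's Video Combine node

def frame_count(duration_sec: float) -> int:
    """Frames produced for an audio clip of the given length at 25 fps.
    Illustrative helper only, not a ComfyUI node API."""
    return round(duration_sec * FPS)

print(frame_count(4.0))  # a 4-second greeting clip -> 100 frames
```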
This one is a music file in a slightly different style. Let me convert it and play it. It's a bit disappointing that DreamTalk outputs only a cropped face, so now we'll composite it back onto the original image for a larger result.

First, reduce the original image to a suitable size. We'll use a Constrain Image node and image crops: constrain the width to 512 and crop once more into a roughly square shape, adding a preview to check. Then crop just the face region with an image crop node, again with a preview. The restored frames should also be resized to match the cropped region of the original: convert Width and Height into inputs and connect them to the crop size, so the output is resized to fit the face cropped from the original image. Now convert — it works.

Next, let's attach the converted video onto the original image with an Image Composite node. The original image is the target and the converted frames are the source. Compositing only runs when the number of target and source images match, so use a Repeat Images node to duplicate the original once per converted frame; the count comes from the DreamTalk frame count. Connect the nodes, then copy the Video Combine node and connect it. That's a good addition.

Now let's adjust the position: convert the X and Y values into inputs and connect Crop X and Crop Y. Convert again. The placement is right, but because DreamTalk re-renders the face, the exact position can drift slightly. Let's add nodes to correct it: two simple scalar nodes make the values easy to adjust. Connect them and tweak the values — leave X as is and add 5 to Y. That looks right.
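The count-matching and position steps above can be sketched in plain Python. The helper names are my own; in the workflow this is done by ComfyUI's Repeat Images and Image Composite nodes:

```python
def repeat_frames(original, count):
    """Image Composite requires equal numbers of target and source images,
    so the single original image is duplicated once per converted frame."""
    return [original] * count

def composite_position(crop_x, crop_y, dx=0, dy=5):
    """Paste origin for the converted face: the crop origin taken from the
    original image, plus a small manual correction (here Y is nudged by +5)."""
    return (crop_x + dx, crop_y + dy)

targets = repeat_frames("original.png", 100)
print(len(targets))                 # 100 targets for 100 DreamTalk frames
print(composite_position(128, 64))  # (128, 69)
```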
This position change applies quickly because it needs no heavy computation, so you can nudge it little by little. Now let's blend the border smoothly. I'll handle it with a mask: load a mask image that will feather the border, and place it over the face region. Create a black image and position the mask exactly like the face in the Image Composite node — build an Image Composite the same way as when combining the converted face with the original, with the same crop size and position. Check it with a preview: the edges now blend smoothly according to the mask. Now convert the image into a mask and apply it — you can apply it through the Image Composite's mask input. Let me convert. It came out well. The top edge is hard to fit, so I'll crop it off — about 50 pixels — and shift the mask about 2 pixels to the right. Now it fits.

Let's convert other images as well. This one errors because the resized face image has an odd-numbered dimension; since that face image is only for verification, I'll simply resize it to 256. That works, and it applies to the other images too. This time let's try a different emotion — Surprised — and apply Happy to another image. A frontal image gives the best results, but other angles can be converted too. Converting this one and checking, not only the position but also the size changed considerably, so add an Image Scale node and reduce to 90%. Checking the result, the mask no longer lined up, so change the mask size too so it reflects the scaled image. Then adjust the position. Even matched as closely as possible, it clearly falls short of the frontal case, so let's reset the ratio and position and generate it again.
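The mask-based border blend is just a per-pixel linear interpolation; here is a minimal sketch of the idea (my own illustration, not the node's implementation):

```python
def mask_blend(face_px, orig_px, m):
    """m = 1.0 keeps the converted-face pixel, m = 0.0 keeps the original;
    values in between feather the border softly."""
    return m * face_px + (1.0 - m) * orig_px

print(mask_blend(200, 100, 1.0))   # mask centre: converted face wins -> 200.0
print(mask_blend(200, 100, 0.25))  # near the border: mostly original -> 125.0
```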
Lastly, let's apply other images and audio. Inpainting regenerates a selected area of an image to match its surroundings; if you generate images with ControlNet this way, you can easily create images with the same facial placement as the frontal shot, and then reuse the position settings almost without modification. That's how to use DreamTalk in ComfyUI. Although it's somewhat less capable than D-ID, a paid lip-sync service, I think you can get quite good lip-sync videos, and there's room for improvement by combining it with Stable Diffusion. I hope this video helps. I'll be back with another good video. Thank you.
Info
Channel: 뉴럴닌자 - AI공부
Views: 2,975
Id: uTuMiD2eBXI
Length: 21min 54sec (1314 seconds)
Published: Mon Apr 08 2024