Stable Diffusion ComfyUI & Suno AI: Create AI Music Videos Under Our Control

Video Statistics and Information

Captions
Hello everyone. We've talked about animation generated with Stable Diffusion's AnimateDiff and SVD, and we've also covered AI music generators previously. In this video I want to talk about another method for creating music videos with the tools we already have on our local PC. But before that, let's check out this tool, because it inspired the following workflow for creating music videos.

So this Noisee AI is pitched as creating music videos using AI models, and when I saw the introduction videos on their website I thought, wow, can we really type a text prompt and have it generate a music video for us? And the quality of it: look at this girl, the AI generates very good consistency, like what you saw from Sora. But when I went to their Discord and hit the "try Noisee AI" button, I was disappointed. Remember back when social media was booming and a lot of little apps appeared that were just social media aggregators, connecting the Facebook API and the Twitter API and mashing that data together? Where are those apps now? Mostly gone. I'm seeing the same thing starting in the AI industry, because when you look at their Discord you can tell they're mashing up different AI models rather than training their own models or building a workflow entirely in-house.

Take the generate-clips channel. I typed the text prompt "robot dancing in disco", and the result I got back very obviously looks like Stable Video Diffusion output. I can't say it's 100% SVD, but it has that style, and the generated results from other users feel the same. You can see those famous deformed hands from SVD and from older Pika Labs versions; those artifacts show up because the diffusion models don't have enough U-Net context to keep hands and fingers consistent across a video, and it's happening in this AI model, or rather this Discord server. And look at this one: it's just a camera panning, which I find boring in AI-generated video because the model isn't really doing anything. It's disappointing, because their website made me excited: this woman floating on the water looked so realistic and consistent.

Now let's check out their generate-MV channel, the feature they market so heavily for creating music videos with AI. Honestly, it feels like garbage. First of all, the music in these videos is not generated by this Discord; it has to be provided by the users. For example, this user generated a music clip in Suno and pasted the link so the Discord bot could fetch the MP3 file, and this one pulled the music URL from Udio and used that file to generate each scene.
They also bracket some timestamps in here, like the one-minute mark, and this user put the lyrics in. Overall, I can't call this an AI model; it's a Discord bot that is probably driving some other diffusion model, maybe SVD, behind the scenes, running that workflow, generating each scene, stitching the outputs into one video file, and posting it back to the user as the response. And the lyrics and the video content simply don't match. Look at this one: the lyrics here are "close your eyes baby", which should be about love or somebody you care about, but the generated result makes no sense. What happened to the street here? The graffiti has nothing to do with it, and what is this space scene doing here? It doesn't match the lyrics at all.

We've all watched plenty of music videos: you listen to the song, the singer has a role in the A-roll, and the B-roll carries the background story of the song. Here, nothing represents that. The bot is clearly just taking the keywords the users provide, splitting each line into a text prompt, and generating one scene per prompt; when a line says "falling down", you get a sky-falling atmosphere in that scene. So I have to say it's garbage. I was trying to review Noisee AI hoping it would be as good as their website looked, but it just takes whatever the user provides, here a YouTube link, generates some scenes in a Stable Video Diffusion-like style, and stitches them back together. That's not impressive, and it's a bit of a misunderstanding: their website says "turn your favorite melody into music videos", so I thought it would either generate the music for me or turn my music into a genuinely stunning music video, not a string of camera-panning cutscenes stitched together and called a music video.

So forget this one; let's build our own workflow in ComfyUI. We can do something similar, or even better, locally, with more control over the video content we want to generate, and we can even create a character for the story of the music video. Let's get started.

OK, I'm going to use an existing workflow that I built previously and covered in a video tutorial on my channel. As you may remember, a large language model works on the content and transforms it into Stable Diffusion text prompts, creating the scenes; we can then feed those into Stable Video Diffusion to create motion animations. I'll restrict each scene to about three or four seconds; I don't want more than that, because these clips will be the music video's B-roll, showing the story that relates to the music and the lyrics.
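In the video this lyrics-to-prompt step is wired up with ComfyUI nodes. To make the idea concrete, here is a minimal sketch of the same step outside ComfyUI, assuming a local Llama 3 is being served by Ollama at its default endpoint; the system instruction and the lyric lines are illustrative stand-ins, not taken from the video:

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

SYSTEM = (
    "You write Stable Diffusion text prompts. For the lyric line given, "
    "respond with one comma-separated visual scene description only."
)

def lyric_to_prompt(lyric: str, model: str = "llama3") -> str:
    """Ask a locally served Llama 3 to turn one lyric line into an SD scene prompt."""
    resp = requests.post(OLLAMA_URL, json={
        "model": model,
        "prompt": f"{SYSTEM}\n\nLyric: {lyric}\nPrompt:",
        "stream": False,
    }, timeout=120)
    resp.raise_for_status()
    return resp.json()["response"].strip()

# Hypothetical lyric lines; each one becomes the text prompt for one B-roll scene.
for line in ["searching for a place I can't call home", "the wind whispers secrets"]:
    print(lyric_to_prompt(line))
```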
I'm also going to use the Stable Diffusion AnimateDiff video-to-video workflow that I built recently, which uses Differential Diffusion to apply styles from an IPAdapter and a reference source video, as you saw in some of my recent YouTube Shorts; you can check out the previous tutorials for details. Look for the one that uses Llama 3 to interact with Stable Diffusion: we create something like that, and this is the workflow from that video. There's also the video on Differential Diffusion latent noise; the workflow comes from there, but I've since modified it, so there's an updated version now. Basically it lets you change both the character and the background, just like the RAVE plus AnimateDiff workflow we did before, but that one consumed too much memory and processing power, so I cut it down and rebuilt it into this lightweight AnimateDiff version. It's also similar to the workflow I submitted to the contest on OpenArt.

So we're going to use these two workflows and build the music video animations, and we'll use Suno AI to create the music. I have one generated already. Previously I made another video about AI music generation with Udio, and I found it's fun to play with, but if you want better-quality songs, Suno did better; as a funny-soundtrack generator, Udio is the winner for me. This is the song I'm going to turn into a music video; let's hear how it sounds. [Music]

OK, so that's the music I generated in Suno AI: an R&B-style song focused on the sadness of a long-distance love, so we can see what kind of emotions to put into the visuals for this song and play around with that. I'm on Artlist, and from my subscription I've downloaded some clips of people singing, scenes that will serve as the singer's A-roll. I'll run all these video clips through the video-to-video Stable Diffusion animation workflow to transform them into another style; we may only need OpenPose to capture the characters' poses and motion so everything can be reskinned into other styles. For the B-roll, we'll use Stable Video Diffusion to create the love-story scenes, with prompts created using Llama 3. We'll take the lyrics and transform them, maybe with ChatGPT or any large language model, into more descriptive per-scene stories, then bring all that content into the large language model here, connecting to the fine-tuned Llama 3 model to get the Stable Diffusion prompts. We'll generate each scene as an image, run it through video diffusion for the motion graphics, and finally combine the A-roll and B-roll in CapCut, which is a very easy, simple way to do the video editing. So let me generate all the scenes in AnimateDiff, and we'll come back later.
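The image-to-motion step happens inside ComfyUI in the video, but as a rough stand-in, here is a minimal sketch of animating one generated still with Stable Video Diffusion via Hugging Face diffusers; the input file name is a placeholder, and 25 frames at 7 fps gives roughly the 3-to-4-second per-scene length mentioned above:

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Load the SVD image-to-video pipeline (needs a CUDA GPU with enough VRAM).
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# "scene_01.png" is a placeholder for one still generated from an LLM prompt.
image = load_image("scene_01.png").resize((1024, 576))

# 25 frames at 7 fps is about 3.5 seconds, matching the per-scene limit.
frames = pipe(image, decode_chunk_size=8, num_frames=25).frames[0]
export_to_video(frames, "scene_01.mp4", fps=7)
```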
All right, we now have all the singing A-roll regenerated into a new style with AnimateDiff. I have the same scenes I downloaded previously; here in the library, every scene has been processed into the new look with an AI singer. Now we can continue with the B-roll scenes, which happen in SVD. I'm using Stable Video Diffusion to create the B-roll, and that also takes content from Llama 3. Right here I told the large language model to be creative and write scene descriptions for the song I wrote: given the lyrics, describe each scene for the music video. I pasted the lyrics in and got descriptions for every scene, which is good enough to build the stories in Stable Video Diffusion, because I now have the AI prompts. Using those, I connect to my locally installed, fine-tuned Llama 3 model, which generates each scene for me. If you want to see how to connect large language models to Stable Diffusion in ComfyUI, check out the video I mentioned; link in the description below.

I'll fast-forward this part: once all the scenes for the music video are generated in ComfyUI, I bring them into CapCut and edit each scene against the music itself. CapCut is very, very easy; I don't think you need a tutorial for that.

OK, everything is finished rendering. Check this out. It looks OK, though I can't use some of the scenes; this one is deformed. I wasn't monitoring Stable Video Diffusion while it generated; I just batch-input all the scenes and batch-generated, then filtered the results afterward. Some are not good, like this one where the character is deformed, and this one started well but the second shot went wrong, so we have to give up a few scenes. So there you go, this is the video; let's check it out. [Music]

So yeah, it's pretty easy to put all the scenes back together, drop in your AI music, and there you go: you've got something of even better quality than what the Noisee AI models generate, because you have more control over how the scenes are arranged, whatever effects you want to apply, and the transitions between each video clip. There are things I'd improve with more time: with more research into lip syncing, I would lip-sync the parts where the A-roll cuts to the singer's mouth, and if the music were synced with the singer's performance right here, that would be even better. So this is the method: combine Stable Video Diffusion and AnimateDiff together, and use any AI music generator.
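The video does the final assembly in CapCut, but the same concatenate-and-score step can be scripted. Here is a minimal sketch with moviepy, where the clip list and file names are hypothetical stand-ins for the manually filtered scenes and the Suno track:

```python
from moviepy.editor import AudioFileClip, VideoFileClip, concatenate_videoclips

# Hypothetical file names: scenes that survived manual filtering, in story order.
keep = ["aroll_01.mp4", "broll_01.mp4", "broll_02.mp4", "aroll_02.mp4"]

clips = [VideoFileClip(path) for path in keep]
video = concatenate_videoclips(clips, method="compose")

# Lay the generated song under the cut; trim the audio to the video's length.
audio = AudioFileClip("suno_song.mp3").subclip(0, video.duration)
video = video.set_audio(audio)

video.write_videofile("music_video.mp4", fps=24)
```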
You don't have to use just Suno AI or Udio; whichever you prefer is your call, your business. Some people argue that one makes better music than the other, but who cares? If you prefer one, use that one. So that's it for this tutorial on how to create AI music videos that are better than what you saw here, where you get nothing to control beyond a few plain keywords and have to hope for some luck that the output matches the music you uploaded. That's not what I want when I make a music video. That's it for this video; I hope you got some inspiration. I'll see you in the next videos, have a nice day, see you.
Info
Channel: Future Thinker @Benji
Views: 2,450
Keywords: ai animation, ai video maker, ai art, 3d ai animation, make money with ai animation, i made gta 6, AI music videos, Noisee AI review, music video creation, AI models, Stable Video Diffusion, Comfy UI, DIY workflow, AnimeDiff, DiffAnimate Diffusion, Sunode AI, AI music generation, high-quality songs, creative journey, visually captivating music videos, ai animation workflow
Id: x4bG1DlpvGo
Length: 18min 8sec (1088 seconds)
Published: Thu May 09 2024