Story Diffusion - Create Comics and Long Visual Stories with AI

Captions
I am going to present you yet another outstanding project today. With the help of this StoryDiffusion project you can create a magical visual story. Instead of telling you about it, let me first show you some of the examples. Look at this video: it has been generated by StoryDiffusion. It is just one frame of the story, but look at the quality and how realistic the imagery is, look at the expressions of the people and the way they are moving. It is all AI generated; none of these are real people. Look at this one, especially the left-hand side one: this is a typical movie scene, it looks so realistic, and it has all been generated from images. Look at the expressions, look at this one, amazing, amazing stuff. Look at these scenes: this is an underwater scene, but the surroundings look so real. You can even generate cartoons with it, and you can also generate comics in a story format, so if you are into comics you can see how realistic and good quality this looks. Amazing stuff.

Now coming back to the project. As I said, StoryDiffusion can create a magical story by generating consistent images and videos. There are two parts to it. The first is consistent self-attention for character-consistent image generation over long-range sequences; it is hot-pluggable and compatible with all SD 1.5 and SDXL-based image diffusion models. I will also show you how you can get it installed on Colab, so stay tuned. For the current implementation of consistent self-attention, the user needs to provide at least three text prompts, and it seems at least five to six text prompts are recommended for better layout management. The second part is a motion predictor for long-range video generation, which predicts motion between condition images in a compressed image semantic space, achieving larger motion prediction. How good is that?
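The requirement mentioned above (at least three prompts, five to six recommended) is easy to picture as code. Here is a minimal sketch, with a hypothetical helper name of my own (`split_story` is not part of the StoryDiffusion API), showing how a story might be turned into one prompt per frame:

```python
# Hypothetical helper, not from the StoryDiffusion codebase: split a
# story into per-frame prompts for the consistent self-attention
# module, which reportedly needs at least 3 prompts (5-6 recommended).
MIN_PROMPTS = 3

def split_story(story: str) -> list[str]:
    """Split a story on newlines into one text prompt per frame."""
    prompts = [line.strip() for line in story.splitlines() if line.strip()]
    if len(prompts) < MIN_PROMPTS:
        raise ValueError(f"need at least {MIN_PROMPTS} prompts, got {len(prompts)}")
    return prompts

story = """a boy wakes up in his bedroom
the boy walks to school under cherry blossoms
the boy meets his friends at the school gate
the boy sits in class taking notes
the boy plays football after school"""

prompts = split_story(story)
print(len(prompts))  # 5 prompts, one image generated per prompt
```

Each of these prompts then becomes one image in the batch that the consistent self-attention module ties together.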
There are a lot of other examples too. For example, this one simply takes different images and combines them into a video: leveraging the images produced through their consistent self-attention mechanism, they extend the process to create videos by seamlessly transitioning between those images. This can be considered a two-stage long-video generation approach. Also, what you see in this demo are highly compressed results for speed; you can visit their project page for the detailed videos. And these are not the only examples; if you go through their website they have plenty of them. Look at this one, how real it looks. It looks a bit grainy, but as I said, it has been compressed for the feed here. Look at these cartoons, and there are a few others. There you go, how good that looks.

The good thing is that they also have a demo, so if you don't want to run it yourself you can simply go to their demo on Hugging Face Spaces. Here they are just giving a textual description of a Japanese anime; you can select different styles like Japanese anime, cinematic, Disney character, photographic or comic book. They have set some of the parameters randomly, and you can adjust them here. All you need to do is click on Generate and it is going to generate something. Right now it is waiting for a GPU to become available, because it is running on ZeroGPU at Hugging Face, which sometimes takes a bit of time to grab a GPU, so let's wait for it to finish to see what it creates. It seems there is high load at the moment, so it won't be able to grab the GPU. That is fine; you can keep trying and eventually you'll get the GPU and it will generate. Instead, let's try to get it installed on Google Colab. This is my Google Colab; I'll change the runtime to a T4 GPU and see if it installs on that. The T4 GPU is a bit limited, but let's see, because as you can already tell this is a fairly heavy workload.
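The consistent self-attention mentioned above is what ties the images in a batch together. As a toy illustration only (the real module samples tokens and plugs into the attention layers of SD 1.5/SDXL; this is just the core idea in NumPy), each image's queries can attend over the pooled keys and values of every image in the batch, so features are shared across frames:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def consistent_self_attention(q, k, v):
    """Toy sketch of consistent self-attention: every image in the
    batch attends over the keys/values of ALL images, not just its
    own, which is what keeps the subject consistent across frames.
    q, k, v: arrays of shape (batch, tokens, dim)."""
    b, t, d = k.shape
    k_all = k.reshape(1, b * t, d)                     # pool keys across the batch
    v_all = v.reshape(1, b * t, d)                     # pool values across the batch
    scores = q @ np.swapaxes(k_all, 1, 2) / np.sqrt(d)  # (b, t, b*t)
    return softmax(scores) @ v_all                      # back to (b, t, d)

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8, 16))  # 4 story frames, 8 tokens each, dim 16
out = consistent_self_attention(q, q, q)
print(out.shape)  # (4, 8, 16)
```

Note how the output keeps the per-image shape, so the module can drop into an existing attention layer, which is presumably why the paper calls it hot-pluggable.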
Okay, so let me first install the requirements after cloning the repo. I'm cloning the repo here, and I will drop the link to this repo in the video's description, so let's wait for it to get cloned. The repo is cloned; let's cd into that directory, which should simply be StoryDiffusion. It's always a good idea to put a semicolon here, but okay, it doesn't work here anyway; I'm already in the StoryDiffusion directory, which is good, and you can check it with pwd (present working directory). That is nice. Now let's install all the requirements; running the requirements file is going to install a lot of things, so let's wait for it to finish.

While it installs the requirements, let me give you a very quick overview of the pipeline and architecture. The pipeline StoryDiffusion uses to generate subject-consistent images is what you can see on the screen. To create subject-consistent images that describe a story, they have incorporated their consistent self-attention into a pre-trained text-to-image diffusion model. They split a story text into several prompts and generate images from these prompts in a batch. Consistent self-attention builds connections among the multiple images in a batch for subject consistency, and that is how simple it is. Their method for generating transition videos from the subject-consistent images is described here: to effectively model a character's large motion, they encode the condition images into the image semantic space to capture spatial information and predict the transition embeddings. These predicted embeddings are then decoded using the video generation model, with the embeddings serving as control signals in cross-attention to guide the generation of each frame, and that is the result of it. By the way, this whole source code which I just showed you is licensed under MIT. The requirements are still being installed, so let's wait for this
one to finish.

All the prerequisites are done; let's now import the libraries, and there are a lot of them, as you can see. I'm not even sure this T4 GPU will be able to sustain this, but there's no harm in trying, so let's wait for all of these to get imported. I have tried it a few times, but it seems there are a lot of issues with the GPU on the free tier, so I'm not going to fight with it. If you have access to a powerful GPU, you can definitely run this code, and you can access it from their GitHub repo; I will drop the link in the video description.

But that's it. All in all a really good project, and I am very impressed by it; the way it creates these videos is simply awesome. Let me know your thoughts. If you like the content, please consider subscribing to the channel, and if you're already subscribed then please share it in your network, as it helps a lot. Thanks for watching.
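One last illustration of the architecture overview: the second stage predicts transition embeddings between two condition images in the semantic space. The real model uses a learned motion predictor; as a deliberately naive stand-in, linear interpolation between two dummy embeddings shows what "one control embedding per video frame" means:

```python
import numpy as np

def interpolate_embeddings(start, end, num_frames):
    """Naive stand-in for StoryDiffusion's learned motion predictor:
    linearly interpolate between two condition-image embeddings in
    semantic space, yielding one control embedding per video frame
    (the real predictor learns these transitions instead)."""
    ts = np.linspace(0.0, 1.0, num_frames)[:, None]  # (num_frames, 1)
    return (1.0 - ts) * start + ts * end             # (num_frames, dim)

start = np.zeros(768)  # dummy embedding of the first condition image
end = np.ones(768)     # dummy embedding of the second condition image
frames = interpolate_embeddings(start, end, num_frames=16)
print(frames.shape)  # (16, 768)
```

In the actual pipeline, each of these per-frame embeddings would be fed to the video generation model as a cross-attention control signal, as described in the overview above.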
Info
Channel: Fahd Mirza
Views: 2,659
Id: EiettkcG6fg
Length: 8min 56sec (536 seconds)
Published: Fri May 03 2024