I Spent 150h+ Making My Own Anime With AI

Video Statistics and Information

Captions
Tons of research and experimentation, weeks in the making, and easily 150 hours' worth of production: we're finally here. Before even thinking about the style and story of my anime, I was researching how classical animation works and what I could learn from it. I quickly picked up that rotoscoping is the way to go in my case, and this technique can be replicated with AI image-to-image tools. I mostly used Stable Diffusion in this project because it gives me the most fine-tuning options: it lets you apply the styling of a model to a given picture. Here is a stupid low-quality pic of me transformed into this cool-looking anime style. If you know how image diffusion works, however, you might have already realized there is a big problem right in front of me.

So how does image diffusion work? In simple terms, diffusion uses a model, which is like a reference sheet for how things look. This can be very specific, like an anime model or even a model of only a single character, or something much broader. You then take an image, put a bunch of noise over it, and let the AI figure out how the image is supposed to look according to its model and your description, the prompt. It then denoises the image with that information. So depending on how much noise we put on the image, we get different results: very little noise means we stay really close to the original, while a lot of noise means we get further and further away from the original image but look more like the style the model has been trained on. Finding that balance can be very challenging, as I later found out, but that's not even the real problem here. If we look at an anime, there is movement for a character speaking or running or fighting or whatever, but the rest of the image is still, because, well, it's drawn. If we turn our input video into an image sequence, put noise over each frame, and then interpret these images, we will get slightly different results for every single frame. So consistency clearly is the big challenge of this whole process.
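That noise-versus-faithfulness trade-off is usually exposed as a single "strength" parameter in image-to-image tools. As a rough sketch (this mirrors how libraries like Hugging Face diffusers schedule img2img steps, though exact behavior varies by scheduler), the strength decides how many of the denoising steps are actually run:

```python
def img2img_steps(strength: float, num_inference_steps: int) -> int:
    """Denoising steps actually run for a given img2img strength.

    The input image is noised to a depth proportional to `strength`
    and then denoised from there: low strength stays close to the
    original photo, high strength mostly follows the model's style.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return min(int(num_inference_steps * strength), num_inference_steps)

# Low strength: only a few denoising steps, close to the input photo.
print(img2img_steps(0.25, 50))  # 12
# High strength: nearly a full generation in the model's trained style.
print(img2img_steps(0.85, 50))  # 42
```

The per-frame inconsistency described above follows directly from this: each frame gets its own fresh noise, so each frame denoises to a slightly different result.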
Consistency is absolutely key if you want to create something that is visually pleasing, and we will be applying more than a handful of techniques to end up somewhere really good. But first we need a story, and I need to thank the sponsor of today's video, which is also the first sponsor I have ever had on the channel. There is no question that AI artists have officially entered the game; they have even won art competitions already. Now that computers compete against humans, it begs the question what this means for existing human-made art. I believe we are at the transition stage between pre-AI and post-AI art, and I think that "human-made" will be a high-quality special label that people are willing to pay extra for, just like it is the case with the "handmade" label. If anything, pre-AI human-made art will become even more valuable. One further fuel for that claim: amidst the tough economic situation we are in, and with AI image tools blooming, one market still had a record-breaking year even in 2022: art. Which, if you're a billionaire, was a saving grace, but most of us could never have gotten access to the high-end art market. A market you can now access in minutes, without needing millions of dollars, thanks to the art investing platform Masterworks. Masterworks has sold over 45 million dollars in art, with the proceeds going straight to their investors, and that's not a one-off: every Masterworks exit to date has returned a profit. With over 700,000 users, Masterworks offerings have sold out in minutes; they even had to make a waitlist for new users, but I got special access to skip it, so just click the link in the description right now.

I was thinking about my anime story, and somehow The Matrix instantly came to mind. I don't really know why; I haven't seen the movie in like a decade, but it just seems like such a good setting for an anime adaptation. So I drew myself some storyboards already, which you can see here; I hope they are at least somewhat in focus. I chose to redo the iconic red
pill and blue pill scene, with an alternative ending in which Neo, who is not Neo but me instead, swallows both pills and then reveals that he is the One, the one who created The Matrix. A huge plot twist with a big payoff, which I knew I could support really well with another diffusion technique we will get to later. ChatGPT might have also assisted me here; you will never know. And for a while I really felt confident that this was the way to go, not realizing I was picking up problems left and right. First of all, you can do pretty much anything in animation, so why did I choose to do a dialogue scene? Realistic mouth movement and being able to display a character's emotions are super difficult. And secondly, I had just arrived in Thailand, so I have zero friends here, which meant I needed to play both characters myself. And if you look at me, I don't really look like the badass that is Morpheus. This whole process felt a lot like coding: it was pure non-stop problem solving, and what seemed impossible for a while always worked after some time, with enough experimentation and thought put into it.

So this is my apartment situation right here in Thailand, and as you can see, there is not a whole lot of space to do anything. I've got my bed right here and a little desk there at the end, but what we can work with, I think, is this area right here. I have this big-ass wardrobe, which could serve at least as a somewhat neutral background, and I will shoot all my scenes in front of it. I also only have this one single pancake lens, which has a fixed focal length, and one simple light, so we will have to figure things out. And I did: I used the wardrobe and then ran my scenes through Runway ML's background remover, a fantastic AI tool for those who don't have a green screen available. This way I had a separation between subject and background and was able to use a still image behind my characters. At this stage things were honestly looking pretty great, until I realized that the diffusion technique I
decided to use wouldn't work. Temporal Kit gave me the great consistency I was looking for because, instead of diffusing one frame at a time, you put noise over 9 or even 16 of them at once. But if I have a total of 1024 by 1024 pixels for my whole image, that doesn't leave many pixels for each of those 9 or 16 frames, and I wasn't able to go for a higher resolution because these tasks are really computationally intensive.

So I just wanted to start shooting the scenes I had drafted in these storyboards, because even though I still hadn't figured out how to make the consistency of the animation work, I figured: why not shoot the final scenes now? I didn't want to keep experimenting with stock footage if stock footage is not what I would use in my final video. So I got myself ready for the shoot; I'm even already wearing the shirt I will wear in the final scenes. But then I realized I couldn't do it, because I accidentally ate part of my props yesterday. I ate the blue pill. It was just gum, and I thought I had more of it, but I didn't, so I have to go get new ones now, and then we will make this thing work. I'm back, and I got these. They're not really pills, and they also lack a lot of color, but, spoken like a true professional, I'd say we will fix it in post. This is what we've got now: I put the stuff in front of the wardrobe as planned, and I will also change the seats depending on what I shoot, like this. On the other side we have a little bit of light from here, and then we have this little light here as well. Probably not the best Hollywood-quality setup you could get, but it will do the job. Oh, I almost forgot to tell you that I'm an incredibly talented actor. [Music] In all honesty, this part of the process felt so uncomfortable that I rushed all of my scenes. I did the whole shoot in like 20 minutes and probably could have gotten even better results if I had taken more time, but at this point in the process I was kind of discouraged and felt pretty down.
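The resolution squeeze that made Temporal Kit impractical is easy to put in numbers: packing a square batch of frames into one diffusion sheet divides the sheet's side length by the grid size, so each frame only gets a fraction of the pixels. A minimal sketch of that arithmetic:

```python
import math

def per_frame_pixels(sheet_side: int, frames: int) -> int:
    """Pixels per side left for each frame when a square batch of
    frames is packed into one diffusion sheet, as Temporal Kit does."""
    grid = math.isqrt(frames)
    if grid * grid != frames:
        raise ValueError("expected a square frame count like 9 or 16")
    return sheet_side // grid

print(per_frame_pixels(1024, 9))   # 341: each frame is roughly 341x341 px
print(per_frame_pixels(1024, 16))  # 256: even fewer pixels per frame
```

At 256 or 341 pixels per side, individual frames come out visibly soft, which is the low-quality problem described above.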
I wasn't getting the results I wanted in terms of consistency, and I had already put so much time into the project. So yeah, as I said, I felt kind of bad at this point, but that all changed one morning. I had just watched an amazing video that almost comically solved all of my problems; at least that's what I thought, since I still had to try it out, but I did have a game plan now. The video I'm referring to is "Did We Just Change Animation Forever?" by Corridor Crew. I changed a few things they were talking about, but they honestly gave me a blueprint for my workflow.

First of all, I trained my own models for my characters to get the best consistency. I did a lightweight version of what Corridor Crew did by creating my own LoRAs with DreamBooth. I took 40 pictures of me with different lighting but the same shirt I wore during the shoot, converted them all into 768 by 768 pixel squares, and used the Kohya LoRA Google Colab notebook to train my model. Google Colab is great, and I wouldn't be able to do any of this if it didn't exist, because what it allows you to do is basically rent a better computer for a limited time; in my case it was 50 hours of computing with a way better graphics card for about 10 bucks. Needless to say, I used all of my 50 hours. I then trained two more models, both for my Morpheus character: once with pictures of me wearing the hoodie and glasses, and a second time using images of the real Morpheus. What this allowed me to do is combine these LoRAs in my prompts and give each of them a weight. I was now able to recreate a Morpheus character, not the real Morpheus, but someone who is somewhere between me and the real Morpheus, at least most of the time. You can see how the pose is pretty much perfect in my images. I achieved that by using another tool that Corridor Crew didn't use; I think it wasn't available at the time. It's called ControlNet, and it's incredible: a neural network structure that controls diffusion models by adding extra conditions.
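Combining LoRAs with per-LoRA weights, as described above, is typically done in the AUTOMATIC1111 web UI by appending `<lora:name:weight>` activation tokens to the prompt. A small sketch of how such prompts are assembled (the LoRA names here are made-up placeholders, not the actual files from the video):

```python
def with_loras(prompt: str, loras: dict[str, float]) -> str:
    """Append AUTOMATIC1111-style activation tokens, <lora:name:weight>,
    so several LoRAs blend into one generation, each at its own weight.
    The names used below are hypothetical placeholders."""
    tokens = " ".join(f"<lora:{name}:{weight}>" for name, weight in loras.items())
    return f"{prompt}, {tokens}"

blended = with_loras(
    "portrait, anime style, dark sunglasses",
    {"my_face": 0.6, "morpheus": 0.5},
)
print(blended)
# portrait, anime style, dark sunglasses, <lora:my_face:0.6> <lora:morpheus:0.5>
```

Keeping the combined weights moderate is what lands the character "somewhere between me and the real Morpheus" rather than fully at either.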
It analyzes your frame in different ways, and the results are made available for the denoising. I mostly used a combination of Canny and HED, and sometimes depth or OpenPose. I of course didn't run it locally but in another Google Colab, using WarpFusion. This gave me extra consistency between frames and didn't have the low quality I was facing with Temporal Kit. It did come with some problems, like the skin tones changing drastically, but I was just going to live with that and hopefully adjust it in post. After many tries of changing the prompts and parameters for each image sequence, I got what I wanted. I mean, look at those hands, it's clearly working... well, let's say it worked most of the time.

What gets the result to the next level is post-processing in DaVinci Resolve. So far everything looks super flickery, but that's about to change; I just copied the Corridor Crew workflow here. The next thing we need to add is the deflicker. There are two different modes, one for time lapses and one for fluorescent lights, and if we turn on the fluorescent-light deflicker, check out the difference: much better. And if it's not good enough, here's how I fix it: copy, paste, paste, paste. Next to the deflickers I also added dirt removal and keyed away the green screen, and because we are doing an animation, we dropped the frame rate from 24 to 12.
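The frame-rate drop at the end of that step is the classic "animated on twos" trick: hand-drawn anime is usually drawn at 12 distinct images per second, so keeping only every second frame of 24 fps footage instantly makes the motion read as animation. A minimal sketch:

```python
def to_twelve_fps(frames: list) -> list:
    """Keep every second frame, converting 24 fps footage to the
    12 fps 'on twos' cadence typical of hand-drawn anime."""
    return frames[::2]

second_of_footage = list(range(24))   # frame indices for one second at 24 fps
kept = to_twelve_fps(second_of_footage)
print(len(kept))   # 12
print(kept[:4])    # [0, 2, 4, 6]
```

As a side benefit, halving the frame count also halves the number of frames that can flicker against each other.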
For the last scene I then used Deforum, another Stable Diffusion extension, which lets you create really trippy animations; it was perfect for what I wanted to achieve. Now that I had everything on my timeline, I was ready to do the voiceovers. I'm finally doing the voiceover now, which I had procrastinated on for a while because it's very uncomfortable. I dubbed both of my characters and used MetaVoice to change the voice of my Morpheus character, and I combined it with Adobe's audio enhancer AI to get an even better voice. The results were kind of mixed here; I think it had a lot of problems replicating any of the emotion in my voiceover. Listen to this example: "How... how is that possible? How's that possible?" This is boring. But at least the regular speaking parts were nice: "There is no turning back."

Last but not least, I put everything together and made it work in Final Cut. I used moving backgrounds like you see a lot in anime, put noise on the frames to make them look more interesting, sharpened edges, did some color styling, and used effects like this one to hide broken parts of some animations. 150 hours of work for a 60-second animation from a non-animator, and here it is: "This is your last chance. There is no turning back. You have to make a choice. Take the blue pill, and the story ends. Take the red pill, and I show you how deep the rabbit hole goes." "I am the One. How's that possible? Let me show you how deep the rabbit hole goes." [Music] It feels so real. Yeah, the hands are real. It feels like... it feels like I'm in a YouTube video, and the video is about to end, so I should say: subscribe to the channel, because the next few videos I'm working on will have some really cool AI stuff. Also, if you're wondering why I'm in Thailand, check out this video next; it explains my digital nomad journey, and you might end up wanting to try it for yourself. Thank you for watching. [Music]
Info
Channel: Till Musshoff
Views: 105,802
Id: UiQKiSRzXqg
Length: 13min 49sec (829 seconds)
Published: Sat May 27 2023