AI Generated Videos Just Changed Forever

Captions

- [Will] That's hot. (bright upbeat music) - All right, so this is simultaneously really impressive and really frightening at the same time, and it's hitting me in ways that I didn't really expect. So do you remember Will Smith eating spaghetti? Do you remember when this was what AI generated videos looked like? Remember when we said, "Okay, this AI stuff is cool and all but clearly there's a long way to go before there's any need for concern." Well, welcome to the future, people, because this is also an AI generated video. And so is this, completely synthesized out of thin air by computers. This one too, this is not real. Absolutely ridiculous how far we've come in literally one year. This does feel like another ChatGPT, DALL-E moment for AI. And maybe I'm overreacting because, okay, I am a video creator, so an AI that's actually doing my job, maybe that feels a little more threatening so I'm particularly impressed by it. But also this stuff is really good. So today, Sam Altman and OpenAI announced a new model called Sora and it can generate full up to one minute video clips from just text input. So the same way DALL-E was able to understand our text input and turn it into a photorealistic or stylized image or whatever you want, same thing with Sora but now since it's videos, it also needs to understand how all these things like reflections and textures and materials and physics all interact with each other over time to make a reasonable looking video. And of course, right away, there's a bunch of examples on their website that are crazy. Now, before I show you these, I just need you to keep this in mind, you're about to watch a bunch of AI generated videos and you know that you're about to watch a bunch of AI generated content. So your brain, you're already looking for this stuff and it's not perfect, you will find imperfections, but not everybody who sees AI generated content on the internet knows to be looking for that. So also keep that in mind.
This is also the worst that this technology is going to be from here on out. So, okay, here's one of the videos. There's no audio to any of these clips, but the prompt for this one is a stylish woman walks down a Tokyo street filled with warm, glowing neon and animated city signage. She wears a black leather jacket, a long red dress and black boots. This video is already miles ahead of where we were. It has accurate lighting, it has materials, it has skin tones, movements, even has reflections all over the place. Now, of course, if you look at it for more than about 10 seconds, very closely, there are lots of giveaways. Like this dude in the background kinda looks like he's gliding in a weird way. The frame rates of the reflections in the water are for some reason lower than the rest of the video. The camera movement overall is just a bit inconsistent and it just, I don't know, it just kinda feels a little bit off. But then again, this is where we were one year ago. So just keep that in the back of your head for all this. Okay, how about this one? This is another one which has a long prompt about a camera following behind a white vintage SUV with a black roof rack as it speeds up a steep dirt road. This is also, again, really good. It kinda looks a little more video gamey because of how rock solid the drone footage is, but clearly very usable. Here's another one, a litter of golden retriever puppies playing in the snow. Their heads pop in and out of the snow covered in it, it's so good. It feels like the physics of the fur and the ears and everything with the snow flying around in slow motion is incredible. I've looked through all of the sample videos on OpenAI's website, and clearly these are the handpicked best ones that they chose to share where they just put in some text and then get a video and don't modify it. But there's really impressive stuff in there. Some of it has humans, some of it doesn't.
Some of it is more realistic feeling like the truck driving one, but some of them are more video gamey or more stylized. A lot of it is slow motion. I just have to say how insanely fast these models are improving is genuinely, like that's the shocking part. Like I remember not even that many months ago, DALL-E 3, really, really high end, and you could always still find something off about it. Like especially if you ask it for something like a photorealistic image of a human, something about like the hands or the ears would always just be a little bit off, never mind the physics. But even this video here is crazy at first glance. The prompt for this AI generated video is a young man in his 20s is sitting on a piece of a cloud in the sky reading a book. This one feels like 90% of the way there for me. Like it's beyond the uncanny valley of like Apple's Personas, which are actually based on humans. This is a made up person. I mean, his eyes are kinda weird, and the motion of the pages in the book are kinda odd. And yeah, obviously, he's in a cloud and that's a giveaway but like, the lighting and the shadows and the skin tones and then all the realism of the textures on the shirt and the way the shirt and the pants move and the hair, they're all really impressive. And then for this one, they typed in a movie trailer featuring the adventures of the 30-year-old spaceman wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35 millimeter film. And the closeups of his face, the fabrics on the helmet, the film grain through every shot and the cinematic style, this is one of the most convincing AI generated videos I've ever seen, minus maybe the weird physics of that dude walking kind of in fast motion. So Sam Altman, if you follow him on Twitter, he's going through a whole bunch more of like people's requests and posting a bunch more generated videos. And so if you wanna check out his profile, you can see those.
But here's the thing about these AI generated videos now, as good as they've gotten to this point, they can and will pass as real videos to people who are not looking for AI generated videos. Now that is obviously insanely sketchy during an election year in the U.S. and also terrifying for a bunch of other internet related reasons but it's also perfect for stock footage. Like there are already all kinds of presentations and advertisements and then PowerPoints that are in need of oddly specific stock videos. And these AI generated videos are already good enough to 100% pass for that purpose. Like look at this one, this one with the waves at Big Sur, this drone shot. Honestly, if I saw this on Twitter, I wouldn't even think twice. I'd be like, "Oh, nice drone shot, dude." Wouldn't even think about AI if I wasn't pixel peeping at like the way the water was moving. Like this is a totally usable video in an ad for some California based product. And that has all sorts of implications for the drone pilot that no longer needs to be hired, for all the photographers and videographers whose footage no longer needs to be licensed to show up in that ad that's being made. It's already that good. There's other stuff like this wall of TVs, which would be a totally expensive and difficult thing to shoot with a camera and all these old expensive props, but if you can just generate it this well with reflections and the environment and everything else around it, I mean, why do it any other way? It's also very capable of historical themed footage. So this is supposed to be California during the Gold Rush. It's AI generated but it could totally pass for the opening scene in an old Western with the right music over it. How long until an entire ad, every single shot is completely generated with AI? Or what about an entire YouTube video or an entire movie? 
I'm tempted to say like we're a long way away from that because you know, this still has flaws clearly and there's no sound, and there's a long way to go with the prompt engineering to iron these things out. But then again, the spaghetti was like a year ago. Now, I actually like that OpenAI, on their website, they show some of the downfalls too of this particular model, because who would know better than the people who have been using it? This is a very private tool, by the way, right now. It's in super limited access, so it's in the hands of red teamers, which basically means people testing it, pushing the limits, trying to break it, and a few trusted creators. But they have found plenty of weird edge stuff. Like this clip here of a bunch of gray wolf pups looks normal at first but then it's pretty clear that something's kinda off with the way they're just kinda appearing out of nowhere and walking through each other. That's kinda weird. Or this clip of a guy running on a treadmill, which I mean, I don't really have to say much more about why this one is weird. But this is my favorite one, again, so again, just try to put yourself in the mind of someone who's not expecting AI. You're just scrolling through Facebook or Twitter or something, right? So you just see this video. So first I just want you to watch this clip as if it's just a stock video you found of a grandma celebrating her birthday. And just try to think like, I wonder what birthday she's celebrating, right? I don't know, how old do you think she is? 60? 65? Maybe it's the big 70. She seems to really like that cake. Now, did you see it? Did you catch that? I'm gonna play it again, but this time, watch the video knowing that AI generated photos and videos have trouble accurately doing hands. I'll play it again. And now it feels super obvious like every time you watch it, watch a different set of hands, it gets weirder and weirder.
You can watch it like five times and there's dead giveaway after dead giveaway, not even mentioning the weird inconsistencies with the direction of the wind on the candles. But even as I'm saying all that, even as it's coming outta my mouth, I can't help but remember that 12 months ago we were critiquing this. (Will laughing) So what does this all mean? Well, I mean, there's what it means now and there's what it means for the future. Now, Sora, this thing that they've made is clearly a really impressive video generation AI tool that is both going to fool people and also be very useful. There's also a watermark in the bottom corner of every video generated by it. So if you see one of those videos and ideally it hasn't been cropped out, then that's at least a pretty clear indicator that it's AI generated. It's a Sora video. But also, I do think they're gonna have to be very careful with this, they're gonna have a whole bunch of safety stuff to keep in mind. I think they'll probably have to be even more safe than DALL-E. Like you shouldn't be able to generate people's likenesses. Like you shouldn't be able to make a politician look like they're doing something on video, especially this year. You probably won't be able to make Will Smith eating spaghetti, but it also definitely means stock video generation is absolutely going to take a dent out of video licensing. Like I can basically guarantee that. Like logistically, why would anyone making something pay for footage of a house in the cliffs when they can generate one for free or for a small subscription price? Like that is the real scary part of what this tool implies. But in the future, it gets pretty existential, man. I mean, okay, if this is trained on all videos that have ever been made by humans, then surely it can't be innovative or creative in ways that humans haven't already been, right? I don't know.
Either way, I'll have all the links below for all the Sora stuff, for OpenAI stuff, and I guess I'll talk to you next year when we look back and go, "Remember that first version of Sora and how bad those wolf pups looked when they spawned out of nowhere?" Just remember, this is the worst that this technology is going to be from here on out. Thanks for watching. Catch you in the next one, peace. (bright upbeat music)

Info

Channel: Marques Brownlee

Views: 8,059,410

Keywords: AI generated videos, AI generated, AI video, MKBHD

Id: NXpdyAWLDas

Length: 12min 2sec (722 seconds)

Published: Fri Feb 16 2024