- [Will] That's hot.
(bright upbeat music) - All right, so this is
really impressive and really frightening at the same time, and it's hitting me in ways
that I didn't really expect. So do you remember Will
Smith eating spaghetti? Do you remember when this was what AI generated videos looked like? Remember when we said, "Okay,
this AI stuff is cool and all, but clearly there's a long way to go before there's any need for concern." Well, welcome to the future, people, because this is also
an AI generated video. And so is this, completely
synthesized out of thin air by computers. This one too, this is not real. Absolutely ridiculous how far we've come in literally one year. This does feel like another
ChatGPT, DALL-E moment for AI. And maybe I'm overreacting because, okay, I am a video creator, so an AI that's actually doing my job, maybe that feels a little more threatening so I'm particularly impressed by it. But also this stuff is really good. So today, Sam Altman and
OpenAI announced a new model called Sora, and it can generate video
clips up to one minute long from just text input. So the same way DALL-E was able
to understand our text input and turn it into a photorealistic or stylized image or whatever you want, same thing with Sora but
now since it's videos, it also needs to understand how all these things like
reflections and textures and materials and physics
all interact with each other over time to make a
reasonable looking video. And of course, right away,
there's a bunch of examples on their website that are crazy. Now, before I show you these, I just need you to keep this in mind, you're about to watch a
bunch of AI generated videos and you know that you're about to watch a bunch of AI generated content. So your brain, you're already
looking for this stuff and it's not perfect, you
will find imperfections, but not everybody who
sees AI generated content on the internet knows
to be looking for that. So also keep that in mind. This is also the worst
that this technology is going to be from here on out. So, okay, here's one of the videos. There's no audio to any of these clips, but the prompt for this one is, "A stylish woman walks
down a Tokyo street filled with warm, glowing
neon and animated city signage. She wears a black leather jacket, a long red dress and black boots." This video is already miles
ahead of where we were. It has accurate lighting,
it has materials, it has skin tones, movements, even has reflections all over the place. Now, of course, if you look at it for more than about 10
seconds, very closely, there are lots of giveaways. Like this dude in the background kinda looks like he's
gliding in a weird way. The frame rate of the
reflections in the water is for some reason lower
than the rest of the video. The camera movement overall
is just a bit inconsistent and it just, I don't know, it just kinda feels a little bit off. But then again, this is
where we were one year ago. So just keep that in the back
of your head for all this. Okay, how about this one? This is another one
which has a long prompt about a camera following
behind a white vintage SUV with a black roof rack as it
speeds up a steep dirt road. This is also, again, really good. It kinda looks a little more video gamey because of how rock solid
the drone footage is, but clearly very usable. Here's another one, a litter of golden retriever
puppies playing in the snow. Their heads pop in and out
of the snow covered in it, it's so good. It feels like the physics
of the fur and the ears and everything with the snow
flying around in slow motion is incredible. I've looked through all
of the sample videos on OpenAI's website, and clearly these are
the handpicked best ones that they chose to share where they just put in some text and then get a video and don't modify it. But there's really
impressive stuff in there. Some of it has humans, some of it doesn't. Some of it is more realistic feeling like the truck driving one, but some of them are more
video gamey or more stylized. A lot of it is slow motion. I just have to say, how insanely fast these models are improving is genuinely, like, that's the shocking part. Like I remember not even that
many months ago, DALL-E 3, really, really high end, and you could always still
find something off about it. Like especially if you ask it for something like a
photorealistic image of a human, something about like the hands or the ears would always just be a little bit off, never mind the physics. But even this video here
is crazy at first glance. The prompt for this AI generated video is a young man in his 20s is sitting on a piece of a
cloud in the sky reading a book. This one feels like 90%
of the way there for me. Like it's beyond the uncanny valley of like Apple's Personas, which are actually based on humans. This is a made up person. I mean, his eyes are kinda weird, and the motion of the pages
in the book are kinda odd. And yeah, obviously, he's in
a cloud and that's a giveaway but like, the lighting and
the shadows and the skin tones and then all the realism of
the textures on the shirt and the way the shirt and
the pants move and the hair, they're all really impressive. And then for this one, they typed in a movie trailer featuring the adventures
of the 30-year-old spaceman wearing a red wool knitted
motorcycle helmet, blue sky, salt desert, cinematic style,
shot on 35 millimeter film. And the closeups of his face,
the fabrics on the helmet, the film grain through every
shot and the cinematic style, this is one of the most
convincing AI generated videos I've ever seen, minus
maybe the weird physics of that dude walking
kind of in fast motion. So Sam Altman, if you
follow him on Twitter, he's going through a whole bunch more of like people's requests and posting a bunch more generated videos. And so if you wanna check out his profile, you can see those. But here's the thing about
these AI generated videos now, as good as they've gotten to this point, they can and will pass as real videos to people who are not looking
for AI generated videos. Now that is obviously insanely sketchy during an election year in the U.S. and also terrifying for a bunch of other
internet related reasons but it's also perfect for stock footage. Like there are already
all kinds of presentations and advertisements and then PowerPoints that are in need of oddly
specific stock videos. And these AI generated videos
are already good enough to 100% pass for that purpose. Like look at this one,
this one with the waves at Big Sur, this drone shot. Honestly, if I saw this on Twitter, I wouldn't even think twice. I'd be like, "Oh, nice drone shot, dude." Wouldn't even think about
AI if I wasn't pixel peeping at like the way the water was moving. Like this is a totally
usable video in an ad for some California based product. And that has all sorts of
implications for the drone pilot that no longer needs to be hired, for all the photographers
and videographers whose footage no longer
needs to be licensed to show up in that ad that's being made. It's already that good. There's other stuff like this wall of TVs, which would be a totally
expensive and difficult thing to shoot with a camera and
all these old expensive props, but if you can just generate
it this well with reflections and the environment and
everything else around it, I mean, why do it any other way? It's also very capable of
historical themed footage. So this is supposed to be
California during the Gold Rush. It's AI generated but
it could totally pass for the opening scene in an old Western with the right music over it. How long until an entire ad, every single shot is
completely generated with AI? Or what about an entire YouTube
video or an entire movie? I'm tempted to say like we're
a long way away from that because you know, this
still has flaws clearly and there's no sound, and there's a long way to go
with the prompt engineering to iron these things out. But then again, the spaghetti
was like a year ago. Now, I actually like that
OpenAI, on their website, they show some of the shortcomings too of this particular model. Because who would know better than the people who have been using it? This is a very private
tool, by the way, right now. It's in super limited access, so it's in the hands of red teamers, which basically means people
testing it, pushing the limits, trying to break it, and
a few trusted creators. But they have found plenty
of weird edge stuff. Like this clip here of a
bunch of gray wolf pups looks normal at first but then it's pretty clear
that something's kinda off with the way they're just
kinda appearing out of nowhere and walking through each other. That's kinda weird. Or this clip of a guy
running on a treadmill, which I mean, I don't
really have to say much more about why this one is weird. But this is my favorite one. So again, just try to put
yourself in the mind of someone who's not expecting AI. You're just scrolling through Facebook or Twitter or something, right? So you just see this video. So first I just want
you to watch this clip as if it's just a stock video you found of a grandma celebrating her birthday. And just try to think like, I wonder what birthday
she's celebrating, right? I don't know, how old do you think she is? 60? 65? Maybe it's the big 70. She seems to really like that cake. Now, did you see it? Did you catch that? I'm gonna play it again, but this time, watch the video knowing that AI generated
photos and videos have trouble accurately doing hands. I'll play it again. And now it feels super obvious
like every time you watch it, watch a different set of hands,
it gets weirder and weirder. You can watch it like five
times and there's dead giveaway after dead giveaway, not even mentioning the
weird inconsistencies with the direction of
the wind on the candles. But even as I'm saying all that, even as it's coming outta my mouth, I can't help but remember
that 12 months ago we were critiquing this. (Will laughing) So what does this all mean? Well, I mean, there's what it means now and there's what it means for the future. Now, Sora, this thing that they've made is clearly a really impressive
video generation AI tool that is both going to fool
people and also be very useful. There's also a watermark
in the bottom corner of every video generated by it. So if you see one of those videos and ideally it hasn't been cropped out, then that's at least a
pretty clear indicator that it's AI generated. It's a Sora video. But also, I do think they're
gonna have to be very careful with this, they're gonna have
a whole bunch of safety stuff to keep in mind. I think they'll probably
have to be even more safe than DALL-E. Like you shouldn't be able to
generate people's likenesses. Like you shouldn't be
able to make a politician look like they're doing
something on video, especially this year. You probably won't be able to make Will Smith eating spaghetti, but it also definitely
means stock video generation is absolutely going to take a
dent out of video licensing. Like I can basically guarantee that. Like logistically, why would
anyone making something pay for footage of a house in the cliffs when they can generate one for free or for a small subscription price? Like that is the real scary
part of what this tool implies. But in the future, it gets
pretty existential, man. I mean, okay, if this
is trained on all videos that have ever been made by humans, then surely it can't be
innovative or creative in ways that humans haven't
already been, right? I don't know. Either way, I'll have all the links below for all the Sora stuff, for OpenAI stuff, and I guess I'll talk to you next year when we look back and go, "Remember that first version of Sora and how bad those wolf pups looked when they spawned out of nowhere?" Just remember, this is the
worst that this technology is going to be from here on out. Thanks for watching. Catch you in the next one, peace. (bright upbeat music)