OpenAI Shocks the AI Video World - Sora Changes Everything

Video Statistics and Information

Captions

OpenAI just changed everything, again. All these videos you're seeing right now are fully AI generated. I don't usually try to push out news videos like this, but this is the most excited I've been about an update from OpenAI, or any AI tool, ever, and that's saying a lot. This is better than any other AI video platform, and it's not even close. It's called Sora. It was just released today, and it's not just text to video. A few hours after they released those demos, they also released a research page that showed image to video, video to video, and, actually my favorite part, this feature of connecting videos, like this drone flying through these ruins then changing into a butterfly flying underwater. For some reason, but it looks amazing. They're supposedly allowing select creators to test it right now to get feedback, and they said it's to give the public a sense of what AI capabilities are on the horizon. I mean, the possibilities with this are pretty intense, honestly. I'll get into some of my overall thoughts and potential consequences and opportunities later, but let's jump into some more examples first.

This movie trailer is from a single prompt, and it maintains the scene and character consistency across multiple shots. It has this dynamic camera movement and handheld shots, plus just how realistic the people are. And then there's a couple drone shots here that are just indistinguishable from real drone shots. Same with this longer one of an SUV driving on a dirt road. I mean, this scenery just looks so good. Or this cat walking through a garden is so true to the way a cat actually moves, with that, like, light-footed stealthiness, plus the physics of the fur and whiskers bouncing. And this is a 27-second shot. It can generate up to 1 minute, as opposed to the 4-second, morphing, kind of slow-motion clips we see from everywhere else.

But now let's look at some more complex scenes. This one in particular really blew me away. First off, just all these buildings being viewed coherently from all angles as it 
passes, but at the same time there's all these reflections that interact with them properly with the lighting and refraction, especially if you go frame by frame right here as it crosses this building and see the scene it's actually reflecting. Honestly, other video generators would struggle just generating that scene on its own, with multiple faces and hands, let alone as a reflection interacting with an even more complex scene.

I'll jump through a couple quick ones here. All these people celebrating Chinese New Year is an unbelievable amount of people to be tracking, and they're all pretty coherent. And this flythrough of a historical town has all sorts of, like, horses and people walking and carrying things, all while viewing these buildings from different perspectives. Or the ability to generate so many completely unrelated videos across all these TVs at once. I mean, that's just crazy to me. Then most of the wildlife shots look essentially perfect. I really like this crab interacting with an octopus.

A lot of this is photorealistic, but it can do other styles too. So we have some 3D animation shots that are amazing, then this papercraft one is awesome. And this dancing one is a great spot for comparison, because just the other day I was generating videos in Runway. It was of a cult of barn owls performing secret disco rituals. Don't ask. But my point is, just look at the difference: these slow-motion dance moves that start to, like, morph even in just a couple-second clip, and some of these just have no faces. The comparison isn't even remotely close.

And then I want to focus in on this one for a second. So it has two just incredibly detailed pirate ships that are viewed from all angles, but simultaneously having multiple difficult physics simulations of, like, the flags waving and then the liquid. Then it also knows to apply tilt shift to make it seem like it's miniature. And that just highlights how much is going on in this and how time-consuming it would be to do this in any other way.

Now those are 
all amazing, but let's look closer at the shots that involve people. That's where things get really difficult. So first, it can do a close-up of an eye incredibly well. It even captures the little micro starts and stops your eyes do when you move them from left to right. They're called saccades. But let's zoom out some more. Here's a gray-haired man pondering the universe. This is amazing, even just the subtleties in his wrinkles as his face makes, like, minor expressions, and, like, the different speeds he blinks. And then there's really accurate refraction in his glasses.

But let's zoom out even more. So this one's actually the generation Greg Brockman used in his announcement, and it does an incredible job with reflections and this realistic walk cycle, and then proper physics on the purse and clothing and her earrings. That's all with a pretty complex background too. This is miles apart from anything we've seen before. But there are some issues with this. Like, you're not going to be convinced this is a real video, if you're paying attention at least. You'll see, like, the feet kind of sliding across the ground, or if you look at any of the people in the background, they're good, but some are a little weird. So there's certainly limitations.

So here's another one that has a lot of people in it while the camera is spinning, which is really pushing the limits. If you watch this bottom left corner, there's a kind of trippy perspective shift here, and then some weirdness in the hands and movements of the people. Overall they look great, and I can't stress enough how far beyond this is from anything else we've seen, but it is still, of course, not perfect, especially with people. But that's just when I'm analyzing them to find something weird. If I was just scrolling social media, I would think almost every one of these is real. And as the saying goes, this is the worst it will ever be.

And all of those examples are at openai.com/sora, but they also released more examples of all the other stuff it can do under the research 
section of their site. I'll link to that as well.

Before I jump into that: I assume everyone watching this video uses ChatGPT to some extent, but one thing that can be a bit of a struggle is knowing how to implement it in your daily life, especially at work. That's why I've partnered with HubSpot to share a free resource bundle to help. It will be linked in the description. It covers topics across all different industries, with tips and guides on how to use ChatGPT in those industries, with specific examples and applications. I mean, the number one problem I hear from people regarding AI is they see these videos about ChatGPT or other AI tools and say, "That's really cool, but how can I actually use it to help me?" You know, to save time or solve a problem. One example section here is called "100 Ways to Try ChatGPT Today," with 100 sample prompts you can use or modify for whatever career you have. And there will actually be five PDFs. That section's in the one called "Supercharge Your Workday with ChatGPT," but go through all the other ones in there too. Again, they are completely free. Just use the link in the description to go download those. I've been happy to partner with HubSpot to bring these free resources to people who watch this channel.

Onto the Sora research. So I'll link to that as well. They call it "Video generation models as world simulators," which, that title starts to feel a little weird. They say it's a promising path towards building general purpose simulations of the physical world, and this goes to some extent on how it was trained and how it works in these first sections, but let's look at some more examples. It does demonstrate a lot more text-to-video capabilities, with some really solid generations involving people, but since we've seen a lot of that, let's jump into the image to video. Just like with text to video, their image to video is far beyond any other video generators. So these are examples of animating DALL-E images, and they are all amazing, I think this one with the surfers being 
the most complex. And just like with the text to video, it does a really good job with the physics, like, even in a really difficult example like this.

It can also extend videos forward and backward in time. This is an example where all three videos were extended backwards, so they all lead to the same ending.

This video to video is really impressive. So it starts with the example of "change the setting to be in a lush jungle," and does an amazing job. But you can click on this and it will be a drop-down to select from with all these different options. So we've got "make it underwater," and how about "change it to winter," and we can add dinosaurs, switch it to claymation. We've got a bunch of others here. So this is just next level when you compare it to something like Gen-2 or Kaiber. Kaiber can do a bunch of stuff that is pretty different from this, which I really like. We'll have to see once we can actually play with this, but either way, this is mind-blowing.

Then this next section is probably my favorite part. So it can connect videos, where it interpolates between the two input videos to create seamless transitions between them, even with entirely different subjects and scene compositions. So the center video is the one that combines them, and for each of these it finds just a really creative and seamless way to merge them. Like this drone shot, where it puts this snow diorama thing on the back side of the building. Then we've got this slow morphing between the chameleon and this cool bird. It looks just awesome while it's transitioning. And I'll just let these other two play. Got this SUV turning into, I think it's a cheetah, and that historical Gold Rush town transforming into this underwater scene. Both just super cool.

And it is also an image generator now. I wonder if this will be replacing DALL-E at some point. I assume it will, because honestly, just a screen grab from a lot of these videos looks better than if you ran that prompt through DALL-E. Right, this is what I got when I copied the prompt of 
the woman in autumn. I mean, this is just way better than DALL-E.

And if we move along, it talks about some of the emerging capabilities of training at scale. Similar to how it is with LLMs, there's just unexpected capabilities that emerge when you train a model like this. They say some of that has led to its ability to generate dynamic camera motion that shifts and rotates while people move consistently throughout 3D space. And the object permanence: as these people walk past this dog, like, it'll be completely blocked from the frame at some points, but it still persists and will be there after they walk past.

Interacting with the world is another capability that no other model has been able to really do. So it can still only affect the state of the world in simple ways, but these examples of painting and actually leaving the brush strokes on the canvas, or leaving bite marks on a burger, this is amazing. This is a huge step forward.

They also do show some limitations, like glass shattering. They did show some more limitations on the main site where the physics was off, like this person running backwards on a treadmill, these wolf pups just kind of appearing out of nowhere, some weird interactions with this chair, and a few others. Again, not perfect.

And it also has just this mention of simulating digital worlds, where it can control a player in Minecraft while also rendering the world and its dynamics in high fidelity. The ways this will be applied in video games in the future is going to change that whole world too.

Honestly, I'm still processing a lot of this. It is such a monumental breakthrough. I wasn't expecting to see text to video like this until maybe the end of the year at the earliest. They're working on putting some guardrails around this, which they should, but I mean, there's only so much you can do. You know, people have complaints about the censorship within ChatGPT, which I get, but when it comes to videos, which is generally done with the intent to share it, it just gets a lot more 
complicated. We already have fake videos all over the place, some of relatively harmless things like fake landscapes or fake weather phenomena. Then there is the more sinister side of fake videos spreading around, of war victims or political figures. Up until this point, they're usually pretty obvious to anyone that's close to this stuff, but maybe not to everyone else. Sometimes they seem to trick millions of people. But as these tools keep getting more sophisticated, that will just accelerate and also become impossible to detect.

As far as videographers and filmmakers and YouTubers, there will be a lot of possibilities that open up. I think creating just amazing B-roll really easily is going to be awesome. I personally avoid stock footage almost completely because it just feels generic, but if I can generate something that I kind of imagined for each occasion, that's just a lot more interesting to me and engaging for the audience. Overall, this completely opens the door for anyone to create visual stories. In the same way one person can write a book, one person could be able to create an entire movie. This just provides more tools to create with, and new types of content will emerge. There will be bumps along the way, with AI content farms and deepfakes and misinformation.

I'm not worried at all about being replaced. I think that no matter how good any of these tools get, people will always be more interested in seeing things that people do themselves. There's a lot of uses for this to help people in the process, but not to replace people entirely. Also, I think as more generated content comes out, there will just be a pendulum swing in the other direction, with people craving more authentic human content. People are consuming more content than ever, so really there will be demand for both.

There's a lot of random thoughts at the end there. This is a much more off-the-cuff video than usual. This update was amazing, and if you want to stay up to date along the way with everything that happens in AI, make sure to 
subscribe here and check out futurepedia.io. We curate all the latest and greatest AI tools. You can find the best tool for any use case, use the chatbot to help guide the process, you can save your favorites, then you'll be given personalized tool recommendations every week. There's been a ton of updates there, so go check it out if you haven't for a while. Thank you so much for watching, and I'll see you in the next one.

Info

Channel: Futurepedia

Views: 176,601

Keywords: sora, openai, text to video, image to video, ai video, cinematic ai video, video generator, dalle, dalle 3, text-to-video, text to image, ai video generator, ai content creation

Id: ZFwzaAU8xFA

Length: 12min 18sec (738 seconds)

Published: Fri Feb 16 2024