OpenAI shocks the world yet again… Sora first look

Video Statistics and Information

Captions
Yesterday, OpenAI unleashed their latest monstrosity on humanity, and it's truly mind-blowing. I hope you enjoy a good existential crisis, because what you're about to see is one small step for man and one giant leap for artificial kind. We all knew that better AI video models were coming, but OpenAI's Sora just took things beyond our wildest expectations. It's the first AI to make realistic videos up to a minute long. In today's video, we'll look at what this text-to-video model can actually do, figure out how it works under the hood, and pour one out for all the humans that became obsolete this time. It is February 16th, 2024, and you're watching The Code Report.

When I woke up yesterday, Google announced Gemini 1.5 with a context window of up to 10 million tokens. That was an incredible achievement that was also blowing people's minds, but Sundar was quickly overshadowed by Sam Altman, who just gave us a preview of his new friend Sora, which comes from the Japanese word for sky. It's a text-to-video model, and all the video clips you're seeing in this video have been generated by Sora. It's not the first AI video model: we already have open models like Stable Video Diffusion and private products like Pika, but Sora blows everything out of the water. Not only are the images more realistic, but they can be up to a minute long and maintain cohesion between frames. They can also be rendered in different aspect ratios, and they can either be created from a text prompt, where you describe what you want to see, or from a starting image that gets brought to life.

Now, my initial thought was that OpenAI cherry-picked all these examples, but it appears that's not the case, because Sam Altman was taking requests from the crowd on Twitter and returning examples within a few minutes, like two golden retrievers doing a podcast on top of a mountain. Not bad, but this next one's really impressive: a guy turning a nonprofit open-source company into a profit-making closed-source company. Impressive. Very nice.

So now you might be wondering how you can get your hands on this thing. Well, not so fast. If a model this powerful was given to some random chud, one can only imagine the horrors it would be used for. It would be nice if we could generate video for our AI influencers for additional tips, but that's never going to happen. It's highly unlikely this model will ever be open source, and when they do release it, videos will have C2PA metadata, which is basically a surveillance apparatus that keeps a record of where content came from and how it was modified.

In any case, we do have some details on how the model works. It likely takes a massive amount of computing power, and just a couple weeks ago Sam Altman asked the world for $7 trillion to buy a bunch of GPUs. Yeah, that's trillion with a T, and even Jensen Huang made fun of that number, because it should really only cost around $2 trillion to get that job done. But maybe Jensen is wrong: it's going to take a lot of GPUs for video models to scale. Let's find out how they work.

Sora is a diffusion model, like DALL·E and Stable Diffusion, where you start with some random noise, then gradually update that noise into a coherent image. Check out this video if you want to learn more about that algorithm. Now, there's a ton of data in a single still image: a thousand pixels by a thousand pixels by three color channels comes out to 3 million data points. That's a big number, but what if we have a 1-minute video at 60 frames per second? Now we have over 10 billion data points to generate. Just to put that in perspective for the primate brain: 1 million seconds is about 11 and a half days, while 10 billion seconds is about 317 years. So there's a massive difference in scale, plus video has the added dimension of time.

To understand this data, they took an approach similar to large language models, which tokenize text like code and poetry. However, Sora is not tokenizing text, but rather visual patches. These are like small compressed chunks of images that capture both what they are visually and how they move through time, frame by frame. What's also interesting is that video models typically crop their training data and outputs to a specific time and resolution, but Sora can train on data at its native resolution and output variable resolutions as well. That's pretty cool.

So how is this technology going to change the world? Well, last year, tools like Photoshop got a whole suite of AI editing tools. In the future, we'll be able to do the same in video: you might have a car driving down the road and want to change the background scenery. Now you can do that in 10 seconds instead of hiring a cameraman and a CGI expert. But another lucrative, high-paying career that's been put on notice is Minecraft streaming. Sora can simulate artificial movement in Minecraft and potentially turn any idea into a Minecraft world in seconds. Or maybe you want to direct your own indie Pixar movie; AI makes that possible by stealing the artwork of talented humans. But it might not be that easy. As impressive as these videos are, you'll notice a lot of flaws if you look closely. They have that subtle but distinctive AI look about them, and they don't perfectly model physics or humanoid interactions. But it's only a matter of time before these limitations are figured out.

Although I'm personally threatened and terrified of Sora, it's been a privilege and an honor to watch 10,000 years of human culture get devoured by robots. This has been The Code Report. Thanks for watching, and I will see you in the next one.
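The "start with noise, gradually update it" loop described in the captions can be sketched in a few lines. This is a toy illustration, not Sora's actual code: the `denoise_step` function here is a made-up stand-in that simply blends toward a known target image, whereas a real diffusion model uses a trained neural network to predict the noise to remove at each step.

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.random((8, 8))        # stand-in for a "clean" image

def denoise_step(x, target, strength=0.1):
    """One toy reverse-diffusion step: remove a little noise."""
    return x + strength * (target - x)

x = rng.standard_normal((8, 8))    # step 0: pure Gaussian noise
for _ in range(200):               # gradually update toward coherence
    x = denoise_step(x, target)

print(np.abs(x - target).max())    # tiny residual after many steps
```

The point of the sketch is only the shape of the algorithm: many small steps, each removing a bit of noise, starting from something completely random.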
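The back-of-the-envelope numbers in the captions can be checked directly (note 10 billion seconds is about 317 years, not thousands):

```python
# A 1000 x 1000 image with 3 color channels:
pixels_per_frame = 1000 * 1000 * 3            # 3,000,000 data points

# A 1-minute clip at 60 frames per second:
frames = 60 * 60                              # 3,600 frames
video_points = pixels_per_frame * frames      # 10,800,000,000
print(f"{video_points:,}")                    # over 10 billion

# The "primate brain" perspective:
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY
print(1_000_000 / SECONDS_PER_DAY)            # ~11.57 days
print(10_000_000_000 / SECONDS_PER_YEAR)      # ~317 years
```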
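The visual-patch idea (small chunks spanning both space and time, analogous to text tokens) can be illustrated with an array reshape. All shapes and patch sizes below are invented for illustration; OpenAI has not published Sora's actual dimensions.

```python
import numpy as np

# A tiny fake video tensor: frames x height x width x channels.
T, H, W, C = 8, 32, 32, 3
video = np.zeros((T, H, W, C))

# Cut it into "spacetime patches": each patch covers a few frames
# in time and a small square in space.
pt, ph, pw = 2, 8, 8              # patch size in time, height, width
patches = (video
           .reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
           .transpose(0, 2, 4, 1, 3, 5, 6)   # group patch dims together
           .reshape(-1, pt * ph * pw * C))   # one flat vector per patch

print(patches.shape)              # (64, 384): 64 tokens, 384 values each
```

Each row is one "token" the model attends over, which is how a transformer-style architecture can treat video the way a language model treats text.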
Info
Channel: Fireship
Views: 1,413,114
Keywords: webdev, app development, lesson, tutorial
Id: tWP6z0hvw1M
Length: 4min 21sec (261 seconds)
Published: Fri Feb 16 2024