OpenAI's Sora Made Me Crazy AI Videos—Then the CTO Answered (Most of) My Questions | WSJ

Video Statistics and Information

Captions
- The video captures sort of the detail of the prompt when it comes to the hair and, you know, sort of like professionally styled women.
- But you can also see some issues.
- Certainly, especially when it comes to the hands.
- [Joanna] These two women, not real. They were created by Sora, OpenAI's text-to-video AI model. But these two women, very real.
- I'm Mira Murati, CTO of OpenAI.
- And former CEO.
- Yes, for two days.
- [Joanna] In November, when OpenAI CEO Sam Altman was momentarily ousted, Murati stepped in. Now she's back to her previous job running all the tech at the company, including...
- Sora is our video generation model. It is just based on a text prompt, and it creates these hyper-realistic, beautiful, highly detailed videos of one-minute length.
- [Joanna] I've been blown away by the AI-generated videos, yet also concerned about their impact. So I asked OpenAI to generate some new videos for me and sat down with Murati to get some answers. How does Sora work?
- It's fundamentally a diffusion model, which is a type of generative model. It creates a more distilled image starting from random noise.
- [Joanna] Okay, here are the basics. The AI model analyzed lots of videos and learned to identify objects and actions. When given a text prompt, it creates a scene by defining the timeline and adding detail to each frame. What makes this AI video special compared to others is how smooth and realistic it looks.
- If you think about filmmaking, people have to make sure that each frame continues into the next frame with a sense of consistency between objects and people. And that's what gives you a sense of realism and a sense of presence. And if you break that between frames, then you get this disconnected sense and reality is no longer there. And so this is what Sora does really well.
- You can see lots of that smoothness in the videos OpenAI generated from the prompts I provided. But you can also see flaws and glitches. A female video producer on a sidewalk in New York City holding a high-end cinema camera. Suddenly, a robot yanks the camera out of her hand.
- So in this one, you can see the model doesn't follow the prompt very closely. The robot doesn't quite yank the camera out of her hand, but the person sort of morphs into the robot. Yeah, a lot of imperfections still.
- One thing I noticed there too is that when the cars are going by, they change colors.
- Yeah, so while the model is quite good at continuity, it's not perfect. So you kind of see the yellow cab disappearing from the frame there for a while and then it comes back in a different frame.
- Would there be a way after the fact to say, "Fix the taxi cabs in the back"?
- Yeah, so eventually. That's what we're trying to figure out: how to use this technology as a tool that people can edit and create with.
- I wanted to go through one other... What do you think the prompt was?
- It looks like a bull in a china shop. Yeah, metaphorically, you'd imagine everything breaking in the scene, right? And you see in some cases that the bull is stomping on things and they're still perfect. They're not breaking. So that's to be expected this early on. And eventually, there's gonna be more steerability and control and more accuracy in reflecting the intent of what you want.
- And then there was that video of, well, us. The woman on the left looks like she has maybe, like, 15 fingers in one of the shots.
- [Mira] Hands actually have their own way of motion, and it's very difficult to simulate the motion of hands.
- In the clip, the mouths move but there's no sound. So is audio something you're working on with Sora?
- With Sora specifically, not at this moment. But we will eventually.
- [Joanna] Every time I watch a Sora clip, I wonder: what videos did this AI model learn from? Did the model see any clips of Ferdinand to know what a bull in a china shop should look like? Was it a fan of SpongeBob?
- Wow! You look real good with a mustache, Mr. Krabs.
- By the way, my prompt for this crab said nothing about a mustache. What data was used to train Sora?
- We used publicly available data and licensed data.
- So, videos on YouTube?
- I'm actually not sure about that.
- Okay. Videos from Facebook, Instagram?
- You know, if they were publicly available, publicly available to use, there might be the data, but I'm not sure. I'm not confident about it.
- What about Shutterstock? I know you guys have a deal with them.
- I'm just not gonna go into the details of the data that was used, but it was publicly available or licensed data.
- [Joanna] After the interview, Murati confirmed that the licensed data does include content from Shutterstock. Those videos are 720p, 20 seconds long. How long does it take to generate those?
- It could take a few minutes, depending on the complexity of the prompt. Our goal was to really focus on developing the best capability, and now we will start looking into optimizing the technology so people can use it at low cost and make it easy to use.
- To create these, you must be using a lot of computing power. Can you give me a sense of how much computing power it takes to create something like that versus a ChatGPT response or a DALL-E image?
- ChatGPT and DALL-E are optimized for the public to be using them, whereas Sora is really a research output. It's much, much more expensive. We don't know what it's going to look like exactly when we make it available eventually to the public, but we're trying to make it available at a similar cost eventually to what we saw with DALL-E.
- You said eventually. When is eventually?
- I'm hoping definitely this year, but it could be a few months.
- There's an election in November. You think before or after that?
- You know, that's certainly a consideration dealing with the issues of misinformation and harmful bias. And we will not be releasing anything that we don't feel confident on when it comes to how it might affect global elections or other issues.
- Right now Sora is going through red teaming, aka the process where people test the tool to make sure it's safe, secure, and reliable. The goal is to identify vulnerabilities, biases, and other harmful issues. What are things that you just won't be able to generate with this?
- Well, we haven't made those decisions yet, but I think there will be consistency on our platform. So similarly to DALL-E, where you can't generate images of public figures, I expect that we'll have a similar policy for Sora. And right now we're in discovery mode, and we haven't figured out exactly where all the limitations are and how we'll navigate our way around them.
- What about nudity?
- I'm not sure. You can imagine that... You know, there are creative settings in which artists might want to have more control over that. And right now, we are working with artists and creators from different fields to figure out exactly what's useful and what level of flexibility the tool should provide.
- How do you make sure that people who are testing these products aren't being inundated with illicit or harmful content?
- That's certainly difficult. And in the very early stages, it is part of red teaming, something that you have to take into account and make sure that people are willing and able to do. When we work with contractors, we go much further into that process, but that is certainly something difficult.
- We're laughing at some of these videos right now. But people in the video industry may not be laughing in a few years when this type of technology is impacting their jobs.
- You know, the way that I see it is that this is a tool for extending creativity, and we want people in the film industry, creators everywhere, to be a part of informing how we develop it further and also how we deploy it. And also, you know, what are the economics around using these models when people are contributing data and such.
- One thing was clear from all this: this tech is going to quickly get faster, better, and become widely available. How are we going to tell the difference between what is real video and what is AI video?
- We're doing research on watermarking the videos, but really figuring out content provenance and how you trust what is real content, something that actually happened in reality, versus content created for misinformation. And this is the reason why we're actually not deploying these systems yet, because we need to figure out these issues before we can confidently deploy them broadly.
- [Joanna] That was reassuring to hear. But there are still big concerns about Silicon Valley's race to create AI tools and its ambition for power and money versus our safety.
- It's not really a difficult demand or a difficult balance between profit and safety guardrails. I'd say the hard part is really figuring out the safety questions and the societal questions. That's really what keeps me up at night.
- There's this amazement about the product, but then we've also talked about all of these concerns. Is it worth it?
- It's definitely worth it. AI tools will extend our creativity and knowledge, our collective imagination, our ability to do anything. It's going to be extremely hard along the way to figure out the right path to bring AI tools into our day-to-day reality. But I think it's definitely worth trying.
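
Murati's one-line description of how Sora works, a diffusion model that starts from random noise and repeatedly refines it into a coherent clip, can be sketched very loosely in code. The toy Python below only illustrates that idea; the denoiser, dimensions, step count, and prompt handling are invented stand-ins for this sketch, not anything OpenAI has disclosed about Sora itself.

import numpy as np

FRAMES, HEIGHT, WIDTH, STEPS = 16, 64, 64, 50  # toy sizes, not Sora's

def toy_denoiser(video, step, prompt):
    # Stand-in for the learned network. A real diffusion model would predict
    # (and remove) noise conditioned on the text prompt; here we simply pull
    # every pixel a small step toward a fixed "intended scene" so the loop
    # visibly converges. The prompt is unused in this toy version.
    target = np.full_like(video, 0.5)
    return video + 0.1 * (target - video)

def generate(prompt):
    # 1. Start from pure random noise -- one tensor covering the whole clip
    #    (frames x height x width). Refining every frame together is one
    #    plausible way to get the frame-to-frame consistency Murati describes.
    video = np.random.randn(FRAMES, HEIGHT, WIDTH)
    # 2. Iteratively refine: each pass removes a little noise, gradually
    #    turning static into a structured video.
    for step in range(STEPS):
        video = toy_denoiser(video, step, prompt)
    return video

clip = generate("a bull in a china shop")
print(clip.shape, round(float(clip.std()), 4))  # spread shrinks as noise is removed

The point of the sketch is just the shape of the loop: noise in, many small denoising passes, video out.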
Info
Channel: The Wall Street Journal
Views: 496,089
Keywords: openai, sora, openai news, openai cto mira murati, openai cto, open ai, cto, openai sora, text to video ai, text to video, large language model, generative ai, wsj, wsj interview, mira murati, mira murati interview, joanna stern, sam altman, video generation ai, ai model, ai, chat gpt, artificial intelligence, mira murati cto openai, openai ceo, chatgpt, ai video concerns, misinformation, tech, tech news, tech things with joanna stern, when will sora come out, ai video, techy
Id: mAUpxN-EIgU
Length: 10min 38sec (638 seconds)
Published: Wed Mar 13 2024