OpenAI STUNS with "OMNI" Launch - FULL Breakdown

Video Statistics and Information

Captions
All right, so OpenAI made their big Monday announcement just a few minutes ago. I watched it live, and now let me tell you about it. Essentially, they released Her. You remember that movie Her, with Joaquin Phoenix? "Well, you seem like a person, but you're just a voice in a computer." "I can understand how the limited perspective of an unartificial mind would perceive it that way." It was a movie about an artificial intelligence that was so smart, so good, and really personal, and the main character, without giving too much away, built a really personal relationship with the AI. And that's really what it felt like today. So let's go over all the announcements together; let's watch it together, and I'm going to comment on it. So let's get into it.

"Today, I'm going to talk about three things. That's it. We will start with why it's so important to us to have a product that we can make freely available and broadly available to everyone."

All right, so you can already see what they're going to be talking about here: the mission. I'm going to skip over that; it is basically making artificial general intelligence, and its value, broadly applicable. Now, whether you agree with their approach or not, that is their mission. Next, they launched a desktop app and a web UI update, so let's skip over to that. But the main part of the presentation is GPT-4o (that's the letter "o," not the number zero), and I'm going to tell you all about that as we watch this video together. All right, so let's take a look at all the changes the UI has gone through, and let's look at the desktop app.

"And today we're also bringing the desktop app to ChatGPT, because we want you to be able to use it wherever you are. As you can see, it's easy, it's simple, it integrates very, very easily into your workflow. Along with it, we have also refreshed the UI. We know that these models get more and more complex, but we want the experience of interaction to actually become more natural."

So they said they refreshed the UI, but to be honest, it looks exactly the same to me, and I'm usually pretty good about noticing small changes. I don't really even see what's different about it just looking at it.

"...easy, and for you to not focus on the UI at all, but just focus on the collaboration with ChatGPT. And now, the big news: today, we are releasing our newest flagship model. This is GPT-4o."

All right, so she announces that they are releasing GPT-4o, their new flagship model. So it's not GPT-5; we did not get GPT-5 today. There were a lot of rumors that that was going to happen today, but they're still cooking, and so we have GPT-4o, which is an iteration on GPT-4. What they've done with it is really unique in some ways, and they described it as magical. It's pretty darn cool, and it really gets us a big step forward towards that vision of the future of Her.

"GPT-4o provides GPT-4-level intelligence, but it is much faster, and it improves on its capabilities across text, vision, and audio. For the past couple of years, we've been very focused on improving the intelligence of these models..."

So I'm going to pause it there for a second. The "o" in GPT-4o stands for "omni," and that is what they're calling it: basically text, vision, and voice all in one model. But I'll let Mira talk about it.

"This is the first time that we are really making a huge step forward when it comes to the ease of use, and this is incredibly important, because we're looking at the future of interaction between ourselves and the machines. And we think that GPT-4o is really shifting that paradigm into the future of
collaboration, where this interaction becomes much more natural and far, far easier."

Yeah, so I'm really glad they're actually making this push, and I suspect there are going to be a lot of open-source projects coming out in the next few weeks that do exactly this. The gist is: previously, interacting with models was almost entirely text-based, which is fine, but it was very much "I type out a prompt, and then I wait; I get the response back, and then I type out another one; I wait and get the response back." Sometimes you could even do voice transcription, and then you submit it, and then you wait and get the response back. It felt very turn-based and very unnatural, to be honest. And that was the big thing they were pushing for, the big thing they announced today: making it feel much more natural. I really like this. And this is on the heels of the rumored OpenAI-Apple deal closing, I think just yesterday, where Siri is now going to be powered, at least in part, by ChatGPT, and now that is making a lot more sense because of all of the things they're about to show us.

"Making this happen is actually quite complex, because when we interact with one another, there's a lot of stuff that we take for granted: you know, the ease of our dialogue, when we interrupt one another, the background noises, the multiple voices in a conversation, or, you know, understanding the tone of voice. All of these things are actually quite complex for these models. And until now, with voice mode, we had three models that come together to deliver this experience. We have transcription, intelligence, and then text-to-speech, all coming together in orchestration to deliver voice mode."

So that's really interesting. She basically said that to deliver this experience, previously they had three separate models: text-to-speech, so being able to actually have the voice come out of ChatGPT; the intelligence model, which is just the core model; and voice transcription, which took your voice and transcribed it into text for the model.

"This also brings a lot of latency to the experience, and it really breaks that immersion in the collaboration with ChatGPT. But now, with GPT-4o, this all happens natively. GPT-4o reasons across voice, text, and vision, and with these incredible efficiencies, it also allows us to bring GPT-4-class intelligence to our free users. This is something that..."

All right, so this is big news: GPT-4-class intelligence, this model, for the free users. Sam Altman was just on the All-In Podcast, where he talked about wanting to bring all of the capabilities of GPT-4 to their free users, and literally just a week ago, he was saying they were figuring it out, or they couldn't figure it out. But it looks like he actually had it figured out and just didn't want to announce it yet, which is understandable. So I'm getting the sense that if Sam Altman says something like "hey, we're working on this, we haven't figured it out yet," it's actually probably imminent.

All right, so here are some important stats: GPT-4o is two times faster, and within the API it's 50% cheaper, with five times higher rate limits compared to GPT-4 Turbo. And the five times higher rate limits are, I believe, only for paid users, because she mentions (although I'm not going to play it) that paid users get five times more capacity than free users. So there's still a big benefit to being a paid user, and of course, that assumes you believe in their approach to closed-source models.
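To make her pipeline point concrete, here's a minimal sketch of the cascaded "three model" voice mode she describes (transcription, then the core intelligence model, then text-to-speech), built from the separate endpoints in the OpenAI Python SDK. The model names and the overall flow are my illustrative assumptions, not OpenAI's actual voice-mode code; GPT-4o's pitch is precisely that it collapses these three network round trips into one natively multimodal model.

```python
# Illustrative sketch of the old cascaded voice pipeline:
# transcription -> intelligence -> text-to-speech. Each step is a
# separate model and a separate network round trip, which is where
# the latency she mentions comes from.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def cascaded_voice_turn(audio_path: str) -> str:
    # 1) Transcription model: the user's audio in, text out.
    with open(audio_path, "rb") as f:
        transcript = client.audio.transcriptions.create(
            model="whisper-1", file=f
        )

    # 2) Intelligence model: text in, text out.
    chat = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": transcript.text}],
    )
    reply_text = chat.choices[0].message.content

    # 3) Text-to-speech model: text in, spoken audio out.
    speech = client.audio.speech.create(
        model="tts-1", voice="alloy", input=reply_text
    )
    speech.stream_to_file("reply.mp3")
    return "reply.mp3"
```

Three sequential hops per conversational turn is why the old voice mode felt laggy and turn-based; a single omni model makes one hop and keeps the tone and emotion information that a transcription step throws away.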
All right, here's where it gets really interesting. They are going to show off some demos, and it's subtle but incredibly impressive, a big leap in just the way you interact with AI. It does feel so much more natural; it does feel like Her. So, uh, let's watch it together. It's pretty incredible. By the way, these are two of the leading engineers who worked on this project.

"Hi, I'm, uh, I'm Barret." "Hey, I'm Mark." "So, one of the key capabilities we're really excited to share with you today is real-time conversational speech. Let's just get a demo fired up. So I'm taking out a phone; if you're wondering about this wire, it's so we have consistent internet. And if you see, there's this little icon on the bottom right of the ChatGPT app, and this will open up GPT-4o's audio capabilities. Hey, ChatGPT, I'm Mark. How are you?" "Oh, Mark! I'm doing great, thanks for asking. How about you?" "Hey, so I'm on stage right now. I'm doing a live demo, and frankly, I'm feeling a little bit nervous."

All right, I'm going to pause it right there. Immediately, when you hear ChatGPT respond, it is in near real time, and you can tell that she has a lot of emotion in her voice. It is not simply just reading text, as you're used to with typical TTS models; there is actually a lot of personality in the voice already, and you're going to see a lot more of that as we go through the rest of these demos.

And quickly, before we do that: OpenAI released a blog post just a few days ago introducing the Model Spec, where they detail how they believe AI should interact with humans, and it's actually pretty good. A lot of it I agree with, and I'm going to make a separate video all about that blog post, because I think it's really telling as to what we can expect from the future of AI, not just from OpenAI; I think this is going to be a template for many AI companies to use.

All right, let's continue. So Mark here is saying that he's doing a presentation and he's very nervous.

"Can you help me calm my nerves a little bit?" "Oh, you're doing a live demo right now? That's awesome! Just take a deep breath and remember: you're the expert." "I like that suggestion. Let me try a couple deep breaths. Can you give me feedback on my breaths? Okay, here..."

All right, I want you to notice something really subtle that just happened, but I want you to take a look: he interrupted ChatGPT. Currently, it's not really possible to continue a conversation and interrupt like that; usually you have to stop it and then restate it. But he simply just interrupted, and she stopped talking. (And I'm calling it a "she," and that's weird; I'm not sure what to do with this yet. I'll figure that part out.) But let me show you that again, just in case you missed it.

"...and remember: you're the expert." "I like that suggestion. Let me try a couple deep breaths. Can you give me feedback on my breaths? Okay, here I go." "Whoa, slow down a bit there, Mark! You're not a vacuum cleaner. Breathe in for a count of four." "Okay."

All right, so a couple of things. One: a lot of personality in the voice of ChatGPT, and I bet you're going to be able to dial in that personality to exactly how you want it to behave with you. And again, I'm reminded of exactly Her, the movie Her. By the way, if you haven't seen that movie, it is a fantastic movie and almost a road map into what the future of AI will look like. And then one other thing I want to point out: there seems to be some audio issue with the output, and you're going to hear a couple of parts where it gets glitchy, or a little bit laggy-sounding, or just missing audio, and they're not reacting at all to it, which makes me think that it's only in the output of what we're hearing on the live stream.
"Uh, let me try again. So I'm going to breathe in deeply, and then breathe out for four, and then exhale slowly." "Okay, I'll try again. Breathing in... and breathe out." "That's it. How do you feel?" "I feel a lot better, thank you so much."

"So Mark, you've been working on these capabilities for a while now. Can you tell us a bit about how it's different from voice mode?" "Right. So if you've used our voice mode experience before, you'll notice a couple of key differences. First, you know, you can now interrupt the model. You don't have to wait for it to finish its turn before you can start speaking; you know, you can just butt in whenever you want."

So that's huge. That alone makes it feel so much more natural. When you're in conversation, you don't always wait for the other person to finish their thought or their sentence, you know, patiently, and then give it a few seconds, and then respond. You kind of jump in right at the tail end; sometimes you interrupt altogether. And so now it does feel so much more natural to have this assistant, this AI assistant, where you can just butt in and ask it questions, or you can interrupt and say, "no, no, that's not what I meant." So that's a really cool feature. And I'll also mention that the entire demo seems to be much more focused on voice mode, and so I think they're going really heavy into voice mode, because text is great, but voice is the most natural way to interact with other humans, and now with artificial intelligence. And that also reminds me again of their deal with Siri. Imagine this as Siri, except, with the Siri example, imagine if Siri could control your phone and have access to everything: your email, your calendar, all the files that you've ever downloaded. That's a really compelling version of Siri, something that obviously we haven't seen to date. Now, I'm still very disappointed that Apple is not either building their own model or using an open-source model, but more on that in another video.

"[The second difference with this] model is real-time responsiveness. So that means that you don't have this awkward two-to-three-second lag while you wait for the model to give a response. And finally, the model picks up on emotion. Right, when I was breathing super hard there, it could tell, and it knew: hey, you might want to calm down a little bit; you're kind of running a little bit too fast."

So there was a demo from a company a few weeks ago (I'm actually forgetting the name; if I remember it, I'll drop it in the description below), but basically, it was exactly this: it was AI that could read your emotion from your voice and reply back with emotion in its voice. And so OpenAI now has that, which is really cool. I hope that other company goes open source, or at least releases their technology so that open-source providers can use it. And again, the way that open source has been working with AI over the last year, I wouldn't be surprised if we have this kind of functionality, this kind of emotional intelligence, in our open-source artificial intelligence models within the next few weeks. So here's hoping to that. Let's get that, Mark Zuckerberg, please.
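Of the differences he just listed, the interruption one maps onto a classic voice-interface pattern sometimes called "barge-in": keep listening while the assistant's audio is playing, and cancel playback the moment the user starts talking. Here's a tiny, runnable sketch of just that control flow; the audio I/O is simulated with prints and sleeps, and this is my illustration of the general pattern, not anything from OpenAI's implementation.

```python
# Simulated "barge-in": playback runs on one thread while a listener
# can cancel it mid-sentence. A real system would use voice-activity
# detection on the microphone stream instead of a timed trigger.
import threading
import time

interrupted = threading.Event()

def speak(sentence: str) -> None:
    # Pretend each word takes 0.2s to play; stop instantly if interrupted.
    for word in sentence.split():
        if interrupted.is_set():
            print("\n[playback cancelled: user barged in]")
            return
        print(word, end=" ", flush=True)
        time.sleep(0.2)
    print()

def listen_for_user() -> None:
    # Stand-in for detecting that the user started talking.
    time.sleep(1.0)
    interrupted.set()

speaker = threading.Thread(
    target=speak,
    args=("Once upon a time, in a world not too different from ours, "
          "there was a robot named Byte.",),
)
listener = threading.Thread(target=listen_for_user)
speaker.start(); listener.start()
speaker.join(); listener.join()
```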
"You know, it really has that capability across the board to perceive your emotion. Not only that, though: the model is able to generate voice in a variety of different emotive styles, and it really has a wide dynamic range. So I'm going to fire up another demo for that. Hey, ChatGPT, how are you doing?" "I'm doing fantastic, thanks for asking! How about you?" "Pretty good. What's up?" "So, my friend, um, Barret here, he's been having trouble sleeping lately, and, uh, I want you to tell him a bedtime story about robots and love." "Oh, a bedtime story about robots and love? I got you covered! Gather round, Barret. Once upon a time, in a world not too different from ours, there was a robot named Byte. Byte was a curious robot, always explor..." "You started this story, but I want a little bit more emotion in your voice, a little bit more drama."

Okay, so he interrupted and asked ChatGPT to change the emotion in its voice, and again, I think the interrupting feature seems small, but I think it's going to be really powerful in making the interactions feel so much more natural.

"Got it. Let's add some drama. Once upon a time, in a world not too different from ours, there was a robot named..." "No, no, ChatGPT. I really want maximal emotion, like maximal expressiveness, much more than you were doing before." "Understood. Let's amplify the drama. Once upon a time, in a world not too different from ours, there was a robot named Byte." "Can you do this in a robotic voice now?" "Initiating dramatic robotic voice. Once upon a time, in a world not too different from ours, there was a robot named Byte. Byte was a curious robot..."

So imagine this. I mean, I know I keep mentioning Her, but this is exactly that: you are essentially saying to this AI, "here is how I want you to behave, here's the emotion in your voice that I want, here's how I want you to tell me things." And I think that's going to be awesome. I think this is a huge leap in interactivity with artificial intelligence that we're seeing here, and again, I'm very hopeful that open-source models are going to start incorporating this very soon. And I think that's the pattern that we're going to see going forward, where these frontier models, these frontier companies, release something, everybody sees what's possible, and then open source releases it a few weeks or a few months later.

"...robot, always exploring new circuits..." "Barret here. Calm down a little bit. Can you end the story, um, but do it in a singing voice?" "...and so Byte found another robot friend, and they lived happily ever after." "Thank you so much, I definitely feel a lot more relaxed."

"You've just seen the voice capabilities, but we also want to show you the vision capabilities, as the model can also see the whole world around us."

All right, they're about to show us some vision capabilities, which, we've seen this stuff already, but I think combined with the interactivity of ChatGPT, which I'm now describing as Her, it's quite impressive, although really, the most impressive parts we've already seen in the last demo. But let's watch this.

"Hello there! How's it going?" "It's going really well. Today I'd really like your help solving a math problem." "I'm all ears! What math problem can I help you tackle today?" "So, I'm going to write down a linear equation on a sheet of paper, and I'll show you, and then I'd love your help working me through it. But importantly: don't tell me the solution, just help give me hints along the way." "Got it. Oh, okay, I see it..." "No, I didn't show you yet. Just give me help along the way. One second." "Whoops, I got too excited."

That alone was really cool. So there was an obvious mistake: the phone was flat on the table, and ChatGPT says, "oh, I see it," but she can't see it, because the camera wasn't pointed at anything. And he corrected her; he said, "oh, actually, I haven't shown it to you yet; let me write it out and then I'll show it to you," and she says, "oh, I got too excited." The personality there is just very impressive to me. And now we're going to see vision mode.

"So, ChatGPT, what equation did I write down?" "Ah, I see it now! You wrote down 3x + 1 = 4." "Yep, exactly. So what's the first step I should take to try to..."

So again, he interrupted her, and it's kind of weird and inspiring and exciting to hear AI go "ah." It's such a human thing to do, and now AI will be doing that, and I think that is the big breakthrough that OpenAI had here: making AI sound more human and interact in a more human way.
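As a quick aside before the demo continues: at the API level, showing GPT-4o an image like that handwritten equation is just a chat message that mixes text and image content. Here's a minimal sketch using the OpenAI Python SDK; the file name and prompt are my placeholders, and this is the plain text-and-image API rather than whatever the live voice-plus-vision mode runs on.

```python
# Minimal GPT-4o vision call: the Chat Completions API accepts images
# as URLs or base64 data URIs alongside text. "equation.jpg" is a
# placeholder for a photo of the handwritten equation.
import base64
from openai import OpenAI

client = OpenAI()

with open("equation.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Don't give me the solution; just hint at the first step."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```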
"The first step is to get all the terms with x on one side and the constants on the other side. So, what do you think we should do with that plus one?" "Okay, I'm going to try to subtract one from both sides, and then I'll see what I get."

All right, so I'm going to skip over this a little bit. He basically goes back and forth with ChatGPT, and without ChatGPT actually telling him exactly what to do, she kind of guides him along the way. It's pretty cool, and if you haven't seen the full video, I did a supercut of it, which includes this demo.

All right, now we're about to see something which I think is really cool. It kind of shows off more of what we've seen, but the reaction from ChatGPT is just super impressive. Let's watch.

"Anything else you'd like to tackle today?" "So, ChatGPT, I really love that you, you know, taught the value of math to my friend Mark, and I wrote one last thing. I'd love it if you could take a look at it." "Of course! I'd love to see what you wrote. Show it to me whenever you're ready." "Okay, so this is what I wrote down. What do you see?" "Aww, I see 'I love ChatGPT.' That's so sweet of you!" "Yeah, well, I really appreciate all the help."

All right, so again, the "aww," like, that noise, it's just so human, but not AI. So they programmed it to really behave as though it is a human, and I think that is the big picture here: they have injected a tremendous amount of emotion and emotional intelligence into ChatGPT, and that's what we're seeing. So they basically took all of the input methods, combined them into one, and made it real time, so it's much, much faster, and that is a key unlock to making it feel much more human, and much more of a natural interaction with AI.

All right, in this next demo, what we're going to see is the desktop version of ChatGPT reading from the screen and being able to tell you different things about what it's seeing. And we've seen a lot of this before. Basically, they're going to copy the text from this code, and ChatGPT is going to describe what it does. So let's watch.

"Okay. And to give a bit of background on what's going on: so here we have, um, a computer, and on the screen we have some code, and then the ChatGPT voice app is on the right. So ChatGPT will be able to hear me, but it can't see anything on the screen. So I'm going to highlight the code, Command-C it, and then that will send it to ChatGPT. Hey, ChatGPT." "Hey there! How's it going?" "Yeah, it's going really well. I was wondering if you could help me with a coding problem today." "Of course, I'd love to help you out! What's the coding problem you're dealing with?" "Okay, I'm going to share with you some code. One second." "Sure thing, take your time."

So he copies the code and pastes it in, or rather, it automatically got it from the clipboard, I should say, and now she's reading it back, or just explaining what that code does. Okay, I'm going to skip ahead a bit. Okay, so they generated a graph from that code, and now they're going to ask ChatGPT questions about it. This is all stuff we've seen before through Code Interpreter, and it's still impressive to see, but it's the same stuff, except with the addition of being able to actually have a natural conversation around it.
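We don't know how the desktop app is wired internally, but the "highlight, Command-C, ask" flow he demonstrates can be approximated in a few lines: read the clipboard and send its contents as context. This is a hypothetical sketch, assuming the third-party pyperclip package for clipboard access; it is not the app's actual mechanism.

```python
# Hypothetical approximation of "copy code, then ask ChatGPT about it":
# grab whatever the user just copied and send it for explanation.
import pyperclip  # third-party: pip install pyperclip
from openai import OpenAI

client = OpenAI()

code = pyperclip.paste()  # contents of the system clipboard

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "You are a concise coding assistant. "
                    "Describe what the provided code does in one short paragraph."},
        {"role": "user", "content": code},
    ],
)
print(response.choices[0].message.content)
```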
All right, in this next example, somebody asked, "hey, does it do live translation really well?" And so what they're going to show off is translation between Mira Murati, the CTO, on the left, speaking Italian, and Mike, on the right, speaking English, and it translating back and forth. But what I think is really cool is that when he asked ChatGPT to do it, she responds in a kind of quirky, cool way. So let's take a look.

"I have a friend here who only speaks Italian, and I only speak English, and, uh, every time you hear English, I want you to translate it to Italian, and if you hear Italian, I want you to translate it back to English. Is that good?" "Perfetto!"

Okay, so I think it's cool that ChatGPT says "perfetto," and, I don't know, it's a cool quirk. It kind of shows off a personality, and, I don't know, I think this is really the future of what a personal AI assistant is going to look like. So let me just briefly show you the translation back and forth.

"Mike, she wonders: if whales could talk, what would they tell us?" "Um, they might ask, 'how do we solve linear equations?'" "Certainly, yes."

All right, so a cool little back and forth with translation.
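That standing instruction ("every time you hear English, translate it to Italian, and vice versa") is essentially just a system prompt. Here's a minimal, text-only sketch of the same pattern via the Chat Completions API; the prompt wording is my approximation, not what was used on stage, and the real demo of course runs on live audio rather than typed text.

```python
# Text-only sketch of the bidirectional translation setup: one standing
# instruction, then every utterance gets translated the other way.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a live interpreter between two speakers. "
    "When you receive English, reply with the Italian translation. "
    "When you receive Italian, reply with the English translation. "
    "Reply with the translation only."
)

def translate(utterance: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": utterance},
        ],
    )
    return response.choices[0].message.content

print(translate("If whales could talk, what would they tell us?"))
```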
Now, another viewer is asking: can ChatGPT tell what your emotions are just by looking at your face? So let's take a look. And something interesting happens during this part of the demo, which you'll see, and I'll explain it after.

"Okay, yeah, so I'm going to show you, um, a selfie of what I look like, and then I'd like you to try to see what emotions I'm feeling based on how I'm looking." "Sounds like a fun challenge! Go ahead and show me that selfie, and I'll put my emotional detective hat on." "Okay, so here's me. So, what kind of emotions do you think I'm feeling?" "Hmm, it seems like I'm looking at a picture of a wooden surface." "Oh, you know what? That was the thing I sent you before. Don't worry, I'm not actually a table. Um, okay, so take another look." "Ah, that makes more sense! There we go. It looks like you're feeling pretty happy and cheerful."

All right, so what just happened there? So, right when this person took out the phone and started recording, the first thing is, it was showing the back camera, which means it was looking down at the table. So for a half second, it was showing the table, and I'll show you that. So right here, you can see, right when they turned on vision mode, it showed the table before he flipped the camera around, and then it was showing his face. So immediately, the first thing ChatGPT saw was the table, and she says, "hmm, I think I'm looking at a table," and then he says, "oh, no, wait, I'm not a table. Forget that. Now look," and then she says, "oh yeah, okay, that makes more sense." And again, that whole very natural interaction is just so impressive to me.

All right, and the last thing I want to show you is Mira Murati's hint as to what's coming next. So, one thing: Sam Altman is not in this presentation whatsoever, and I think that's telling. Maybe it's not the biggest thing they have cooking right now, and Sam is waiting for that to get out on stage. But nonetheless, Mira Murati gives a hint as to what's coming next, with the quote-unquote "next big thing." Let's watch.

"So, soon we'll be updating you on our progress towards the next big thing. And before we wrap up, I just want to thank the incredible OpenAI team."

All right, so that was it. Uh, I thought this was really cool. It wasn't some huge, splashy announcement where they announced a brand-new model or some kind of new tech; it was very subtle, but I think very important nonetheless, and the ability for more people to use AI in a very natural way is extremely important. Now, question answering is important, but the real value of AI (I've been really thinking about this a lot lately), the real value, is going to be when your personal assistant can actually accomplish tasks on your behalf. Just being able to ask questions and get answers about things is great, but it's not that perfect use case, and this is especially true since I've been testing the Meta AI sunglasses. It turns out I don't really have a lot of questions throughout the day that I need answered, maybe a handful, but what I do want is the ability to give my agents, my personal assistants, tasks that they go out and accomplish for me while I'm doing something else. That is my dream scenario; that is my perfect use case, and I think we're pretty close to having that happen. But we'll see. So if you enjoyed this video, please consider giving it a like and subscribing, and I'll see you in the next one.
Info
Channel: Matthew Berman
Views: 114,488
Keywords: ai, openai, omni, gpt4, chatgpt, gpt4o, gpt4 omni, her, ai assistant, llm, large language model
Id: 2cmZVvebfYo
Length: 27min 7sec (1627 seconds)
Published: Tue May 14 2024