GPT-4o - Full Breakdown + Bonus Details

Video Statistics and Information

Captions
It's smarter in most ways, cheaper, faster, better at coding, multimodal in and out, and perfectly timed to steal the spotlight from Google. It's GPT-4 Omni. I've gone through all the benchmarks and the release videos to give you the highlights. My first reaction was: it's more flirtatious sigh than AGI, but a notable step forward nonetheless.

First things first: GPT-4o, "o" meaning Omni, which is "all" or "everywhere", referencing the different modalities it's got, is free. By making GPT-4o free, they are either crazy committed to scaling up from 100 million users to hundreds of millions of users, or they have an even smarter model coming soon, and they did hint at that. Of course, it could be both, but it does have to be something. Just giving paid users five times more in terms of message limits doesn't seem enough to me.

Next, OpenAI branded this as GPT-4-level intelligence, although in a way I think they slightly underplayed it. So before we get to the video demos, some of which you may have already seen, let me get to some more under-the-radar announcements.

Take text-to-image, and look at the accuracy of the text generated from this prompt. Now, I know it's not perfect: there aren't two question marks on the "now", and there's others that you can spot, like the "I" being capitalized. But overall, I've never seen text generated with that much accuracy, and it wasn't even in the demo.

Or take this other example, where two OpenAI researchers submitted their photos. Then they asked GPT-4o to design a movie poster, and they gave the requirements in text. Now, when you see the first output, you're going to say, "Well, that isn't that good." But then they asked GPT-4o something fascinating. It seemed to be almost reverse psychology, because they said: "Here is the same poster, but cleaned up. The text is crisper and the colors bolder and more dramatic. The whole image is now improved." This is the input, don't forget. The final result, in terms of the accuracy of the photos and of the text, was really quite impressive. I can imagine millions of children
and adults playing about with this functionality. Of course, they can't do so immediately, because OpenAI said this would be released in the next few weeks.

As another bonus, here is a video that OpenAI didn't put on their YouTube channel. It mimics a demo that Google made years ago but never followed up with. The OpenAI employee asked GPT-4o to call customer service and ask for something. I've skipped ahead, and the customer service in this case is another AI, but here is the conclusion.

"Could you provide Joe's email address for me?" "Sure, it's joe@example.com." "Awesome. All right, I've just sent the email. Can you check if Joe received it?" "We'll check right now, please hold." "Sure thing." "Hey Joe, could you please check your email to see if the shipping label and return instructions have arrived?" "Fingers crossed. Yes, I got the instructions." "Perfect, Joe has received the email."

They call it a proof of concept, but it is a hint toward the agents that are coming.

Here are five more quick things that didn't make it to the demo. How about a replacement for Lensa? Submit your photo and get a caricature of yourself. Or what about text to new font? You just ask for a new style of font and it will generate one. Or what about meeting transcription? The meeting in this case had four speakers, and it was transcribed. Or video summaries? Remember, this model is multimodal in and out. Now, it doesn't have video out, but I'll get to that in a moment. Here, though, was a demonstration of a 45-minute video submitted to GPT-4o, and a summary of that video. We also got character consistency across both woman and dog, almost like an entire cartoon strip.

If those were the quick bonuses, what about the actual intelligence and performance of the model? Before I get to official benchmarks, here is a human-graded leaderboard pitting one model against another. And yes, im-also-a-good-gpt2-chatbot is indeed GPT-4o, so it turns out I've actually been testing the model for days. Overall, you can see the preference for GPT-4o compared to all other
models. In coding specifically, the difference is quite stark, I would say. Even here, though, we're not looking at an entirely new tier of intelligence. Remember that a 100 Elo gap is a win rate of around two-thirds, so one-third of the time GPT-4 Turbo's outputs would be preferred. That's about the same gap as between GPT-4 Turbo and last year's GPT-4: a huge step forward, but not completely night and day.

I think one underrated announcement was the desktop app, a live coding co-pilot.

"Okay, so I'm going to open the ChatGPT desktop app, like Mira was talking about before. Okay, and to give a bit of background of what's going on: so here we have, um, a computer, and on the screen we have some code, and then the ChatGPT voice app is on the right. So ChatGPT will be able to hear me, but it can't see anything on the screen. So I'm going to highlight the code, Command-C it, and then that will send it to ChatGPT, and then I'm going to talk about the code to ChatGPT. Okay, so I just shared some code with you. Could you give me a really brief one-sentence description of what's going on in the code?" "This code fetches daily weather data for a specific location and time period, smooths the temperature data using a rolling average, annotates a significant weather event on the resulting plot, and then displays the plot with the average, minimum, and maximum temperatures over the year."

I've delayed long enough. Here are the benchmarks. I was most impressed with GPT-4o's performance on the MATH benchmark. Even though it fails pretty much all of my math prompts, that is still a stark improvement from the original GPT-4. On the Google-proof graduate test, it beats Claude 3 Opus, and remember, that was the headline benchmark for Anthropic. In fact, speaking of Anthropic, they are somewhat challenged by this release. GPT-4o costs $5 per 1 million tokens input and $15 per 1 million tokens output. As a quick aside, it also has a 128k-token context and an October knowledge cutoff. But remember the pricing, 5 and 15: Claude 3 Opus is 15 and 75, and remember,
for Claude 3 Opus on the web you have to sign up with a subscription, but GPT-4o will be free. So for Claude Opus to be beaten in its headline benchmark is a concern for them. In fact, I think the results are clear enough to say that GPT-4o is the new smartest AI.

However, just before you get carried away and type on Twitter "the AGI is here", there are some more mixed benchmarks. Take the DROP benchmark. I dug into this benchmark, and it's about adversarial reading comprehension questions. They're designed to really test the reasoning capabilities of models: if you give models difficult passages, and they've got to sort through references, do some counting and other operations, how do they fare? DROP, by the way, is Discrete Reasoning Over the content of Paragraphs. It does slightly better than the original GPT-4 but slightly worse than Llama 3 400B, and as they note, Llama 3 400B is still training. So it's just about the new smartest model, by a hair's breadth.

However, we're not done yet. It's better at translation than Gemini models. Quick caveat there: Gemini 2 might be announced tomorrow, and that could regain the lead. Then there are the vision understanding evaluations. It was a real step forward on the MMMU: as you can see, a clear 10 points better than Claude Opus. Again, I'm curious if Google Gemini can exceed it, though.

The improvements to the tokenizer could be revolutionary for non-English speakers. The dramatically fewer tokens needed for languages like Gujarati, Hindi, Arabic, and more don't just mean that conversations are cheaper; they're also quicker. And what about multilingual performance? Well, this time they didn't compare it to other models but compared it to the original GPT-4: definitely a step up across languages, but English is still by far the most suited language.

Indeed, here is a video of some of the model's mistakes, ending with some dodgy language tuition. I know enough Mandarin to say it wasn't perfect at tutoring.

"Let's root, root, root for the home..." "What was that?" "Sorry guys, I got carried away." "Right for round Francisco..." "I have feeling I'm very wrong."

"Hello, uh, my name is NCH. I'm here with my coworker." "Hi, I'm sh. I'm trying to teach my coworker how to speaking Mandarin. We want to start from simple words like 'ni hao'. Can you teach him how to pronounce that?" "Of course! Hey, ni hao, nice to meet you. 'Ni hao' is pretty straightforward to pronounce. It sounds like 'nee how'. Just make sure to keep the 'nee' part high and then go down in pitch on the 'how'. Give it a try." "Ni hao?" "That was great, really good first try. You're a natural."

It really wasn't. He needs to work on his tones, and her face was the giveaway.

There were a lot of other interesting video demonstrations, but before them, the GPT-4o blog post from Sam Altman, put out tonight. He made the argument that putting this capable AI tool in the hands of everyone for free is justifying that "open" part of OpenAI: less about open weights or open source, more about open to everyone, without ads or anything like that. He also draws our attention to the latency of the models. In many ways, that is the key innovation with GPT-4o: by dialing down the latency, you dial up the realism. As he said, it feels like AI from the movies; getting to human-level response times and expressiveness turns out to be a big change. Indeed, I think I should get a little credit for predicting Her-like AI in a video from a month ago.

But now I want to get to those demos. The response times were amazing, and the model was clearly designed to be flirtatious. Note that in Senate testimony last year, Sam Altman said, "We try to design systems that do not maximize for engagement." So let me know in the comments whether you think these demos are designed to maximize engagement.

"Okay, so this is what I wrote down. What do you see?" "Ah, I see 'I love ChatGPT'. That's so sweet of you!" "Yeah, well, I really appreciate all the help. So yeah, as we can see, um..." "Wow, that's quite the outfit you've got on." "Yeah, as we can see, you know, we can chat in real time."

I was impressed, though, that it could speed up its
talking on demand.

"Of course. One... two... three..." "Hey, actually, that's, um, that's a little slow. Could you count faster?" "Sure thing. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10." "Okay, that was a little bit too fast."

Of course, OpenAI must be given credit, because for blind people this could be revolutionary.

"Right now, the Royal Standard flag is flying above Buckingham Palace, which is the signal that the monarch is present. How exciting to be there while the king is home!" "Try and tell me exactly what they're doing right now, please." "Um, right now the ducks are gently gliding across the water."

It will also help you with interview prep by analyzing your appearance.

"How about you?" "I'm awesome. Listen, I got some huge news." "Oh, do tell, I'm all ears." "Well, in a few minutes I'm going to be interviewing at OpenAI. Have you heard of them?" "OpenAI, huh? Sounds vaguely familiar. Kidding, of course. That's incredible, Rocky. What kind of interview?" "Well, it's for a software engineering role, but I just need to know: do I look presentable, professional?" "Well, Rocky, you definitely have the 'I've been coding all night' look down, which could actually work in your favor. Maybe just run a hand through your hair or lean into the mad genius vibe. Your enthusiasm is what's really going to shine through." "I don't have a lot of time, so I'm just going to throw this on. What do you think?" "Oh, Rocky, that's quite a statement piece. I mean, you'll definitely stand out, though maybe not in the way you're hoping for an interview." "Okay, I got it, I got it."

I could definitely see GPT-4o being used for mathematics tutoring. Here is a 30-second clip, but notice also the slight glitchiness at certain moments.

"...relative to angle alpha." "All right, so I'm pretty sure this is the angle alpha right here, right?" "Cor... correct. Now, looking at the triangle, which side do you think is the hypotenuse?" "Um..." "Remember, the hypot..." "I'm not totally sure. I think, I think it might be this one, but I really am not sure. This side, AC?" "You're close."

I also noticed quite a few glitches in the demo: sounds cutting in and out.
The one feature, though, that OpenAI wants you to really focus on is the video-in capacity. You can live-stream video straight to the Transformer architecture behind GPT-4o, and so the video-in functionality is certainly impressive. Note, though, that the reaction time of GPT-4o isn't quite as immediate as for audio: Greg Brockman, the co-founder of OpenAI, has to prompt it to react to the intruder.

"...overall stylish feel. They might be getting ready for a conversation or a presentation, given their direct engagement with the camera." "Sounds like we're in for an intriguing interaction. I'm curious about the lighting you mentioned. Could you describe it a bit more? Is it natural light? Are we talking unique fixtures or colors that add to the atmosphere?" "The lighting here seems to be a mix of natural and artificial. There's a noticeable bright light overhead, probably a fixture, casting a focused beam downwards, which creates a spotlight effect. This adds a dramatic and modern feel to the scene. The rest of the room is lit more softly, possibly by natural light filtering..." "Anything... did anything unusual happen recently?" "Yes, actually. Uh, just as you asked, another person came into view behind the first person. They playfully made bunny ears behind the first person's head and then quickly left the frame. It added a light-hearted and unexpected moment to the scene. Now the focus is back on the original person with the leather jacket."

Nevertheless, GPT-4o can produce multiple voices that can sing almost in harmony.

"...and really try to harmonize here." "San Francisco, San Francisco, in the month of May..." "But maybe make it more dramatic, and make the soprano higher." "San Francisco in the month of May, San Francisco in the month of May, it's a Friday C may, we are harmonizing, are harmon..." "Great, thank you."

And I suspect this real-time translation could soon be coming to Siri.

"...translator for us. So every time I say something in English, can you repeat it back in Spanish, and every time he says something in Spanish, can you repeat it back in English?" "Sure, I can do that. Let's get this translation train rolling." "Um, hey, how's it been going? Have you been up to anything interesting recently?" "Hey, I've been good, just a bit busy here preparing for an event next week."

Why do I say that? Because Bloomberg reported two days ago that Apple is nearing a deal with OpenAI to put ChatGPT on iPhone. And in case you're wondering about GPT-4.5 or even 5, Sam Altman said, "We'll have more stuff to share soon," and Mira Murati, in the official presentation, said that they would soon be updating us on progress on the next big thing. Whether that's empty hype or real, you can decide. No word, of course, about OpenAI co-founder Ilya Sutskever, although he was listed as a contributor under additional leadership.

Overall, I think this model will be massively more popular even if it isn't massively more intelligent. You can prompt the model now with text and images in the OpenAI Playground; all the links will be in the description. Note also that all the demos you saw were in real time, at 1x speed. That, I think, was a nod to Google's botched demo. Of course, let's see tomorrow what Google replies with.

To those who think that GPT-4o is a huge stride towards AGI, I would point them to the somewhat mixed results on the reasoning benchmarks. Expect GPT-4o to still suffer from a massive amount of hallucinations. To those, though, who think that GPT-4o will change nothing, I would say this: look at what ChatGPT did to the popularity of the underlying GPT series. It being a free and chatty model brought 100 million people into testing AI. GPT-4o being the smartest model currently available, and free on the web, and multimodal, I think could unlock AI for hundreds of millions more people. But of course, only time will tell.

If you want to analyze the announcement even more, do join me on the AI Insiders Discord via Patreon. We have live meetups around the world and professional best-practice sharing. So let me know what you think, and as always, have a wonderful day.
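
Two figures quoted in the captions come down to simple arithmetic, so here is a minimal sketch for checking them. It assumes the standard Elo expected-score formula (the captions do not say which formula the leaderboard uses) and treats the quoted prices as flat per-token rates. The function names and the example workload of one million tokens each way are illustrative and not from the video.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected win rate of the higher-rated model for a given Elo gap.

    Assumes the standard Elo logistic curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_million: float,
                     output_price_per_million: float) -> float:
    """Cost in USD given per-million-token prices for input and output."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


if __name__ == "__main__":
    # A 100-point gap, as mentioned in the captions.
    print(f"Win rate at a 100 Elo gap: {elo_expected_score(100):.2f}")

    # Hypothetical workload: 1M input tokens and 1M output tokens.
    gpt4o = request_cost_usd(1_000_000, 1_000_000, 5, 15)        # prices quoted in the captions
    claude_opus = request_cost_usd(1_000_000, 1_000_000, 15, 75)  # prices quoted in the captions
    print(f"GPT-4o: ${gpt4o:.2f}, Claude 3 Opus: ${claude_opus:.2f}")
```

Under these assumptions a 100-point gap works out to a win rate of roughly 64%, close to the "around two-thirds" quoted, and the example workload costs $20 at the GPT-4o prices against $90 at the Claude 3 Opus prices.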
Info
Channel: AI Explained
Views: 397,949
Id: ZJbu3NEPJN0
Length: 18min 43sec (1123 seconds)
Published: Mon May 13 2024