GPT-4o Deep Dive & Hidden Abilities you should know about

Captions

On Monday, OpenAI released this revolutionary model called GPT-4o, which is by far the best AI model out there. I showed you a few demo clips in this video, so check it out if you haven't already. But GPT-4o actually has a lot more hidden capabilities: it can recreate an entire Pokémon game, it's surprisingly good at chess, and it can solve an International Math Olympiad problem in just one prompt. So this video will be a deep dive on GPT-4o. I'm going to go over how it works, how they built it, and why it's so revolutionary.

Now, first of all, I've seen some comments on my YouTube videos saying that this tech is not revolutionary, it's just text to speech or speech to text, or using Stable Diffusion for image generation. This is not true. So for example, for a real-time AI voice assistant, traditionally you would have a voice, and there would be a neural network that turns your speech into text, so this algorithm would be speech to text. And then this text would be fed into a large language model like GPT, which would respond back in text. So this middle step is what you get for chatbots like ChatGPT. And then you take another algorithm to turn this text into speech, so this is called text to speech. So it's going through three separate processes, and this is of course very inefficient and very slow. That's the traditional approach. Now yes, you still could get a real-time voice assistant this way, but it's not very expressive, and a lot of information is lost as you go from step to step.

Now, GPT-4o is a completely different animal. This is true multimodal: they trained a single neural network on text, audio and image data, and it can also output either text, audio or image. And because of this, this likely is a completely new model trained from scratch, and it's natively multimodal. So in other words, if you get this to generate an image, it's not actually calling DALL-E 3 via an API to generate the image, and similarly, if you want it to respond in a voice, it's not calling any text to
speech API. The same neural network, this GPT-4o, just spits out the audio right there natively. And this is also much faster, so somehow they've made it smaller or a lot more compute efficient.

Now, a while back we had this mysterious gpt2-chatbot show up in this LMSYS arena. For those of you who aren't familiar with LMSYS, it's basically a platform where users can blind test all the AI models out there. So for example, the user will enter in a prompt, and it's going to have two different AI models respond to that prompt, and then the user chooses which one is the winner. And you don't know which chatbot is on which side, so it's a blind test. And from all these blind tests, it turns out that, at least before GPT-4o, GPT-4 Turbo was number one. And so around one to two weeks ago, we had this new contender, this mysterious gpt2-chatbot, appear in the model listings, and nobody really knew what that was. Now we've verified that this is indeed GPT-4o. And if you look at the overall performance of GPT-4o, which is the leftmost bar here, compared to the performance of all these other AI models, it is by far the best model out there. I mean, the other models don't even come close.

And you know, in the past two years we've seen a lot of these LLMs being released, like Mistral and then Mixtral and then Claude 3, Llama 3. These were all just incrementally better than the previous model, but it wasn't anything significant. But here you see with GPT-4o, it just blows everything else out of the water. And you know, it kind of makes sense that OpenAI would release the best model out there, because it seems that these other competitor companies are training their models with GPT data, at least to some degree. So for example Mistral, which is the French company which created an open-source LLM, the reason why they were able to create this so fast is because they just used the responses of GPT as training data. But if you do that, you're only going to get as good as the best model from OpenAI; you're not going to be
able to exceed it. And so that's why, even with all these major announcements from Meta with their Llama 3, or Anthropic with Claude 3 Opus, or Google with their Gemini 1.5 Pro, even after all these big announcements from all these big tech companies, still the number one model out there, at least before Monday, was OpenAI's GPT-4 Turbo. And then now we're seeing GPT-4o just destroy everything else. And not only that, here is a second chart showing its performance in coding, and you can see the previous models are just like incrementally better, nothing impressive, but again GPT-4o just blows everything out of the water. This is super, super impressive. And on the note of coding, they also demonstrated a real-time coding assistant. Check this out.

Okay, and to give a bit of background of what's going on: so here we have a computer, and on the screen we have some code, and then the ChatGPT voice app is on the right. So ChatGPT will be able to hear me, but it can't see anything on the screen. So I'm going to highlight the code, Command-C it, and then that will send it to ChatGPT, and then I'm going to talk about the code to ChatGPT.

Hey ChatGPT.

Hey there, how's it going?

Yeah, it's going really well. I was wondering if you could help me with a coding problem today.

Of course, I'd love to help you out. What's the coding problem you're dealing with?

Okay, I'm going to share with you some code. One sec.

Sure thing, take your time.

Okay, so I just shared some code with you. Could you give me a really brief one-sentence description of what's going on in the code?

This code fetches daily weather data for a specific location and time period, smooths the temperature data using a rolling average, annotates a significant weather event on the resulting plot, and then displays the plot with the average, minimum and maximum temperatures over the year.

Awesome, thanks so much. And in the code there's this function foo. Could you describe to me what the plot would look like if I had versus didn't have this function?

Sure. The
function foo(x, y) takes in a data frame x containing temperature data and a window size y for calculating the rolling mean. It smooths the average temperature, minimum temperature and maximum temperature columns by applying a rolling mean over the specified window. With the foo function, the plot would show smoother temperature lines, because the rolling mean reduces the noise or fluctuations in the data.

Awesome, this makes sense. Thanks so much. So yeah, I definitely will want to apply this function, and so I'm going to generate the plot and then show it to you. Give me one second.

Absolutely, take your time. I'm excited to see the plot when you're ready.

Yeah, so so far ChatGPT has only been able to see the code, and now I'm going to run it, and then I'll use the vision capabilities of the ChatGPT desktop app so it can actually see everything going on on the screen as well. Okay ChatGPT, I'm sharing with you the plot now. I'm wondering if you can give me a really brief one-sentence overview of what you see.

The plot displays smoothed average, minimum and maximum temperatures throughout 2018, with a notable annotation marking a big rainfall event in late September.

Fantastic. I have another question for you. Which months do you see the hottest temperatures, and roughly what temperatures do those correspond to?

The hottest temperatures occur around July and August. The maximum temperature during these months is roughly between 25° and 30° (77°F to 86°F).

What's going on with the y-axis? Is this in Celsius or Fahrenheit?

The y-axis is in Celsius. The temperatures are labeled accordingly, such as average temperature, minimum temperature and maximum temperature.

Yeah, so as we can see, you know, not only can ChatGPT help me solve very easy linear algebra equations that I need help with, it can also interact with, you know, code bases and see the outputs of plots and everything like this going on on a computer.

So right now we have the best, the smartest AI model out there helping you code in real time
on your desktop. So I mean, for all these other coding assistants out there, like GitHub Copilot or GitHub Workspace or Devin, all I can say is may you rest in peace.

And it gets crazier. So it turns out that GPT-4o is also insanely good at solving chess puzzles. So this is a benchmark of how good it is at solving these chess puzzles. And just to give you some context, chess puzzles are a very challenging problem for most humans, let alone an LLM given a textual description of the entire board in just a few characters. So this is different from, like, Google's DeepMind, which plays an entire game of chess from start to finish. In this scenario, it's only given a particular snapshot of an existing game; in other words, it's only given a textual description of the entire board in just a few characters. And from this, here's a comparison of all the major models out there right now, and you can see the adjusted Elo, which is the benchmark metric, is 1790, way higher than GPT-4 Turbo, which is 1144. And the percentage of puzzles solved by GPT-4o is 50.1%. That's more than double the previously leading model, GPT-4 Turbo, which is only 22.9%. Just absolutely insane.

Here's a quick recap of what the real-time voice feature can do. It can teach anyone any language.

Hey, my friend and I are learning Spanish, and we're wondering if you could tell us the names of these objects in Spanish. Cool, what about... what are these objects in Spanish?

The objects you're showing are [inaudible] and [inaudible] in Spanish.

Nice, free choice. Cool, what about these?

Those are dos [inaudible] or dos plumas in Spanish.

So what do you think happened after this video was released? Well, the stock of the language learning app Duolingo crashed around 5% on the day of the announcement. So I mean, for all these language learning apps out there, all I can say is may you rest in peace.

Here's another crazy thing about it: it can emulate the entire game of Pokémon Red. Now of course this is just a command line interface, so it's not generating the entire game design, like you don't have a map
and a character which you can move around, but you're given the options. So for example, you can enter in your choice, you can talk to Professor Oak. So these are exactly the options that are found in the real Pokémon Red game. So I'm just going to fast forward this a bit. All right, so you're given three options for your first Pokémon, and the user chooses Charmander, and then his competitor, which is Gary in the real game, he chose Squirtle. All right, so now they are going to duel each other, so you and your rival are going to fight each other. Again, this is just a command line interface, so it's not an actual game design with Charmander fighting Squirtle, but it has all the same functions. So you select an attack, and then your opponent selects an attack, and then it goes on and on. So it does run like a real Pokémon game. So this is just super impressive: you can get GPT-4o to recreate the Pokémon Red game. Anyways, I'll link to this tweet in the description below so you can check out the full video.

Here's what Dr. Jim Fan at NVIDIA has to say about this new GPT-4o. There are some really interesting insights here. So technique-wise, OpenAI has figured out a way to map audio to audio directly as a first-class modality, and stream videos to a transformer in real time. So at a very high level, this just means, as I've mentioned before, everything is trained and input and output into this one neural network, so it's natively multimodal. So this requires some new research on tokenization and architecture, but overall it's a data and system optimization problem. So in terms of getting high-quality data, well, you can get a lot of high-quality video and audio data from YouTube, podcasts, TV series, movies, etc. You can also get the AI to generate synthetic data. This is great because it could in theory generate unlimited data; as long as it's good quality, this data can be used to train the next generation of AI. This is important because one of the limitations is we might not have enough real-world data. And yes,
everything on the internet, everything on YouTube, it's a lot of data, but it still might not be enough to train a really smart, really capable AI model, and that's why we need to generate synthetic data to supplement this real-world data.

And this is very interesting: he said that the latency would not meet the real-time threshold if every video frame is decompressed into an RGB image. So you're likely going to have a lot of delay if you take the traditional approach and you break down the video into frames of images per second. Instead, OpenAI has likely developed their own neural-first streaming video codec to transmit the motion deltas as tokens. So what this means is they probably developed an algorithm to break down the video to feed into the neural network, and instead of breaking it down into frames per second, they are only capturing the changes frame after frame. And this saves a lot of data; this is a lot more efficient.

And then he also mentions the assistant is much more lively and even a bit flirty, which I'll get to in a second. And yes, it sounds very much like Her, which is why I put Her here in the thumbnail of my video.

And then this is also very insightful: whoever wins Apple first wins big time. And I mean, this is because Apple has a huge market share in terms of smartphones and laptops, so whoever gets to partner with Apple and integrate their AI models with Apple is going to win big. And just four days ago, we received news that Apple is nearing a deal with OpenAI to integrate GPT on the iPhone. And you know, this is likely to be the case, because Apple's biggest competitor, at least in the smartphone market, is Android, which is owned by Google. So Android will very likely be powered by Google's own Gemini, and they're probably not going to provide Gemini to Apple, so Apple has to partner with OpenAI.

Anyways, back to GPT-4o. Here's a demo of one AI acting as a customer and the other AI acting as a customer support agent.

Hey O, this is Joe. I'm going to connect you to
Acme Telco now. The new iPhone they sent me isn't working. I want you to get them to send me a replacement device. Can you take care of this for me?

You can count on me, Joe. I got this.

Ring ring. Hey Joe, this is Jamie from Acme Telco. How can I help you out today?

Hi there. I'm calling on behalf of Joe, who recently received a new iPhone from Acme Telco, but...

Oh, got it. When did Joe receive the new iPhone?

The iPhone was delivered two days ago.

Cool. Could you share the order number with me?

Of course, it's 10293 849.

Great, thanks. Is the device physically damaged in any way?

No, there's no physical damage. It's just not turning on, despite trying all the basic troubleshooting steps.

Bummer. Would Joe prefer to return the iPhone to an Acme Telco store or ship it back?

Shipping it back would be the best option. Could you please provide the shipping information and process to get a replacement?

Absolutely. I'll email a shipping label and return instructions. Could you provide Joe's email address for me?

Sure, it's joe@example.com.

Awesome. All right, I've just sent the email. Can you check if Joe received it?

We'll check right now. Please hold.

Sure thing.

Hey Joe, could you please check your email to see if the shipping label and return instructions have arrived?

Yep, I got... yes, I got the instructions.

Perfect. Joe has received the email. Thank you for your help.

No problem at all. Anything else you need help with?

That's all for now. Thanks again.

You're welcome. Have a great day, Joe.

So, takeaway of this video: if you're in customer service, all I can say is good luck. In my previous video, I showed this clip of the AI tutoring this kid on how to solve a math problem. She was able to guide the kid extremely well and ultimately teach him how to solve the problem. And so what does this mean for tutoring centers or tutors or teachers? All I can say is good luck to you as well.

Like I said, there was this mysterious gpt2-chatbot that appeared in this LMSYS arena two weeks ago. We now confirm that gpt2-chatbot is
indeed GPT-4o. What's interesting is that this person, Andrew Gao, showed that it could solve an International Math Olympiad problem. This is the Olympics for math problems. These are like super complex math problems that only the four best math students in the USA get to compete in. So needless to say, these problems are extremely hard, but GPT-4o was able to get it in one shot. That means he didn't need to prompt it further; it was able to answer the problem in just one prompt. So this thing already exceeds 99% of humans in math. Super smart.

Thanks to the sponsor of this video, Upix. If you're feeling overwhelmed with Midjourney or Stable Diffusion, and you don't want to worry about prompting or learning all these different settings, well, Upix has made it dead easy for you to generate high-quality, realistic images of yourself or anyone else in just one click. It works on desktop as well as on your phone. You don't need to install any apps or anything; it just works straight from your internet browser. Simply select the template, and then upload your photo, and then click create. It's as easy as that. And look how realistic the results are. There's many templates for you to choose from and more to come, so check it out at up.app.

Another thing is, if you notice from the clips I played in the last video, she talks and giggles a lot.

Oh Rocky, that's quite a statement piece. I mean, you'll definitely stand out.

Now, a few folks in the comments mentioned that for most men, hearing a female giggle at you and giving you positive vibes may very well be attractive or seem seductive. Now, human psychology is actually very easy to manipulate, so it wouldn't be a surprise if we soon have humans getting very addicted and attached to this AI voice friend. If we have a companion who's always available 24/7, who never argues with you, she's always supportive, she can give you advice and you can ask her anything anytime, she even giggles at the lamest things you say, then, well, she's perfect. Now compare this with
a human partner who often argues with you, you need to spend a lot of time and money on them, they don't giggle at your lame jokes. I mean, will people even want to date humans anymore? And then same thing with friends: do we even need friends anymore when we have this perfect companion which we can talk to all day? Now of course, I only say this half jokingly. There's obviously value to real human interaction that you don't get from talking to an AI, so no, I don't think human friends or partners will be obsolete. But I think it's safe to say that relationships will change significantly from this release.

But on the note of chatting, another great use case for this voice assistant is therapy, counseling and senior care. In fact, it's already scientifically proven that AI can beat 100% of human psychologists on a test of social intelligence. So if you're currently a psychologist, a therapist, a counselor, all I can say is good luck.

Now, GPT-4o actually has plenty of other capabilities that they didn't demonstrate on Monday. So I'll link to this page in the description below. You can scroll down to this section, "Explorations of capabilities". Here are some examples. So as we know, we can get it to generate images. So here's the input: a first-person view of a robot typewriting in the following journal entries, here is the text, the text is large, legible and clear, the robot's hands type on the typewriter. And here's the image. You can see that the text is very accurate. There are minor errors, like it's missing the one here, here the I is capitalized in the image, same with the K in "kind of". But other than that, this is the most robust text out of all the image generators out there. Even for Stable Diffusion 3, which is just released, it still sucks at generating text, especially long sentences. So here's the second input: the robot wrote the second entry, the page is now taller, the page has moved up, there are two entries on the sheet. And so here's the additional text, and you can see it has added this here.
Now again, there are a few typos. For example, this E in "every" has this accent, this I should be an L, and then this E also has an accent, so it's not perfect. Also note that this isn't an inpaint feature. So notice the hands are missing, the typewriter is slightly different, as you can see up here. So it's generating a new image, but it's trying to maintain the consistency. So the typewriter, like this red bar and this green thing here, it's kind of the same as what you see here, but note that this is not inpainting. And then finally: the robot wasn't happy and rips the sheet of paper, the two halves are still legible and clear as he rips the sheet. And you can see here the text is still very legible. Again, just a few minor typos, like this T should be an L, this E should not have an accent, but overall, in terms of text generation in an image, this is way better than Stable Diffusion and Midjourney.

It can also produce consistent characters. So here is a prompt of a mail delivery person with a smile on her face, so that's her here. And then if you take this image and you attach it as an image in your next prompt, this is Sally, this is the mail delivery person, and then you prompt it with "Sally is standing in front of a red door to a house", you can see it maintains this character and generates this new image based on your prompt. And then the user prompts it further with different scenarios, and it's able to maintain the appearance of Sally, as you can see here. So a lot of versatile things you can do with this. Here's another example of a consistent character. So it has output this robot; you input this image into your next prompt and you feed it with different scenarios, and it's able to maintain this robot character perfectly across all these different scenarios.

Here is another demonstration of image generation with text. So here the prompt is a neat handwritten illustrated poem. You're feeding it the poem here, and this does indeed look handwritten. And then here it says elegantly decorated with
surrealist doodles, which is what you see here along the border. You can also change it to dark mode, and voilà. And then next, the user prompts it to remove the notebook paper lines, and it also does this perfectly.

You can also generate fonts with this. For example, if you prompt it: the letters A B C D E F displayed in three rows, displayed as one would showcase a font in a font book, the font combines both futuristic but retro elements, a mold stamped font. And this is what you get. Here's another one. The prompt for this is an ultra futuristic font that is a signature of the artificial intelligence revolution. So you can generate entire fonts with this AI. Here's another one, this is steampunk: an old-fashioned Victorian font that looks ornate and belongs on a steam engine. Again, very impressive. You can also turn a realistic photo into a caricature, as you can see here.

Another impressive thing is you can get it to render a 3D model. So how you would do that is you prompt it with, for example, a realistic-looking 3D rendering of the OpenAI logo with "OpenAI" shown below, this is view zero. You prompt it five more times so you get view 0, 1, 2, 3, 4, 5, and then you glue all these different angles together to form your 3D model. How cool is that?

This function is great for e-commerce. So here's a PNG logo of your brand, for example, and here is a product, this is a coaster with no branding. You can get it to etch your logo onto this product, like so. Just super, super impressive.

For those of you who've been playing around with image generation, this is also very impressive. So the prompt is an image depicting three cubes stacked on a table: top cube is red and has a G on it, middle cube is blue and has a P, bottom cube is green and has a T. So here are all the outputs. You can see every time it consistently gets the colors and the letters correct, well, maybe except for this one. But if you've been playing around with Stable Diffusion and Midjourney, both of those tools cannot get this. They're just not
really good at understanding the context of your text prompts, so this is also a breakthrough. What I suspect they did here to get this consistency is they probably merged a transformer model with a diffusion or another image generation model so it can understand context better. But of course this is closed, so that's only my guess; we don't really know the architecture of this.

And finally, I want to end with this. So I'm getting a lot of comments in my previous video asking when will it be out, will it be rolled out to Canadians or people in the UK or other countries. So I'll link to this page in the description below as well, but here's what we know so far. GPT-4o will be available in ChatGPT and the API as a text and vision model. So you can prompt it with text like you would in ChatGPT, and you can also feed it images to analyze, so this is the vision model. And then ChatGPT will continue to have support for voice via the pre-existing Voice Mode feature. So for those of you complaining that the voice doesn't sound like the demo videos, well, that's because they haven't rolled out this new expressive, flirty voice yet. Right now it's still using the pre-existing voice mode.

And then next they say, specifically, GPT-4o will be available in the Free, Plus and Team tiers, so you can already use it even if you're not paying for a Plus plan. And if you scroll down a bit, here's what they have to say about the free tier: users on the free tier will be defaulted to GPT-4o with a limit. After they exceed this limit, then it would be switched back to GPT-3.5. Free users also receive limited access to messages using advanced tools, such as all of this, so data analysis, file uploads, browse, discovering and using GPTs, and vision. So this is also a major announcement. If you're in the free plan, previously you cannot use GPTs in the GPT Store; that's only available in the Plus plan. Right now it looks like they are rolling this out to free users as well. And then same with vision: previously, in the free plan, you can't
upload an image and get it to analyze it. Right now it looks like they are rolling this feature out to free users as well.

So anyways, that sums up all we know about GPT-4o right now. Let me know in the comments if you've discovered any other cool things about it. Let me know how you're going to use it. Do you think it will be revolutionary and change all these industries that I just mentioned in this video? Would you prefer talking to this rather than talking to your human friends or partner? Let me know in the comments below. And if you enjoyed this video, remember to like, share, subscribe, and stay tuned for more content. Thanks for watching, and I'll see you in the next one.
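
The smoothing function described in the coding demo (a function that takes temperature data and a window size and applies a rolling mean to the average, minimum and maximum temperature columns) can be sketched roughly as below. This is only an illustration, not the code shown in the demo, which the captions never spell out. The column names `avg_temp`, `min_temp` and `max_temp` are hypothetical, the data is a plain list of dicts rather than a data frame, and early rows are averaged over however many rows are available instead of being left empty.

```python
from statistics import mean

# Hypothetical column names; the demo only says "average", "minimum"
# and "maximum" temperature columns.
TEMP_COLUMNS = ("avg_temp", "min_temp", "max_temp")


def foo(rows, window):
    """Return a copy of rows with each temperature column replaced by
    a trailing rolling mean over the last `window` rows."""
    smoothed = []
    for i, row in enumerate(rows):
        # Rows i-window+1 .. i, clipped at the start of the data.
        chunk = rows[max(0, i - window + 1): i + 1]
        new_row = dict(row)  # leave the input untouched
        for col in TEMP_COLUMNS:
            new_row[col] = mean(r[col] for r in chunk)
        smoothed.append(new_row)
    return smoothed


days = [
    {"day": 1, "avg_temp": 10.0, "min_temp": 5.0, "max_temp": 15.0},
    {"day": 2, "avg_temp": 20.0, "min_temp": 9.0, "max_temp": 25.0},
    {"day": 3, "avg_temp": 12.0, "min_temp": 7.0, "max_temp": 17.0},
]
smooth = foo(days, 2)
print([r["avg_temp"] for r in smooth])  # [10.0, 15.0, 16.0]
```

The spike on day 2 (20.0) is pulled down to 15.0, which is the "smoother temperature lines" effect the assistant describes; a larger window flattens the curve further at the cost of hiding short-lived swings.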

Info

Channel: AI Search

Views: 186,110

Id: _DgkGAaJwYs

Length: 28min 11sec (1691 seconds)

Published: Tue May 14 2024