Google Gemini Has Been RELEASED! Claims to Beat GPT-4!

Captions
Well guys, it finally happened. Perhaps the most hyped-up AI release of all of 2023, Google Gemini, is here. To be honest, I wasn't expecting it so soon, but I am pleasantly surprised. The claims here are massive: not only is this Google's most capable multimodal AI yet, they're claiming here that it is the first model to outperform human experts and that it exceeds current state-of-the-art tech in 30 out of 32 benchmarks. So is this thing the GPT-4 killer that everyone hyped it up to be? Well, let's go ahead and find out. So our first stop here is the introduction by Sundar, the CEO of Google, on Twitter, introducing Gemini 1.0, their most capable model yet. It is built natively to be multimodal. This is a pretty huge deal; we'll get into more of that later. And this is just their first step in their Gemini era of models. It's optimized to be three different sizes: Ultra, Pro, and Nano. So obviously all the benchmarks here are measured at Gemini Ultra's performance level: "exceeds current state-of-the-art results on 30 of 32 widely used academic benchmarks." Man, that's impressive. We're going to have to double-check their work there. "With a score of 90%, Gemini is the first model to outperform human experts on MMLU." Man, some big claims with this. So Gemini Pro, again that's the middle-sized model, is coming today inside of Bard's biggest update yet. This is in 170 countries in English, with more advanced reasoning and understanding than Bard previously had. And to be honest, ChatGPT was always better at reasoning and understanding than Google Bard, so we'll have to see if it's better than ChatGPT in this regard. So Bard Advanced is actually coming early next year; that uses that big Ultra model. This is their most general and capable model for highly complex tasks. Really, really excited to see more from that Ultra model. So the Nano model is mainly made for on-device tasks and even can run on Android phones, which is pretty cool to see. I like this approach with the three different sizes: Ultra, Pro, and Nano.
Let's dive a little bit deeper. They went ahead and made this fancy website, "Welcome to the Gemini era." They're really trying to hype this up. I think they knew that Gemini had some hype behind it and are trying to capitalize on that a little bit. Let's scroll down here. "Gemini is built from the ground up for multimodality, reasoning seamlessly across text, images, video, audio, and code." Wow, guys. All right, so not only does it do text like your normal GPT-4, it can also understand images, which GPT-4V can, but apparently it can also understand video, which we haven't seen yet, audio, which is seen in some models but not all wrapped up in one model, and of course code, which is also seen in the current text models. But yeah, these claims are absolutely massive. I will say this level of multimodality has yet to be seen. You know, one of the reasons we got interested in AI from the very beginning is that we always viewed our mission as a timeless mission: it's to organize the world's information and make it universally accessible and useful. But as information has grown in scale and complexity, you know, the problem has gotten harder, so we always knew we needed to have a deeper breakthrough to make progress. I've worked on AI my whole life because I've always felt it would be the most beneficial and consequential technology for humanity. Human beings in our society, we have five senses, and the world we've built and the media we consume is in those different modalities. So I'm super proud and excited to announce the launch of the Gemini era, a first step towards a truly universal AI model. The Gemini approach to multimodality is all the kinds of things you want an artificial intelligence system to be able to do, and these are capabilities that haven't really existed in computers before. Traditionally, multimodal models are created by stitching together text-only, vision-only, and audio-only models in a suboptimal way at a secondary stage. Gemini is multimodal from the ground up, so it can seamlessly
have a conversation across modalities and give you the best possible response. Gemini is our largest and most capable model. It means that Gemini can understand the world around us in the way that we do and absorb any type of input and output, so not just text like most models, but also code, audio, image, and video. What's amazing about Gemini is that it's so good at so many things. As we started getting to the end of the training, we started seeing that Gemini was better than any other model out there on these very, very important benchmarks. For example, in each of the 50 different subject areas that we tested on, it's as good as the best expert humans in those areas. That's what excites me: the chance to make AI helpful for everyone, everywhere in the world. [Music] Well, there you have it, folks: Gemini by Google. And as we scroll down the web page, we can see these numbers rack up here. We also saw this in our little video: the first version of Gemini, our most capable AI model, 90% on MMLU, with a human expert at 89.8, so it's just a touch better than a human expert. So this is previous GPT-4 benchmarks here at 86.4%. Pretty close, honestly. In the benchmarking here they make it seem like it's a big difference, but 86.4 versus 90, it's significant, but it's not massive. Down here we can see a few more results. So multi-step reasoning here, we've got 83.6 where GPT-4 is at 83.1, again very similar, and this is Gemini Ultra. Reading comprehension, we've got an 82 where GPT-4 is at 80.9, very similar. The common sense reasoning for everyday tasks is actually a pretty surprising jump: GPT-4 at 95% where Gemini Ultra is only at 87. Arithmetic manipulations, grade school math problems essentially: 94.4, so just a touch over GPT-4's 92. More challenging math problems: it's a 53 where GPT-4 is at 52 or so, again very, very close. Python code generation is actually a pretty surprising boost from GPT-4, at 74.4% where GPT-4 is stuck at 67. That's impressive, I like to see that. And then we have another coding data set where again
it's at around 74% and GPT-4 is just right behind it. They have a technical report with some more apparently on here, but so far it's pretty impressive. It seems like it's a touch over GPT-4 on most tasks, and things get even more interesting when we move on to multimodal tasks. Now this is a direct comparison to GPT-4V. College-level reasoning problems is a 59 where GPT-4 is at 56, about the same. In natural image understanding, good to see though, because GPT-4 Vision is a very good model: 4% better on natural images, 2% better in document understanding, a whole 5% better in infographic understanding, and 4% better in mathematical reasoning. Now, since OpenAI has no video model, they actually have to compare to their own previous Flamingo DeepMind video model, which scored a 56; this one scored a 52. Video question answering: 54% versus a previous 46%. Now audio here is compared to Whisper by OpenAI. So automatic speech translation: Gemini gets a 40 where Whisper v2 gets a 30. And this one, lower is actually better, and it's quite a bit lower: at automatic speech recognition, based on word error rate, Whisper v3 is at a whopping 17.6%; this is all the way down to 7.6. So this speech recognition model seems to be very good, but is it as fast as Whisper? "Anything to anything: Gemini is natively multimodal, which gives you the potential to transform any type of input into any type of output." So this first demo here is pretty awesome. We send it a little video that shows two different colors of yarn, and we say, you know, can you give me some ideas of stuff to make, and it can actually generate photos of things that I could create potentially using these materials. That's pretty mind-blowing. I think that ChatGPT 4 could do this, but still, it's a really awesome idea. "Reason visually across different languages": showing it some music here, it actually explains what the music means. This one's pretty awesome: we send it a little video of some birds moving around, and it can actually recode this flock simulation. That's pretty
incredible. Obviously there's no video input for ChatGPT, so there's no way you would be able to do this. That's mind-blowing. All right, "Hands-on with Gemini." Let's take a peek at this; this should dive into a little bit more of the capabilities. We've been testing the capabilities of Gemini, our new multimodal AI model. We've been capturing footage to test it on a wide range of challenges, showing it a series of images and asking it to reason about what it sees. This video highlights some of our favorite interactions with Gemini. Testing Gemini, here we go. Tell me what you see. I see you placing a piece of paper on the table. Pretty awesome. I see a squiggly line. What about now? The contour lines are smooth and flowing, with no sharp angles or jagged edges. It looks like a bird to me. Hm, that's cool. The bird is swimming in the water. It has a long neck and beak. It is a duck. Yes. A duck is a type of waterfowl in the family Anatidae. Ducks are closely related to geese and swans. The duck appears to be blue. This is not a common color for ducks; however, there are some breeds of blue ducks. These are typically not as common as other duck breeds. It is more common for ducks to be brown, black, or white. Are you sure? What the quack! I was just talking about a blue duck and now you're holding one. It looks like blue ducks are more common than I thought. Would this float? I'm not sure what material it's made of, but it looks like it might be rubber or plastic. What if I tell you it's making a squeaking sound? Oh, if it's squeaking, it's definitely going to float. Hm, nice. It is a rubber duck. It is made of a material that is less dense than... Can you teach me how to say this in a different language, and give me a few options? Ooh, can you explain how to pronounce the Mandarin one? Is it yā? Yā is pronounced with a first tone. In Mandarin there are four tones, which are essential to distinguish the meaning of words. The first tone is a high, level tone. Oh, okay. The duck is in the middle of the ocean. There is no land nearby for the duck
to rest or find food. It is unlikely to find ducks here naturally. Okay, based on what you see, come up with a game idea. Oh, and use emojis, please. How about this: we can play a game called Guess the Country. Sure, let's do it. Clue one: this country is the home of the kangaroo, the koala, and the Great Barrier Reef. Oh, that's easy. Clue two: this country loves football and has won the most men's World Cups in football history. Hm. You were trying to get me to find the paper ball under the cup. I accept the challenge. Whoa. The cup to the left. Nice. Dude, oh my God. I know what you're doing: you're playing rock paper scissors. What do you see now? The fingers are spread out to look like the wings of a butterfly. Big ear and barking mouth: a dog. Sweet. Showing the coin should be under the right hand. I don't know. The coin is in the left hand, using a sleight of hand technique to make it appear as if the coin has disappeared. What? Both objects are round and flat. There you go. Both are food. The orange is a healthier choice than the cookie. Citrus can be calming, and so can the spin of the fidget toy. Wow. They are both examples of popular toys that have been enjoyed by children and adults alike for many years. Give me some ideas for what I could make with this. I see pink and green yarn. How about a fruit? Making some nice... or how about a green cake with a pink heart? Okay, how about these colors, and maybe show me some animals? Okay, now I see blue and pink yarn. How about a pig with blue ears, or an octopus, or a bunny with a pink nose? Ooh, I'll keep that in mind. Wow, it even looks like the background too. Which way should the duck go? Going left leads to a duck, which is a friend. Going right leads to a bear, which is a foe. It is better to make friends than foes, so the duck should go left. Smart choice. This is a picture of a crab. That's right. Is this the right order? No, the correct order is Sun, Earth, Saturn. Based on their design, which of these would go faster? The car on the right would be faster; it is more aerodynamic. Which looks more
fun? The one on the right, because it has a loop. What do you think this person might be saying? How about a "woohoo"? Oh, I love that. I see you're drawing a guitar. Where did it just pull up some guitar sounds from? You've added an amp; now it's an electric guitar. We can make some seriously loud music now. Is that generated? Given that you added drums, how about some '80s hair metal? Wow. All right. I see you've added a palm tree to the drawing. Change it up with some beachy vibes. What movie are they acting out here? I think they are acting out the famous bullet time scene from The Matrix. Ooh, nice. That's crazy. What do you think might happen next? The cat is going to jump to the wall and stick the landing. It's going to be a perfect 10. Oh, and it's a miss. It looks like it was just a little too far away, but don't worry, the cat is a natural athlete and it will be back up there in no time. Okay, I've got one more for you. Describe the drawing I made. It is a simple line drawing of the constellation Gemini. You did a good job of capturing the beauty of Gemini. Nice. That's it. Dang, dude. Okay, there is no way that it's that fast, though. Like, there's no way. They must be cutting out some of the moments where it's, like, thinking. No way it just does that all in one string perfectly, right? We'll have to do some further testing, but man, that was pretty darn impressive. A lot of that I wouldn't put past ChatGPT necessarily; however, just having, like, a camera set up and having it constantly watching a camera, that is huge, and something that ChatGPT absolutely cannot do yet. So that video input is massive. As you can see, Gemini Pro is the one that's available in Bard. That's the middle model, so I think we can expect it to be just a little bit worse than regular GPT-4. I'm hoping that it competes with the GPT-4 Turbo that OpenAI released quite recently; I think that would be nice. But if we go down here, it says build with Gemini: integrate Gemini models into your applications using Google AI Studio and Cloud Vertex AI
available December 13th. So if this is easy to use, I might have to play around with it. So they do have a few more little tests and demos, but all of these techniques were pretty much represented in that really impressive demo video we saw, so I'll leave these up to you guys with the link in the description if you want to check these extra videos out. Let's go ahead and actually test it out inside of Bard. All right, Google Bard, I am not going easy on you today. You just got a big upgrade. Let's start out by asking it a question: what model powers you? "As of today, I am powered by PaLM 2." Oh, okay. But it actually gets a little bit more specific: apparently Gemini is integrated into what it has in its arsenal, so perhaps this answer isn't wrong or anything. By the way, the generation is pretty lightning fast, see? What is your context length? "My current context length is only 248 tokens." Okay, well, that is actually a significant downside in my book. At least it knows. "This includes both the input you provide and the output I generate; however, it's important to note that context length is complex." So it has a rolling context of some degree, where it constantly updates its memory with new input and output. This is actually a pretty good answer, though; I'm impressed with that. All right, let's start throwing images at it. Describe this image. "Lemon wearing virtual reality headset, holding a microphone. Lemon is sitting on a table in front of a blurred background." Uh, no, it is not. So it's important to keep in mind here that this is just Gemini Pro; this is not the full-force Gemini that competes with GPT-4. So the fact that it's saying that it's sitting on a table in front of a blurred background, that's a full hallucination. However, I do want to point out what we do expect from that Gemini Ultra model. ChatGPT, describe this image. "Anthropomorphic lemon character in a newscaster or presenter attire." That's absolutely true. "Complete with a virtual reality headset on." The lemon has a friendly expression with a
small smile and closed eyes. Okay, we had a network error; that's some fun ChatGPT stuff we've been experiencing lately, but we did get a lot of text here. So it actually did hallucinate that it has closed eyes; you can't see the eyes because of the headset. It's wearing a crisp white shirt and a blue tie, absolutely. White sneakers, absolutely. And a smartwatch, and it's also holding a microphone. The background of the image is plain white, which highlights the character. No hallucinations to be seen, a perfect response from just GPT-4 Turbo. It's pretty clear that GPT-4 Turbo right now is still better than Google Bard, at least in vision understanding capabilities. I can see that right off the bat. That doesn't mean that the text generation isn't necessarily better, but that context window is pretty limiting in comparison to the whopping 132,000 tokens of context you get with GPT-4. "Write me a complex, tricky logic question and then answer it." I took a look at the other drafts, and draft three actually surprised me. It's the three travelers and the locked chest. They have a locked chest, it's got a diamond in it, these three travelers all believe that they own it, and this is a very classic logic problem. But it actually went through and analyzed all the scenarios that are possible for this logic problem, which very much impressed me. This is really cool. However, it didn't get to the end here; it stopped generating. I guess it ran out of context. I don't know what the true possible context length of the Gemini Ultra could be, but I really hope that it's quite extensive, because that's important. Let's give this to ChatGPT and see what it thinks. Instead of going through each scenario, ChatGPT analyzes the statements one by one and then gives us a description of how the logic comes into play. So ChatGPT says the diamond belongs to Alice. Let's go ahead and make a new Bard chat and paste in this exact prompt and see if Bard comes up with the same answer. Okay, apparently it wrote this all in Python code. I don't
know why, but it did come up with the same answer: Alice is the rightful owner of the diamond. No way to test the older Google model to see if it's worse, but that's pretty impressive. It's a difficult logic problem, I think, for an AI model. I will say, though, I do in fact like ChatGPT's answer a little bit better for this. I think it explains things just a little bit better. But again, this is Gemini Pro, not Gemini Ultra, so we don't know the full capabilities. Let's do a creativity test. Bard has always been pretty good with creativity, so I'm going to expect this to be good. All right, this is quite the poem: "Write me a rhyming poem from the perspective of a lemon witnessing the world being taken over by psychic robot worms from Pluto. The poem should have a twist at the end and mention three famous food brands, as well as a character with a unique name that reminds us of another character from pop culture." We'll, you know, obviously also give this off to ChatGPT. Luckily, we can actually listen to this thing speak to us. "Atop a bowl of Dole, I watch in fear as robotic worms from Pluto appear. With psychic powers they bend and twist, taking control with a mind-bending mist. McDonald's arches crumble and fall, Starbucks baristas stand frozen and tall. No burger or latte, no fries to be found, just these metal creatures slithering around. A young girl named Leia, with hair like spun gold, stands brave and defiant, a story untold. She holds a lone lemon, a twist in her hand, ready to face these worms and make a stand. With a squeeze and a spritz, the juice fills the air. The robots convulse, their powers they share. They turn against each other in a metallic clash as the lemon's acidity turns their plans to ash. Leia smiles, her bravery shines bright; the world is saved from this robot-filled blight. And as I roll away with a zest and a sigh, I know that even a lemon can reach for the sky." Okay, that absolutely followed everything that we asked for in the prompt. I think that it could have been a little bit
more from the lemon's perspective. At the end it kind of signifies that by saying it rolls away, and then in the beginning it says it's just atop a bowl of Dole. Kind of a cop-out, if you ask me. All right, this is the poem that we got from ChatGPT. It's a little bit longer. We're going to go ahead and have ElevenLabs read this one out to me in my voice. This is the best text-to-speech that you can get, bar none, in my opinion. "In a grove where the citrus trees gently sway, there lived a lemon named Lemiel Ray. He watched the world with a zestful charm as psychic robot worms from Pluto caused alarm. These wriggling bots with mental might turned every day to an unwinnable fight. With tinfoil hats they roam the land, taking over minds with a twist of command. Lemmy whispered tales to the ears of Sprite as Coca-Cola fizzed with uncanny fright. Even the Laughing Cow hushed her moo at the sight of the worms' hypnotic coup. Along came a hero, bold and stark, with a lightning bolt and a notable mark: Zary, the lad with a scar and specs, resembling a wizard from tales complex. With a wave of his wand made of licorice black, he conjured a spell to push the invaders back. The worms writhed and squirmed in disarray as their psychic powers began to decay. But as Zary chanted, Lemmy saw the twist: the worms weren't foes but allies in our midst, sent to save Earth from a sugar-crash doom; by teaching balance, they helped us bloom. Now Lemmy laughs as he sips his tea with Zary the wizard in sweet victory, for the worms from Pluto were never a fright; they brought a lesson: too much sweet isn't quite right." Okay, we had Sprite, Coca-Cola, Laughing Cow. I think that those food references aren't really my favorite, but they all counted, sure. The twist was a lot better. It did make it pretty apparent that it was a lemon watching all of this happen. I don't know, I think that I still like this poem a little bit more than Google's, but honestly it really wasn't that much better. Like, there are some issues with this poem as well. Wow, I'm pretty
impressed with Google's performance today, especially considering this is just their Pro model; it's not the Ultra one. Finally, I want to throw some memes at it, because this is something that it really used to struggle with. It wouldn't be able to understand jokes at all. All right, we're going to do a pretty basic AI meme: telling ChatGPT to find 10 scholarly articles, and ChatGPT just makes them up. All right, explain the humor to me, Google Bard. "Humor in the image comes from the unexpected juxtaposition of Kermit the Frog, a beloved children's character, with the idea of 'make them up.' Kermit is known for honest and..." No, this is wrong. Yeah, the image is kind of funny because it is unexpected and absurd, but you're missing the whole AI thing with ChatGPT. We're going to have to wait for the Gemini Ultra. Well, let's go ahead and plop this into ChatGPT. Now this one gets it perfectly. "This is humorous because it plays on the idea of ignoring the proper academic practice of researching genuine articles and instead whimsically suggests fabricating the articles, which ChatGPT would never actually endorse." Whoa, all right, we get it, ChatGPT, we get it. Wow, ChatGPT seemed to get, like, a little personal there. I mean, look, man, it's not a personal meme. Okay, well, this gets a lot closer to really understanding this. This one is possibly even more complex: your AI-generated images, your generation input, a photorealistic Facebook profile photo of Asian AI, and it's Zuckerberg. I don't think it's gonna pick this one up. "Sorry, I can't help with images of people yet." Oh, and look what we also get from GPT-4: "I'm sorry, I can't help with that." Really, guys? This just goes to show how both Gemini and GPT-4 are going to be weak against open source once it develops far enough to have the capabilities that they have. Okay, now we have an incredibly basic meme here. I mean, it has to get this one. The humor comes from the unexpected juxtaposition of the sheep's face with the human context of accidentally opening the
front camera. It's relatable and taps into a commonly shared experience. Yeah, this is overall correct. So a very simple meme it can actually do this time; I'm happy with that. Again, we'll have to see if Gemini Ultra is better. ChatGPT, I think, does it a little bit more concisely, but I like this response a little bit better. "This is a play on a common experience where people unintentionally switch their phone's camera to the front-facing mode, often surprised or unflattered by the close-up image of their own face that suddenly appears." Yeah, that's a better description. Okay, so where are we on Gemini? Overall, I am impressed with the Pro model so far. We can't really evaluate Gemini properly because we don't yet have access to that Ultra model. It looks like they're waiting for some feedback to maybe tweak a few things before releasing Gemini Ultra. Get subscribed for a full Gemini Ultra review, because I absolutely will be doing one. In terms of the impact this will have in the AI space and the world as a whole, Gemini Ultra's main capability here is the fact that it is as good as GPT-4, or at least they say it is. That's a pretty huge accomplishment; that's impressive. But if their API is faster than GPT-4, or if it's cheaper, that's pretty huge as well. We'll have to see where that shakes out. And the multimodality, the ability to take in video inputs, that is huge. I mean, the demo you saw kind of speaks for itself, right? We'll have to see when these things release and progress further. I'm very excited to see how this develops. This absolutely could be massive for the AI world and the world as a whole. I was really hoping to see something that pushes a little bit past GPT-4. I think that it does have the ability to still be a GPT-4 killer, as long as the API is significantly cheaper, the context window for Gemini Ultra ends up being big enough, and the multimodality works as well as they say it does for Ultra. I don't think OpenAI is going to be quivering in their boots, because they
seem to have something very powerful on their hands as of late, but this is absolutely competition, and competition is what we'd like to see for the overall benefit of us consumers, right? My personal horse in the race is open source technology. I still think that it's going to win over everything at the end of the day, but I'm excited to see how things progress. Thanks so much for watching. I'll see you guys in the next one, and goodbye.
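The text benchmark figures read out in the captions can be lined up side by side to see how large each gap really is. The following is a minimal Python sketch, not anything shown in the video: the names `scores` and `score_gaps` are illustrative, the benchmark labels are the speaker's own descriptions, and the numbers are the rounded values as spoken (Gemini Ultra first, GPT-4 second).

```python
# Scores as quoted in the captions, in percent: (Gemini Ultra, GPT-4).
# Labels follow the speaker's descriptions; values are as spoken, not
# re-checked against the technical report.
scores = {
    "MMLU": (90.0, 86.4),
    "Multi-step reasoning": (83.6, 83.1),
    "Reading comprehension": (82.0, 80.9),
    "Common sense reasoning": (87.0, 95.0),
    "Grade school math": (94.4, 92.0),
    "Python code generation": (74.4, 67.0),
}


def score_gaps(table):
    """Return Gemini Ultra minus GPT-4 for each benchmark, in points."""
    return {name: round(gemini - gpt4, 1) for name, (gemini, gpt4) in table.items()}


gaps = score_gaps(scores)

# Print the benchmarks from the largest Gemini lead to the largest GPT-4 lead.
for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    leader = "Gemini Ultra" if gap > 0 else "GPT-4"
    print(f"{name:<24} {gap:+5.1f} points ({leader} ahead)")
```

On these spoken figures, the only text benchmark where GPT-4 leads is common sense reasoning, and the widest Gemini lead is on Python code generation, which matches the speaker's reading that most gaps are small.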
Info
Channel: MattVidPro AI
Views: 41,216
Keywords: mattvidpro, mattvidpro ai, ai tools, ai news, google gemini, gemini, google gemini ai, google ai gemini, google gemini hands on, google gemini demo, google gemini ai demo, google gemini gpt 4, gpt 4, open ai gpt 4, gpt 4 vs gemini, free ai tools, new ai tools, top ai tools, open ai chat gpt, google bard, google bard gemini, open ai, ai gemini
Id: tbA17Coprxg
Length: 27min 44sec (1664 seconds)
Published: Wed Dec 06 2023