Let's Test Gemini Pro (honest comparison with GPT-3.5 & GPT-4)

Captions
In this video we are going to be, firstly, debunking a lot of misconceptions that people have, which I'm really irritated by, and talking about Google's deception. Then we're also going to go and actually test out Gemini Pro against GPT-3.5 and GPT-4 across different parameters in this doc here.

Okay, the first thing that I do want to say is, yes, Google was kind of deceptive in terms of their demos. So the demos that they showed, like for example this one: "I know what you're doing, you're playing rock paper scissors." "What's this?" Stuff like that, I get it, it seemed like it was being spoken in real time, but actually it wasn't. If you check out the developers' "How it's Made: Interacting with Gemini through multimodal prompting", it actually does show the fact that they would give it a hand and say, "Tell me what you see." "I see a person's right hand. The hand is open with the fingers spread apart." And then they'll do another one: "A person is knocking on a door." "How about this one?" "I see a hand with two fingers extended, which is a common symbol for the number two." Say, what if we ask Gemini to reason about all these images together, and it says, "What do you think I'm doing? Hint: it's a game." "You're playing rock paper scissors." So they are prompting with still images and then also giving them information, as opposed to real time the way that the video was showing. So that part was deceptive, definitely not great.

But with that being said, there is this misconception that people have as of December 9th, 2023. The first one is that Gemini Pro is not supposed to be a competitor for GPT-4. So for anybody that's saying, "Why is Gemini Pro not as good as ChatGPT when I'm using it?": if you're using GPT-4, it's not supposed to be there. Gemini Ultra is supposed to be the competitor for GPT-4, and that's not out yet. All the demos that they showed, all those cool things, it was kind of more like a teaser for Gemini Ultra, which is
supposed to do all these things. On Google's side, I don't know if that was the best choice. I think what they did is they wanted to get something out, especially after all the OpenAI stuff that happened at Dev Day and things like that. So they wanted to showcase something, but they delayed Gemini Ultra until next year, and they were like, "Well, you know what, we've got to show something." So when you compare those two, it's not really a fair comparison. We should be comparing Gemini Pro to GPT-3.5, which is what we will be doing today.

Another thing is Bard is using Gemini Pro, but only for text. So if you're talking about images and things like that, and other modalities like speech, images, code, it's not using Gemini Pro right now. It's still using the old models for this. So that's also not a good comparison. Again, Google should have made this a lot clearer, and I think that's why a lot of people are saying, "Oh my God, it's trash." It is trash because it's not using the updated model.

Another one: if you guys watched the demo for AlphaCode 2, "and get a working prototype in less than a minute, it's really helpful", it's like a fine-tuned version of Gemini Pro, but it's not what is currently being used in Bard. I want to just put that out there. Misconceptions.

Okay, let's test out GPT-3.5 versus Gemini Pro versus GPT-4. So, first question: can you write a short story about a dragon who loves poetry? Whoa, that was really fast from GPT-3.5. "Once upon a time, in a hidden valley nestled between towering mountains, there lived a dragon named Drear... proving that even the mightiest creatures could have the gentlest souls." Aw, that's pretty good. That's pretty good, actually. Okay, now let's look at... oh, okay, so it attempted to actually have a picture as well, so good job on that, Bard, quite proud of you. "In the heart of the emerald mountains, nestled amongst the whispering pines and cascading waterfalls, lived a dragon named Ignis. Unlike his fire-breathing brethren, Ignis was
not interested in gold or jewels, nor did he find joy in scorching the countryside. Ignis had a passion far more peculiar: poetry... even the most unexpected passions can bring about the greatest change." Aw, it's the power of friendship. Now let's ask GPT-4. "...pirate scales shimmered in hues of emerald and sapphire, and their eyes held a wisdom of a thousand years." All three of them were pretty good at this test. They all kind of had a plot, it all kind of made sense.

Another general question I have is for search: "I'm craving some sushi, and then I'd like to take a walk where there is nature near the Golden Gate Bridge. Can you come up with a plan for me based on the time and date in San Francisco?" So I want to make sure that it can actually find out what the time of the day is. So, GPT-3.5: I don't, yeah, I don't believe it's able to do that. Yeah, because it doesn't have real-time access to things. I think Bard should be able to search Google, so let's see. Google Maps. Okay, okay, wait, did it just die on me? Try again. Okay, so it was able to search Google Maps. "Sure, here's the plan for you. For lunch, head to Flying Ninja." Look at Flying Ninja: Jefferson Street, San Francisco, California, "a colorful eatery offering an extensive sushi menu plus noodles, teriyaki dishes, and small plates." It's located here, has a 4.3-star rating. We've got to give it the fact that Google Maps integration is there, so that's good. Oh, nice, and it gives you these location points as well. "After lunch, walk to Golden Gate Park, which is about 101..." I don't believe that it is 101 miles away. I don't believe that is correct. Yeah, no, that's not correct. It tried, it tried, but that is not correct. Oh, okay, that's the reason: it's because I'm actually currently in Gilroy, which is another city. I just said I was in San Francisco because it would be closer to these locations. So I actually feel like it got confused because it thought that I was at the location that I'm currently at. Okay,
that's impressive. So it's able to pinpoint my location and then make a route towards that area, but there are definitely slight misconceptions, and I still don't think I'm 101 miles... no, it's still not 101 miles away. It tried, it tried.

And now we're going to ask GPT-4 for this. Okay: "For a sushi experience near the Golden Gate Bridge, you have several great options: Ju-Ni, Shizen, Ozumo, and Omakase." Okay. "After enjoying your sushi, a nature walk in the nearby Presidio of San Francisco or Crissy Field would be a lovely way to end your day. Both offer beautiful views and green spaces near the Golden Gate Bridge." So this is accurate. This is in fact accurate, I believe. Let's actually test out one of them. So let's say Ozumo... Santana Row. There's one in San Francisco, 161 Steuart Street. Yeah, it does, it is. Okay, and this is much more reasonable, near the Golden Gate Bridge, because another thing is the Golden Gate Bridge is actually pretty far away from Golden Gate Park, even though they're both called Golden Gate. So this is much more reasonable.

What do you think we should put for general/search? Let's say a rating out of five. What would you give GPT-3.5? I really think Bard has the potential of being really good, like integrating Google Maps and things like that. It's definitely off, but in terms of just functionality, Google has a lot more products than Microsoft does, right? I thought Gemini was pretty similar to GPT-4. Some people thought it was not, like, about the same. So let's make a compromise, okay, and let's call GPT-4 equal to 4. But can we just write that Gemini has a lot of potential because of integration with other Google products? Is that fair?

Let's test vision. So start with GPT-3.5. I don't believe you can even upload images onto it. Not possible. So let's just test with Bard first. "How many potatoes are," oops, "in this photo?" Guys, why don't we count how many potatoes are here. Okay, let's count the number of potatoes: one potato, two potato, three potato, four potato, five
potato, six potato. I believe there are six potato. That is my belief. I do not think there is 14 potato. "14 potatoes in a white background." Very... okay, that wasn't great. Okay, there you go. Okay, so GPT-4 is able to figure out that there are six potatoes in the photo. All right, well, that was a fail for Bard.

Let's do another one. So this is one I don't think Bard is going to be able to get. It's kind of like a trick: how many potatoes are in this photo? Because it's cut up, right? So technically it's three potatoes. Let's see if GPT-4 can figure this out. Oh, okay, okay, okay. Do you guys think this is a fluke? "There are three potatoes in the photo you sent. I counted the potatoes manually, and I also used my knowledge of potatoes' size and shape to help me identify and differentiate them from other objects in the image. Is there anything else..." Okay, any idea for what question we can ask it to see if this is actually a fluke or not? I'm very surprised. Okay, okay: "There are three potatoes in this photo, with one of them sliced in half to show the insides." That was good, I'm impressed. I can't, I actually can't tell. Thank you, chat. "How many potatoes are cut in this photo?" "There are zero potatoes cut in the photo." Okay, okay, all right, I think that's our answer. This is a fluke.

Next test that I have, we're going to ask it, whoops, what's the arc. By the way, I literally forgot how to do geometry, so I'm going to need some help. "The arc length is approximately 9.42." Okay, what is that in percentage of the circle, I think of the circumference. Now Bard. Ooh, 16.66%. Is this just a random photo? "The arc percent of the circle is the ratio of the arc's central angle to 360°. In the image you sent, the arc's central angle is 60°, so the arc percent of the circle is 16.66%." This is a random photo. I mean, it tried, it tried. Okay, so both of them got that, which is pretty good. I have a harder question here. I'm just going to put "solve for X". I actually don't know the answer to this one, so someone tell me what the answer
is. Wait, okay, okay, okay, okay. I'm just going to see if the numbers are the same. Okay, one of them is saying that X is 60 and one of them is saying that X is 80. All right, chat, I need help. The answer is clearly X... no, no, bad. Someone tell me what the answer is.

Next question, and then we're going to ask here: where is this? GPT-4: this is not girl fault... excellent. Now Bard. No, incorrect. How is that even possible? "It's a mountain pass." There's clearly a waterfall. Where's the water?

One more question before we move on: what time is it? No, that's not even close. "The clock shows the time as 10:10. The hour hand is just past 10 and the minute hand is on the two." That is definitely a hallucination. No, I'm not saying what my time is. I would be very impressed if Gemini... okay, they both think it's 10:10. So I'm pretty sure this is just regurgitating training data, like 10:10, 10:10. So yeah, nope, fail. Failure for both of them.

What should we rank vision? For GPT-3.5 it's N/A, so it's just going to be at zero here because it doesn't exist. What do we think between Gemini Pro and GPT-4? Chat consensus: GPT-4 is better. Okay, Gemini Pro two out of five, GPT-4 four out of five.

Okay, so let's do humor and slang. Can you guys give me some Gen Z slang that I don't know about? "What are some common Gen Z slang? Use them in a sentence and explain to this old millennial what it is." "Here are some examples of common Gen Z slang. Slay: this term is used to compliment someone who looks extremely stylish or has done something exceptional. 'Did you see her presentation today? She totally slayed.'" Okay. "It means she did an excellent job or was very impressive." Oh, thank you, thank you, thank you, GPT-4, I understand now. "Lit: used to describe a situation that is exciting or excellent. 'That party last night was lit.' It meant the party was fun and full of energy." And fun. "Yeet: a versatile word that can be used as an exclamation of excitement, to throw something, or to show approval. 'I just aced my exam, yeet!'" Okay. Ghosting, salty, flex. No cap:
telling the truth, or for real. This is actually... it doesn't have rizz. This is the one that I did not know about until recently. It's like a... bussin': "delicious, amazing, excellent. 'This pizza is bussin', I can't even stop eating it.' A way to express extreme approval or enjoyment." "Finna: going to, about to." Oh, I actually don't know this one. "'I'm finna head out to the party, want to come?' It is a contraction of 'fixing to', which itself is a Southern expression. It's a casual way to say you're about to do something." Low-key, yeet, no cap, fam, "your fam, what's up", glow up, okay, big yikes. Do people say that? Wait, Gen Z, Gen Z people, do people say that?

Oh, I forgot to ask GPT-3.5. Whoops, whoops, I forgot that GPT-3.5 was a thing, my bad. Lit, flex, savage, FOMO, "flexing on the 'gram", okay, clout, yeet, Gucci: "'Everything's Gucci, I'm feeling great today.'" Simp: "'John is such a simp for constantly buying her gifts.'" No cap. Okay, well, I feel like they're all pretty decent.

So let's tell it a joke. Let's ask it to tell us a joke: "What do a tick and the Eiffel Tower have in common?" Ooh, I don't think it, I don't think it got it. "A tick and the Eiffel Tower don't have much in common on the surface. If you're looking for some humorous and unexpected connection between the two, you could say that both a tick and the Eiffel Tower are associated with heights." Nope. Whoa, come on. Okay, okay, you've got to give, you've got to give Gemini some credit here. Okay, Gemini is the only one that got this, the only one. Ooh, Gemini finally wins at something, even against GPT, even against GPT-4. GPT-3.5, Gemini Pro, GPT-4... one out of five. Let's go, let me see what the chat says. And, oops, 3.5 for GPT-4, right? 3.5, and four for Gemini.

Logic and reasoning. Okay, so I have some questions here. "If all roses are flowers and some flowers fade quickly, can it be deduced that some roses fade quickly?" "Yes, it can be deduced that some roses fade quickly. This is an example of a valid syllogism," okay, okay, "a form of logical argument where a valid conclusion is drawn from two or more true premises. Premise one: all roses are
flowers. Some flowers fade quickly. Therefore, some roses fade quickly." That is correct. That is not correct? Not correct? Okay, let's go with GPT-4. Oh wait, I lied, you're right. If all roses are flowers and some flowers fade quickly... some flowers, but can they just... some roses fade... no, you're right, I got that, I got that wrong. No, it can't, because not necessarily all... like, there can be flowers that fade quickly that are not roses. "You can infer that roses are a subset of flowers. Among the larger group of flowers, there are some which fade quickly. However, the information about which specific flowers fade quickly is not provided. Therefore, we cannot conclude that roses, being a part of the larger flower group, are necessarily among the flowers that fade quickly." Okay, that's my bad. So Gemini got it wrong, and both GPT-3.5 and GPT-4 got it correct.

Okay, next question: "All footballers are fit and healthy. All famous sports players are footballers. Given that the above is true, which of the following is a logical deduction?" I think it is C. Okay, so it's saying it's C. Let's see what Bard thinks. Okay, it's C as well. And GPT-4: okay, so all of them think it's C, so all of them got this question correct. I'm just going to say GPT-3.5 is a three. I want to say this is like a 2.5. Oh no, that's not fair. I think I'll give it a four, where these seem pretty similar so far. I'm sure something would reason out later, so I assume this number will be higher, but currently Gemini is, I guess, a two. 2.5 to be generous.

Okay, let's do some music. "Compose some music with a happy tune." Happy tune. GPT-3.5 [Music]. Here's the one for Bard [Music]. GPT-4 [Music]. GPT-4 sounded the best. Now we try recreating Twinkle Twinkle Little Star. GPT-3.5 [Music]. I swear that is not Twinkle Twinkle Little Star. Am I just really bad at music? Now Bard. Wait, is this right? Why does that not sound like Twinkle Twinkle Little Star? GPT-4. Okay, okay, okay, that was the final test. Bard didn't even play it correctly, and I'm pretty sure those were not twinkle
twinkle little star. Okay, you're saying GPT-3.5 is two, Gemini is like 0.5. Okay, GPT-4 you said was three. Gaslights me about how Twinkle Twinkle Little Star sounds.

Okay, last one. Okay, code. We're going to do a coding question, so I'm just going to go on LeetCode. This is a question about reversing a linked list, so I'm just going to give it this question. This is in Python, so I'm just going to tell it to write it in Python and see if any of them get it right. This should be a LeetCode medium question, and then the final score will be out. GPT-3.5: okay, accepted. Oh, that's pretty good. Now Bard. It is not even writing the code. Maybe I need to do this, so maybe I need to write this. I have to prompt it the same way, right? So I'm going to write this, and then I'm going to say, "Use the following defined function." Okay, 12 milliseconds. This is not as efficient. GPT-4: there was this error, and Tina proceeded to spend 10 minutes trying to debug it. I don't want to change their thing, but chat suggested to just remove and that worked.

So it looks like GPT-3.5, and then it's Gemini Pro, then GPT-4. So they claim that Gemini is better than GPT-3.5. I think that's fair, but it is still vastly worse than GPT-4, so we'll have to see when Gemini Ultra comes out. What do you guys think, is that a fair comparison? In my opinion, I think people are over Gemini because of their lack of understanding of these misconceptions. They're comparing it to GPT-4, which is not fair, and also realize the fact that AlphaCode 2 isn't even out yet, so we could test out more coding questions in the future. But even as it currently stands with Gemini Pro, they all got it correct. Well, GPT-4 didn't get it correct according to LeetCode, but I think GPT-4 did get it correct in terms of other coding IDEs and just using Python, but for whatever reason it's that. But then Gemini... surprisingly, GPT-3.5 was the most efficient solution, so that was interesting. But yeah, no AlphaCode yet, and Bard doesn't have Gemini Pro for
other modalities right now, for images and things like that. So images are way worse for Gemini compared to GPT-4, but GPT-3.5 doesn't even have images, right?
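
Three of the answers in the transcript can be checked mechanically: the arc share from the geometry test, the rose syllogism from the logic test, and the linked-list reversal from the coding test. The Python sketch below is only an illustration, not what any of the models produced. The `ListNode` class and all function names are hypothetical (the node layout mirrors the common LeetCode convention), the coding task is assumed to be the simplest variant (reversing a whole singly linked list), and the radius of 9 is an assumption chosen because it reproduces the 9.42 arc length read out in the video; the image itself is not available.

```python
import math
from typing import List, Optional


class ListNode:
    """Singly linked list node (hypothetical; mirrors the usual LeetCode layout)."""

    def __init__(self, val: int = 0, next: "Optional[ListNode]" = None):
        self.val = val
        self.next = next


def reverse_list(head: Optional[ListNode]) -> Optional[ListNode]:
    """Reverse a whole singly linked list in O(n) time and O(1) extra space."""
    prev = None
    while head is not None:
        rest = head.next   # remember the remainder of the list
        head.next = prev   # point the current node backwards
        prev = head
        head = rest
    return prev


def from_values(values: List[int]) -> Optional[ListNode]:
    """Build a linked list from a Python list (helper for the demo)."""
    head = None
    for value in reversed(values):
        head = ListNode(value, head)
    return head


def to_values(head: Optional[ListNode]) -> List[int]:
    """Collect node values into a Python list (helper for the demo)."""
    values = []
    while head is not None:
        values.append(head.val)
        head = head.next
    return values


def arc_fraction(central_angle_deg: float) -> float:
    """Share of the full circle covered by an arc with this central angle."""
    return central_angle_deg / 360.0


def arc_length(radius: float, central_angle_deg: float) -> float:
    """Arc length = circumference * share of the circle."""
    return 2 * math.pi * radius * arc_fraction(central_angle_deg)


# Counterexample for "all roses are flowers, some flowers fade quickly,
# therefore some roses fade quickly": both premises hold, the conclusion fails.
roses = {"rose"}
flowers = {"rose", "tulip"}
fade_quickly = {"tulip"}

premises_hold = roses <= flowers and bool(flowers & fade_quickly)
conclusion_holds = bool(roses & fade_quickly)


if __name__ == "__main__":
    print("reversed:", to_values(reverse_list(from_values([1, 2, 3, 4, 5]))))
    print("arc share for 60 degrees:", f"{arc_fraction(60):.2%}")
    print("arc length (assumed radius 9):", round(arc_length(9, 60), 2))
    print("premises hold:", premises_hold, "| conclusion holds:", conclusion_holds)
```

A single world in which the premises are true and the conclusion is false is enough to show the rose argument is not a valid deduction, which is why "yes" was the wrong answer in the logic test.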
Info
Channel: Tina Huang
Views: 65,796
Id: R4-usFLsU0M
Length: 22min 36sec (1356 seconds)
Published: Sun Dec 10 2023