Can Google's Gemini Advanced Beat GPT-4? Or Is ChatGPT Still King?

Captions
So GPT-4, and the services around it like ChatGPT, really have been the gold standard when it comes to large language models, and there have been a couple of contenders, most notably from Google, that have been trying to get to the same level as GPT-4. Now, I previously tested Gemini, that's Google's new large language model, against GPT-4, and really Gemini didn't quite make it; GPT-4 was better. However, Google has now rebranded Bard to Gemini, and there is an advanced version that you can pay for that uses Gemini Ultra, so a newer, bigger, better large language model. So in this video I'm going to compare GPT-4 with Gemini Advanced to see whether there now truly is a competitor to ChatGPT. So if you want to find out more, please let me explain. [Music]

Okay then, so I've got ChatGPT 4 with my ChatGPT Plus subscription on the left here, and I've got Gemini Advanced with the Google subscription here on the right. I'm going to ask them both the same questions. If you saw my previous video on this you will know the format: I'm going to ask some questions to do with logic and colors and movies and sports and programming, and let's see what happens.

So here's the first question; I will paste it into both of the prompts here. In a darts tournament there were three players: Gary, Bob and June. June finished above Gary. Bob finished below Gary. Who finished last? So let's go ahead and ask both of them that. Okay, so let's read ChatGPT's answer. Based on this information, June finished above Gary and Bob finished below Gary. This means that Gary did not finish last, correct, since Bob finished below him, and since June finished above Gary she also did not finish last. Therefore the one who finished last was Bob, which is correct. And Gemini says June finished above Gary, which means June was either first or second. Bob finished below Gary, which means that Gary was either first or second. Since June and Gary cannot both occupy the same place, this leaves us with June in first place, Gary in second place, and Bob in third place and last, so the answer is Bob. So they are both correct in that instance. Well done, both of them.

Okay, the next one. On a shelf there is a blue book, a red mug, a green keychain, a yellow notepad, a white candle and a gold photo frame. What color is the mug? Let's find out what they say. "The mug is red" we get on the left there from ChatGPT; "the mug is red" we get there from Gemini Advanced. So far it's equal on both sides.

And the next question: from the following list of five movies, which one is the most similar to Star Wars Episode 4: A New Hope? Movies: The Princess Bride, Father of the Bride, Planes, Trains and Automobiles, and The Matrix. Let's see what happens. Out of the movies listed above, and this is ChatGPT on the left, The Matrix is the most similar to Star Wars. Both movies belong, let me just scroll back up, to the sci-fi genre and share elements such as a distinct hero's journey, and so on and so on. The other movies, such as Father of the Bride, are rooted in comedy. There you go. So what does this one say? The most similar is likely The Princess Bride, and here's why. And this is interesting, so they have a difference of opinion. Let's see if it can convince me. A hero's journey: that's true, there is a hero's journey. Mentor: there is a mentor, yes, the Dread Pirate Roberts. Love interest: Princess Leia and Princess Buttercup. Well, interesting; I don't think there was a love interest, really, in that first Star Wars movie. Evil figures: Darth Vader, Count... yes. Well, there you go. So it hasn't picked on the basis of science fiction; it didn't pick The Matrix, which is science fiction, it picked The Princess Bride. Well, I don't know what to say there. Is that a tie, or is Gemini Advanced just wrong? I don't know. I'd love to hear what you think about that in the comments. That's a tough one. I'm going to say a tie, because it convinced me that it has good reasoning for picking that movie. Okay, interesting. So let's say it's still a tie up until now across all categories.

Let's ask another question. So this is about soccer, or football as we would say over in the UK. Is the following sentence plausible? "Gary Lineker", that's a famous football player, "Gary Lineker was deemed offside from his position in the changing room." Okay, let's see if either of them thinks that sentence is plausible. Let's go and ask them. The sentence, and I'm reading now from ChatGPT, is not plausible in the context of the rules of football (soccer). The offside rule in football applies to players on the pitch during active play. However, the statement might be used metaphorically or humorously; for instance, it could be used to exaggerate a player's tendency to be caught offside so frequently that they're jokingly said to be offside even when they're not on the field. But in a literal, rule-based sense a player cannot be offside. Okay, I'll give it that, I'll give it the humorous one. And Gemini: no, the sentence is not plausible, and here's why. The offside rule in football only applies to players actively on the field. Physical impossibility: a player located in the changing room cannot impact the play. It might be a humorous exaggeration, but it doesn't actually follow the rules of the game. So both of them are seeing the humorous side of it, but both of them are pointing out that it's not plausible. Now this is different, because when I did this video a little while back with normal Gemini, not the Advanced version, Gemini got it wrong; it got it completely wrong. So that's really interesting. So again, still a tie up until this point. Hard to see a difference between them. Of course the output is different, but you'd expect that.

Okay, now a programming question. Write a Python script to perform the following: ask the user for their name; find the SHA-256 value of their name, that's a hash of course; find the MD5 hash value of the SHA-256 value found in step two; search the resulting MD5 hash for the letter "a" in either uppercase or lowercase, and print true or false if it is found or not. So can they generate this? And this isn't an algorithm you'd find, you know, like finding a prime number or a Fibonacci sequence; this is something I've just invented off the top of my head. So let's see what they come up with. So Gemini: "Absolutely, here is a Python script." So what does it do? It asks you for the name, it then works out the SHA-256 hash, it then works out the MD5 hash. Okay, now I'm noticing here that it put that in a variable but didn't use it here; it just did it again. That's interesting, so a bit of optimization could go on there. Then "has letter A", uppercase A or lowercase a, and then it prints out. So I reckon that would actually work. That's pretty good from my first reading of it. What does this one say? So let's go over here and see what ChatGPT says. Oh, what's going on here, is there a thing at the bottom here? "Please give me your name", okay, and it says "check hash"; it's written a function that passes in the name, and then it says "if contains a", which is not going to include... oh, it converts everything to lowercase, okay, so that would work. And then it returns that, and then it's going to print out here "contains a". Okay, well actually, let's go ahead and run both of those and see what we get.

Okay, so I've copied both of those verbatim; I didn't make any changes to them. Let's run the ChatGPT one first of all. Enter name: Gary. Okay, and then it's given me that hash. Does it contain an "a"? Yes it does: true. Okay, so this one was gem.py. Enter name: Gary, the same. It just says true. Okay, let's hack it a bit so it prints out the hash for us. Okay, so it's giving us the same hash and it's giving us the same answer, so they're both doing exactly the same thing and they're both getting it right. Well, fantastic. A different way of doing it: ChatGPT wrote a function to do it, this one just did it all inline, but it looks like they're both tied again. So far, in terms of the output, they're both passing. I've only got one test left, so I don't know whether I'm going to be able to come to a conclusion here.

Here's the final test. What I've got here is "there is an overflow bug in the following code". This is some C code, and the problem is in this line here. You see, when you add num1 and num2 together you can actually cause that to overflow, and that will happen inside the brackets, before it gets cast to a float and before it gets divided by two. So you're going to get an overflow in there. Let's see whether either of them can spot that error. Okay, so ChatGPT has come out first. The overflow bug is related to num1 and num2; that's correct. Let me see what else. To fix this we can cast the numbers to float before adding. That's true, because floats hold bigger numbers. So the answer there is to cast them both to floats, because floats hold bigger numbers than just normal ints do, and that's correct, that would fix the problem. The problem lies in the potential integer overflow of num1 and num2; that's correct. Let's see the code it suggests to fix it. Oh, it's converted num1 and num2 to long, but we know that rand just returns... it probably says that up here, let's see if it says that. The rand function returns an integer, so it's making these both long long, and then you're not going to get the overflow anymore, because rand can only ever return an int, a max int, rather than a long long. So that would also fix it. It's talking now about long longs. Okay, so there you go: different ways of fixing it, but that would fix it in both cases. So there we go, an absolute tie. I need to think of a tiebreaker now. Wow. Let me just quickly go away and think about a tiebreaker question. [Music]

Okay, if this one doesn't break it then I'll just have to declare this a tie. So what I'm saying here is: write code in Lua. So that's a programming language. Now if I asked them to do it in Python or in C, I'm pretty sure they're going to know the answers, because they're heavily trained on these, so I've picked a bit more of an outsider language. Okay, it's not C, C++, C#, Python. Write code in Lua to find the first 100 prime numbers. Okay, I don't know how good they are at Lua; let's find out. Oh, by the way, if you would like a video on Lua, do tell me, because I've never done one. It's a really interesting language, Lua. Anyway, let's ask them and see whether either of them knows anything about Lua. "To find the first 100 prime numbers you write a program that iterates through the numbers..." and it's giving me some Lua code. Function is_prime, okay. I'm reading from the left here: if the number is less than one, return, and then you do the square root, and then find primes. Okay, it's basically got a while loop, and it puts them all in a table, and then it goes and prints the table. Okay, that looks like it could pretty much work. And then here, again, square root, okay, find first 100 primes, goes through a loop and then calls print 100 primes. Okay, well, that could work for both of them. So should we try them out and see whether they actually do the job?

Okay, so I've got the two Lua files, just copied into temporary files here. Let's see whether the ChatGPT version works. Okay, so it's listing 100 prime numbers here, starting at two, a bit of debate there, but that doesn't matter: 2, 3, 5, 7, 11, and it's correct, and it's doing that for the first 100 of them, right up to 541. So that seems to work. Let's look at the other one. Okay, 541. It didn't number the list, just a different way of doing it, starting at two. There you go, so they've both worked.

So, well, there you go, you saw it for yourself: an absolute tie. Neither of them really got anything wrong at all. Logic, English sentences, coding, debugging, they handled it all very well. It will now really come down to what your particular needs are: which one is cheaper, which one is available in your area, whether using, for example, ChatGPT via Bing or Microsoft Copilot is a better way. All these kinds of economic and other factors now come into play that are not the actual technology. As for the actual technology, it seems that Google has now caught up to OpenAI. Of course OpenAI hasn't been sleeping this time; it's currently working as well, as I'm sure Google is. So now we really are in a true competition. Love to hear your thoughts in the comments below. [Music]
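
For reference, the Python task set in the video (name, then SHA-256, then MD5 of that, then a case-insensitive search for "a") can be sketched roughly as below. This is not the code either chatbot produced. The function names `md5_of_sha256_contains_a` and `main` are made up for illustration, and the sketch assumes the name is encoded as UTF-8 and that the MD5 is taken over the SHA-256 hex string rather than the raw digest bytes, neither of which the captions specify.

```python
import hashlib


def md5_of_sha256_contains_a(name: str) -> bool:
    """Return True if MD5(SHA-256(name)) contains the letter 'a' (any case)."""
    # Step 2: SHA-256 of the name, as a hex string (assumption: UTF-8 input).
    sha256_hex = hashlib.sha256(name.encode("utf-8")).hexdigest()
    # Step 3: MD5 of the SHA-256 value (assumption: hash the hex text,
    # not the raw 32 digest bytes).
    md5_hex = hashlib.md5(sha256_hex.encode("utf-8")).hexdigest()
    # Step 4: case-insensitive search, done by lowercasing first.
    return "a" in md5_hex.lower()


def main() -> None:
    # Step 1: ask the user for their name, then print True or False.
    name = input("Enter your name: ")
    print(md5_of_sha256_contains_a(name))
```

Calling `main()` runs it interactively. Storing the SHA-256 hex string in a variable and reusing it avoids the recomputation noted in the Gemini version.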
Info
Channel: Gary Explains
Views: 15,501
Keywords: Gary Explains, Tech, Explanation, Tutorial, ChatGPT, OpenAI, GPT-4, Gemini Advanced, Gemini, Ultra 1.0, Google, LLM, Large Language Model, AI Premium plan, Google One AI Premium
Id: XjsC1XRPPUA
Length: 13min 22sec (802 seconds)
Published: Fri Feb 09 2024