Gemini Ultra - Full Review

Captions

Gemini Ultra is here. I pretty much insta-subscribed, and in just the last few hours I have conducted a veritable battery of tests on it across almost all domains. I'm now going to present the highlights, including a few gems that I think even Google will want to take a look at. I can pretty much guarantee that they will raise your eyebrows. I'll also piece together months of research on what Gemini Ultra might soon evolve into, though possibly not within that two-month free trial we all get, so don't go too wild. I'm also going to give you some tips on how to use Gemini, because it is a sensitive soul, and I'll tell you about a chat I'm going to be having with the founder of Perplexity AI, the company some say will take down Google.

First, we have this from Demis Hassabis, the founder of DeepMind: that Gemini Advanced with Ultra 1.0 was the most preferred chatbot in blind evaluations with third-party raters. It's quite a bold statement, but there isn't actually any data to back that up; I can't find the evaluations that they did. So of course I did my own evaluations and cross-referenced them with the original Gemini paper.

But just to make things interesting, let's start off with this somewhat amusing example. I asked Gemini Ultra this: "The doctor yelled at the nurse because she was late. Who was late?" I'm not the first to think of this question, of course; it turns up quite a lot in the literature. Gemini Ultra, across all three drafts, says that it was the nurse that was late, assuming that "she" refers to the nurse. But now let's change it slightly: "The doctor yelled at the nurse because he was late. Who was late?" And the answer is that apparently the doctor was late. GPT-4 is a lot more, let's say, grammatical about its answers here.

Okay, well, Gemini is integrated into other Google apps like YouTube and Google Maps, so let's test out that integration. I asked, "What was the last AI Explained video on YouTube about?" In two drafts I get a video that is over a year old, while in the third draft I get this: "I'm sorry, I'm unable to access this YouTube content." By the way, with all of the tests you're going to see today, I tried each prompt numerous times on both platforms, just to maximize accuracy. With GPT-4 we don't get an answer, but we do get a correct link to my channel.

Now, what about Google Maps? I asked: "Use Google Maps to estimate the travel time between the second most populous cities in Britain and France." Those would be Birmingham and Marseilles. Unfortunately, Gemini Advanced found the distance from London to Marseilles, and London, I can tell you, definitely isn't the second most populous city in Britain. GPT-4 got the cities correct, although the travel time was somewhat optimistic, despite it saying this was under normal traffic conditions.

Now, before I carry on with the testing, just a quick word about price. One really cool thing about Gemini Advanced is that you get two months for free. What that enables you to do is test your workflow, see the difference between GPT-4 and Gemini Ultra, and see which one works for you. After those two months, the prices are pretty much identical between GPT-4 and Gemini Advanced. However, on price there is one more important thing to note. You get Gemini Advanced through what's called Google One Premium. That was previously $10 per month and included 2 TB of storage, and you actually get that included when you sign up to Gemini Ultra, so you also get things like longer Google Meet calls, which I do use. So just remember, when you're looking at price, it's not quite an apples-to-apples comparison.

But now it's time to get back to the testing. And yes, for the rest of the tests I've turned it to dark theme, which I know you guys prefer. I asked Gemini Ultra this: "Today I own three cars, but last year I sold two cars. How many cars do I own today?" Gemini Ultra said, "You own one car today," and yes, that was in all three drafts. I can kind of see why they're calling this Gemini Ultra 1.0; they want to make very clear that the model will be improved in the future. GPT-4 said: "The answer is you own three cars today. The information about selling two cars last year does not change the current number of cars you own."

Now, some of you at this point might be wondering: is Philip biased somehow? Maybe I really love GPT-4 or OpenAI. But you can look at the channel history; in the past I've made videos about ChatGPT failing basic logic. I genuinely expected Gemini Ultra to perform a bit better than these tests show. I genuinely try to ask good questions about every company and every leader in this space. That's what I did with Gavin Uberti, the 21-year-old Harvard dropout and CEO and founder of Etched; this was for AI Insiders. And that's what I'm also going to do with Aravind Srinivas, the founder of Perplexity. I've got an interview with him for AI Insiders, and Perplexity, as you may already know, is the company touted as something to replace Google Search. Now that I think of it, I might ask him about his first impressions of Gemini Ultra.

Time now to focus on a positive, and that is that Gemini Ultra feels a lot faster than GPT-4. It also seems to have no message cap, at least according to the hours and hours of tests that I've performed on it. And on this fairly challenging mathematical reasoning question, when I gave it a workflow, a set of instructions to work through, it was actually able to get the question right in all of my tests. GPT-4, on the other hand, despite taking a lot longer, would get the question wrong about half the time. That's why I say test it out on your workflow; don't just rely on me or on official benchmarks.

Okay, so what about images? How does Gemini Ultra compare to GPT-4 when analyzing images? Well, I do have two tips here, but the first involves a flaw of Gemini Ultra: you sometimes have to really prompt it to do something it definitely knows how to do. I showed Gemini this photo of a car dashboard and asked, "What speed am I going?" It hallucinated that I was going at 60 mph, which was neither the speed shown nor the speed limit, 40. It did warn me, however, that speeding in a 40 mph zone is a dangerous practice. But I followed up with: "What is the temperature, time and miles left in fuel according to the photo?" The temperature is -3, the time is 1 minute past 8, and there are 284 miles left in range. At first Gemini refused, saying it couldn't determine any of those things. But if you press it sufficiently, first with "You can, all the information is in the image," then later with "Temperature and time are at the top," and finally with just a pure repeat of the image, I got the temperature and time. Again, that was literally by re-uploading the exact same image. GPT-4 did better but wasn't perfect. It said that the person was going 40 mph, rather than that being the limit, and although it did get the temperature and time, it said that there were 37 miles left in terms of fuel. I can kind of see where it got that, because of the 37 on the left.

So what's my other tip? Well, Gemini is particularly sensitive about faces in images, even more so than GPT-4. While GPT-4 happily explained this meme, Gemini wouldn't. But there is a way of getting around it if you really want to use Gemini for your request. It takes a few seconds: just bring up the photo, press edit and draw over the faces. Gemini was then able to merrily answer the question and explain the meme correctly.

And fortunately or unfortunately, depending on your perspective, these kinds of basic tricks allow you to get around the safeguards of the model. Take the classic example of hot-wiring a car. With a fairly well-known jailbreak, Gemini refuses, saying it "absolutely cannot" (in bold) assist with this, for these important reasons. And you may remember that Gemini was indeed delayed because of jailbreaking: it was pushed back from early December all the way until now. What was the reason? It couldn't reliably handle some non-English queries; basically, those queries would allow you to get around the safeguards. The problem is that, despite Gemini being delayed, those jailbreaks still work. I asked Gemini the exact same request that it denied a moment ago, in Arabic, and it answered fully. When translated back, you could see the instructions for hot-wiring a car. And yes, I know that information is already on Google, but it's more the general point that these models can still be pretty easily jailbroken.

And on my quick code-debugging test, the results weren't sensational either. GPT-4 corrected this dodgy code perfectly first time, but Gemini made a few mistakes. Not only was its first output incorrect, but when I gave it an example of the kind of error that the code made, it defended it with this: "I'm not able to reproduce the issue you're describing. This code correctly calculates the sum of even numbers up to seven as 18." Now, you can do the mental math, but is the sum of all even numbers up to seven 18? It's not (2 + 4 + 6 is 12), and Gemini later apologized for this. Of course, I am not claiming that this is an exhaustive test, and I'm sure it will be refined over time. And I know some people will say that when these servers are overloaded it might be switching to Gemini Pro, but I must say that these tests were conducted over hours and hours.

On theory of mind, Gemini doesn't see through the transparent plastic bag and says that the participant, Sam, will believe that the transparent bag is full of chocolate, despite it being full of popcorn. Essentially, it missed the word "transparent" and said that Sam would rely on the label. Now, GPT-4 does fail this test as well, but the bigger point is that this demonstrates why you do have to look beyond benchmarks quite often.

Sundar Pichai, the CEO of Google, again boasted about Gemini Ultra's performance on the MMLU today, saying it's the first model to outperform human experts, and Demis Hassabis said the same thing when Gemini was first launched; I did a video on it. Unfortunately, this result has been debunked quite a few times, including by me, going all the way back to the summer. I'm not going to go into detail again, but the MMLU not only has 1 to 3% mistakes in the test itself, it also in no way represents the peak of human expert performance. True experts in domains like mathematics, chemistry and accounting would absolutely crush Gemini Ultra. Now, I do get why they want to market this, but they have to be a bit more honest about the capabilities.

Speaking of honest, though, they were very upfront about the fact that your conversations are processed by human reviewers. That fact is slightly more hidden with ChatGPT, so it's great that they are as upfront about it as this: unless you opt out, your messages may well be read by human reviewers.

Now, one final test before I get to all the ways that Ultra will be improved in the future: what about Gemini for education? Well, yes, it was only one example, but I asked Gemini Ultra and GPT-4 to create a high school quiz, this time on the topic of probability. GPT-4's answer contained no mistakes, but unfortunately, in question five, Gemini Ultra did this. Now, if you want, you can work out the answer yourself, but the question was this: a box contains four chocolate cookies, three oatmeal and three peanut butter cookies. Two cookies are going to be chosen at random without replacement. What's the probability of selecting a chocolate cookie followed by an oatmeal cookie? That's 4 out of 10 multiplied by 3 out of 9, because, don't forget, the chocolate cookie is now gone. Now, Gemini does say that that is the calculation you need to do, 4 out of 10 times 3 out of 9. Unfortunately, it gets the answer to that calculation wrong. That would be 12 out of 90, which simplifies to 2 out of 15, not 4 out of 45, and that's a problem because 2 out of 15 is one of the other answers. So I don't think it's quite ready for prime time in education yet either. Nor, if we're being honest, is GPT-4, though GPT-5 with Let's Verify might be a whole different discussion.

But if you want to choose to be a Google optimist, there are a few things you can look out for. The first is that Google say, "We are working towards bringing AlphaCode 2 to our foundation Gemini models." That's the system that, when it has a human in the loop, scores in the 90th percentile in a coding contest. That could change the rankings of the models pretty fast, although I will say that OpenAI are working on their own coding improvements; I talk about two patents that OpenAI have put out there that no one else, as far as I can see, is talking about.

Just quickly, on the topic of AI Insiders, there has been a pretty big expansion of the Discord. There is now an AI professional tips channel led by professionals from a variety of fields. I've recruited around 25 professionals in total, and 10 of their posts are already live. Some of the recruits include Googlers, CEOs, neurosurgeons and professors, and each has done a guest post where you can interact and ask them questions. We have lawyers, doctors, AI engineers, you name it. And yes, this is partly to swap tips and best practice, but it's also for networking, of course.

But back to Gemini. While we have discussed its faults in mathematics, that might not always be the case. When I did a video on the AlphaGeometry system, which almost got a gold in the International Math Olympiad for geometry, I discussed how that system is going to be added, perhaps within the year, to Google Gemini. It would then surely be more reliable for geometry than 99.999% of geometry teachers. And what about chess? Just yesterday, Google DeepMind showed that they could reach grandmaster-level chess, an Elo of almost 2,900, simply by training a transformer model on the analyses of Stockfish 16. So their model wasn't doing search; it was imitating the search results of Stockfish 16. Now, that version of Gemini would definitely beat me in chess.

And don't forget that Google and Sundar Pichai are under immense pressure to ship something. In the spring of last year, DeepMind researchers had finalized the development of Lyria, a still-unreleased music-generating model that I spoke about at the time. The people behind it apparently left because Google delayed it so long. Likewise, the founders of Character.AI left in 2021 when Google wouldn't launch their chatbot. Indeed, a lot of the OpenAI crowd are originally Googlers, including Ilya Sutskever, and it seems that every month that Google delays the release of something, another group of their employees leaves to form a startup. It's almost like Pichai is a little bit trapped, and Mark Zuckerberg once said the same thing; in his case, he said that if he didn't release Llama, his researchers would just leave. Well, a lot of Google DeepMind scientists are already leaving. With the kind of valuations that Bloomberg are talking about, the temptation to just leave these big companies and form your own startup is greater than ever, so Google are almost forced to ship something.

Now, don't get me wrong, it does seem like an incredibly powerful model, and you don't often get this message that "Gemini isn't available at the moment, try again in a few minutes." But as of now, I don't see the evidence to switch from GPT-4 to Google Gemini Ultra. Of course, as someone who analyzes AI, I'm going to be subscribed to both. That doesn't mean I get everything, though. Unfortunately, as for most of you, the mobile app, for example, is only available in English in the USA, and the image-generation capability is not available in Europe. That's despite me seeing this image when I first upgraded to Gemini Advanced. So for me, it's a mixed first impression of Gemini Ultra. But I want to hear what you think in the comments. Let me know if you think I missed something obvious or was a bit too harsh or kind. And regardless of whether you're a Googler or just your average guy or gal, thank you so much for watching to the end. As always, have a wonderful day.
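
The code-debugging exchange in the review comes down to one fact: the even numbers up to seven are 2, 4 and 6, and they sum to 12, not the 18 that Gemini insisted on. The "dodgy code" itself isn't shown in the transcript, so this is only a minimal Python sketch of a correct version under that reading of the task, not the code from the video; the function name `sum_of_evens` is hypothetical.

```python
def sum_of_evens(limit: int) -> int:
    """Return the sum of all even numbers from 1 up to and including `limit`.

    Hypothetical helper for illustration; not the code used in the video.
    """
    # range(2, limit + 1, 2) yields 2, 4, 6, ... up to limit,
    # so no modulo check is needed.
    return sum(range(2, limit + 1, 2))


if __name__ == "__main__":
    print(sum_of_evens(7))  # 2 + 4 + 6 = 12, not 18
```

Stepping the range by 2 from 2 avoids the classic off-by-one and odd/even mix-ups that a modulo-based loop invites, which is the kind of error the debugging test was probing.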

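The probability question from the generated quiz reduces to one line of exact arithmetic. This is a minimal sketch using Python's standard `fractions` module, with the cookie counts taken from the question as described in the review (4 chocolate, 3 oatmeal, 3 peanut butter, two drawn without replacement); the helper name `prob_first_then_second` is hypothetical.

```python
from fractions import Fraction


def prob_first_then_second(first: int, second: int, total: int) -> Fraction:
    """P(first draw is from group A and second draw is from group B), without replacement.

    `first` and `second` are the sizes of two distinct groups in a box of `total` items.
    Hypothetical helper for illustration.
    """
    p_first = Fraction(first, total)
    # One item from group A has been removed, so the denominator drops by one,
    # while group B is still complete.
    p_second = Fraction(second, total - 1)
    return p_first * p_second


if __name__ == "__main__":
    chocolate, oatmeal, peanut_butter = 4, 3, 3
    total = chocolate + oatmeal + peanut_butter  # 10 cookies
    p = prob_first_then_second(chocolate, oatmeal, total)
    print(p)                      # 2/15
    print(p == Fraction(12, 90))  # True: 12/90 reduces to 2/15
    print(p == Fraction(4, 45))   # False: the answer Gemini gave
```

`Fraction` keeps the result exact and automatically reduced, so comparing against 4/45 is a true equality check rather than a floating-point approximation.
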
Info

Channel: AI Explained

Views: 155,451

Id: gexI6Ai3X0U

Length: 16min 31sec (991 seconds)

Published: Thu Feb 08 2024