Gemini is Here! (And It's Better Than GPT-4?)

Video Statistics and Information

Captions
Well, I really thought we'd get a break from these AI advancements in December, with the holidays here and everything, but I was totally wrong, because Google just announced Gemini, and we actually have a usable version of Gemini that we can play around with right now, today. So in this video I'm going to dig into what's new with Gemini, what we know so far, and how you can actually go and use it yourself. Let's get into it.

So today, December 6, 2023, Google and DeepMind introduced Gemini. Now, this is a pretty hefty article here with all the details about Gemini. I've pretty much gone through it all, watched all the videos, read all the tweets, and here's what I think you need to know about it.

They're calling this version Gemini 1.0, and it's going to start off in three different sizes: Gemini Ultra, which is their largest, most capable model; Gemini Pro, the best model for scaling across a wide range of tasks (this is the one that we actually have access to now); and Gemini Nano, which is the most efficient model for on-device tasks. This one seems to be mostly designed for mobile use cases.

We also know that this model was built from the ground up to be multimodal, meaning it can generalize and seamlessly understand, operate across, and combine different types of information, including text, code, audio, image, and video. When you look at something like GPT-3 and GPT-4, these were initially trained from the ground up to be text models and just recently have been upgraded to accept both images and audio.

Multimodal models are created by stitching together text-only, vision-only, and audio-only models in a suboptimal way at a secondary stage. Gemini is multimodal from the ground up, so it can seamlessly have a conversation across modalities and give you the best possible response. Gemini will be available in three sizes: Gemini Ultra, our most capable and largest model for highly complex tasks; Gemini Pro, our best-performing model for a broad range of tasks; and Gemini Nano, our most efficient
model for on-device tasks.

Now, in benchmarking tests, Gemini Ultra actually outperformed GPT-4 in most cases. Keep in mind, Gemini Ultra is the model that we don't have access to yet; this is the model we'll likely get next year. We have access to Gemini Pro right now, which is a closer comparison to GPT-3.5, which you get inside the free version of ChatGPT.

But this Gemini Ultra model, when given the MMLU test, which represents questions in 57 different subjects, including STEM, humanities, and others, performed at 90% compared to GPT-4's 86.4%. In reasoning, we can see that it just barely outperformed GPT-4 on BIG-Bench Hard, which is a diverse set of challenging tasks requiring multi-step reasoning. In this DROP test here, for reading comprehension, it slightly outscored GPT-4, and in this HellaSwag test, which is commonsense reasoning for everyday tasks, it underperformed against GPT-4, which is the only one of these benchmarks where it underperformed. In math, it slightly outperformed GPT-4 in basic arithmetic and in challenging math problems. Keep in mind that neither of these performed very well in challenging math problems; these would be Fs on most tests. For Python code generation, it performed at 74.4% compared to GPT-4's 67%.

Now, they also tested it against GPT-4V. We can see here that in these tests, once again, it's all Gemini Ultra that they're testing, except for these audio tests down here, which they actually did test with Gemini Pro. So in pretty much every single image recognition benchmark, like natural image understanding, OCR, document understanding, infographic understanding, and mathematical reasoning in visual contexts, it outperformed GPT-4V, and in OCR, document understanding, and infographic understanding it actually outperformed by quite a bit. With video captioning and video question answering, it outperformed GPT-4. With audio, in automatic speech translation, Gemini Pro outperformed OpenAI's Whisper v2, and in automatic speech recognition, in which a lower score is actually
better, Gemini Pro outperformed Whisper v3. So it sounds like Gemini Ultra (again, the version that we won't get access to until next year) outperforms GPT-4 in almost every way.

Even Ethan Mollick here sums it up pretty well: what changed today? Here's the simple version. There's no reason to use the free version of ChatGPT anymore, because that uses GPT-3.5, and as of today that's outclassed by Bard, which is now using Gemini Pro, and Claude, both of which you can use for free. But again, you should probably still be using GPT-4, because it's still dominant and it's free to use in Bing. And again, all those benchmarks we just looked at are pretty much not available yet, until we get Gemini Ultra.

Where Gemini really seems to shine is with its ability to understand visuals and text and all these modalities all at the same time. In fact, here are some examples that they shared in some of their demos today. In this example, they gave it a worksheet with a bunch of math problems on it. The math problems are already done, with the handwriting and some of the work being shown, and they asked it to essentially check the answers. It can read the answers, understand what was right and what was wrong, and explain the concepts that need more clarification.

So Gemini identified some mistakes with problems 1 and 3 here. Let's take a look at 3. Here Gemini identifies that the formula was correct, but there was a mistake in calculating height. We can ask Gemini to explain in more detail why the height is 50 m instead of six. I can ask Gemini to explain further. Here Gemini explains the step-by-step details of solving the problem. Because of Gemini's ability to understand nuanced information and answer questions relating to complicated topics, it can give you a customized explanation of the subject you're trying to learn.

Google also put up this really cool video on YouTube that shows all sorts of different ways that you can use the multimodality, so check this out.

Tell me what you see. I see you placing a piece of paper
on the table. I see a squiggly line. The contour lines are smooth and flowing, with no sharp angles or jagged edges. It looks like a bird to me. Hm, what if I add this? The bird is swimming in the water. It has a long neck and beak. It is a duck. Yes. The duck appears to be blue. This is not a common color for ducks.

Keep in mind that this is responding in near real time. From what I can tell, there's no delay between him starting to color and Gemini replying that the duck is blue.

It is more common for ducks to be brown, black, or white. Are you sure? What the quack! I was just talking about a blue duck, and now you're holding one. It looks like blue ducks are more common than I thought. Would this float? I'm not sure what material it's made of, but it looks like it might be rubber or plastic. What if I tell you it's making a squeaking sound? Oh, if it's squeaking, it's definitely going to float. It is a rubber duck; it is made of a material that is less dense than water. Can you teach me how to say this in a different language? Oh, can you explain how to pronounce the Mandarin one?

So it also seems to be multilingual, so it could help you understand these words in different languages and even help pronounce them.

...is pronounced with a first tone. In Mandarin, there are four tones, which are essential to distinguish the meaning of words. The first tone is a high, level tone. Okay, based on what you see, come up with a game idea. Oh, and use emojis, please. How about this: we can play a game called Guess the Country. Sure, let's do it. Clue one: this country is the home of the kangaroo, the koala, and the Great Barrier Reef. Oh, that's easy.

Fast-forwarding in the video a little bit here, we actually see the capabilities of it generating both text and image as a response, so check this out.

Give me some ideas for what I could make with this. I see pink and green yarn. How about a dragon fruit?

You can see it suggested a dragon fruit and then actually drew a picture of what that would look like.

Or how about a green cake with a pink
heart?

Once again, it created a green cake with a pink heart, so it's suggesting what you can do with this and then also drawing a picture of what that could look like. It goes on and shows more examples of this type of thing, and then it starts to go into these logic problems here.

Which way should the duck go? Going left leads to a duck, which is a friend. Going right leads to a bear, which is a foe. It is better to make friends than foes, so the duck should go left. Smart choice.

Now this one actually kind of blew me away. It's a sort of connect-the-dots drawing, and it was actually able to figure out what the drawing is by just looking at the numbers and the dots. This is a picture of a crab. I don't know how it did that; that's crazy to me. There's a whole bunch of really cool examples where he draws pictures and asks more questions about the pictures, and here he actually drew a picture of an acoustic guitar, and it was able to respond with, I see you drawing a guitar, and then actually gave an audio sound of an acoustic guitar. [Music]

Now it says you've added an amp. Now it's an electric guitar; we can make some seriously loud music now. Now it's drums. Given that you added drums, how about some 80s hair metal? And it's actually generating that audio based on what it's seeing in the images. I'll make sure the video is linked up below, because some of these examples are really, really cool. I'm pretty sure they're using Gemini Ultra and not Gemini Pro. Pretty mind-blowing what Gemini is going to be able to do.

Now here's an example where they showed Gemini turning images into code.

I'll start with this image of a tree and just select the part I want, and then ask Gemini, can you turn this image into an SVG? This represents the main shapes of a tree. Let's see. That's pretty good. Not the most impressive. Now I want to try a more difficult test. Let's see if Gemini can make an interactive demo in JavaScript. Okay, here we go. A common algorithm for this is called a
fractal tree. Now that's really cool. Okay, this is pretty cool: Gemini even provided a slider so I can change and move the fractals.

Now here's Gemini actually looking at outfits and telling us what scenario each outfit is good for.

What is someone wearing this best dressed to do? Hm, perfect for staying warm in the tundra. Good color for blending in with glacial ice. Okay, how about another one? Answer: galactic travel. How about this one? To boldly go where no one has gone before, and play some jazz.

Here's Gemini using some logic to guess movies based on just images alone.

Given the play on words in these images, guess the name of the movie. The Breakfast Club. Breakfast at Tiffany's. Uncut Gems.

And here's Gemini looking at multiple images to find similarities between the images.

We'll start with these two, the Bosjes Chapel and this print by Hokusai, and I'll prompt Gemini: find a connection between these two images. Curved and organic composition. The building is more refined, and the second image is more fluid. Okay, let's try another one, using the moon and this golf ball on my webcam. Okay, let's see. In 1971, the Apollo 14 crew hit two golf balls on the lunar surface. Okay, then one more just for fun: who wore it better? The zebra. Oh, I like this. The zebra has been wearing its stripes for millions of years.

Now moving along, Gemini Ultra is also really good at coding, including competitive programming. That includes Python, Java, C++, and Go. It substantially improves coding abilities over previous PaLM 2 models.

We are tasked with computing aggregate statistics that account for what appears to be an impossibly large amount of random arrays. The really cool thing is that to solve it, AlphaCode 2 makes use of dynamic programming. Dynamic programming is an advanced algorithmic technique which basically simplifies a complicated problem by breaking it down into easier subproblems again and again, and what's really impressive is that not only does AlphaCode 2 know how to properly implement the strategy, but also when and where
to use it.

Now, one thing to note about what we just watched over those last few demos, which isn't explicitly mentioned in any of the blog posts Google put out but was mentioned in a press briefing, is that none of the Gemini models will initially be able to generate images. So those examples where we saw the images generated with yarn: it's not going to output those images at launch. Google plans to add that capability in the future. Gemini's multimodal nature, its ability to understand and produce both text and imagery simultaneously, was supposed to match OpenAI's capabilities.

Now, I wasn't on that press briefing, but TechCrunch also confirmed this: and another key limitation, while the Gemini Ultra architecture supports image generation, as does Gemini Pro in theory, the capability won't make its way into the productized version of the model at launch. That's perhaps because the mechanism is slightly more complex than how ChatGPT generates images. Rather than feed prompts to an image generator, like DALL-E 3 in ChatGPT's case, Gemini outputs images natively without any intermediary step. The article goes on to say that they didn't provide a timeline as to when image generation might arrive, only an assurance that the work is ongoing.

Jumping back to the Google article, it claims that Gemini was built with responsibility and safety at the core. They're adding new protections to account for Gemini's multimodal capabilities. It has the most comprehensive safety evaluations, including for bias and toxicity, and basically they're saying that safety is at the core of this and they're trying to prevent abuse as much as possible. However, one thing that was noted from this press briefing, according to this TechCrunch article, is that Google repeatedly refused to answer questions from reporters about how it collected Gemini's training data, where the training data came from, and whether any of it was licensed from a third party. So as we know, OpenAI has kind of gotten into some hot water, and a lot of
people have tried to sue them recently for training on their data without approval, and Google is keeping its mouth shut right now on where this training data came from. Whether it's due to ethical implications or just because they want to keep it as proprietary as possible, I'm not sure. Now, at the press briefing they did reveal that at least a portion of the data was from public web sources and that Google filtered it for quality and inappropriate material, but they didn't address the elephant in the room: whether creators who might have unknowingly contributed to Gemini's training data can opt out or expect/request compensation.

Now, there is also a 60-page research document here all about Gemini. I'm not going to dive too deeply into this, but I will link it up below if you want to really get into the weeds. There are some other YouTubers who I highly respect, like David Shapiro and Matthew Berman, who will likely get into the weeds a little bit more for you on this kind of research in their videos.

But in this paper we can see more examples of math problems handwritten on paper and then sort of helped out by Gemini, and a nice little visual that really doesn't tell us that much other than that you can input text, audio, images, and video, and it sort of pulls them all together, pushes them through a Transformer, and then outputs both images and text. We've got more in-depth benchmarking data about how Gemini Ultra and Pro performed against GPT-4, GPT-3.5, PaLM 2-L, Claude 2, Inflection-2, Grok, and Llama 2, in which case Gemini Pro, which we do have access to right now (I'll get to that in a minute), actually outperforms GPT-3.5 in every single case here, and Gemini Ultra outperforms GPT-4 in almost every single case here.

There's also a nice little example in this paper where they give an input image and an input audio: what's the first step to make a veggie omelet with these ingredients? Crack the eggs into a bowl and whisk them. Thank you for the instructions. I started making my omelet; does
it look ready now? It looks like it's almost ready; you can flip it over to cook the other side, it said. So, using these images and an audio prompt, it is getting instructions on each next step in a process.

Then, down towards the bottom of the paper, we have a lot more examples of what it's capable of. An image of a plant: do you know what this plant is? How do I best take care of it? This is a Persian shield plant, and then it goes on to give instructions and details about the plant. They give it this image with a triangle, a square, and a pentagon, and then it asks which shape should be next, and it correctly says the fourth shape should be a hexagon. There are all sorts of great examples, some that we've already seen in some of their demo videos.

But let's talk about availability now. It says here, Gemini 1.0 is now rolling out across a range of products and platforms. Gemini Pro in Google products: starting today, December 6th, 2023, Bard will use a fine-tuned version of Gemini Pro. This is the biggest upgrade to Bard since it launched, and it will be available in English in more than 170 countries and territories, and we plan to expand to different modalities and support new languages and locations in the near future.

They're also bringing Gemini to Pixel. The Pixel 8 Pro is the first smartphone engineered to run Gemini Nano, which is powering new features like Summarize in the Recorder app and rolling out in Smart Reply in Gboard, starting with WhatsApp. So that'll be kind of cool: you'll get a summarize feature in the Recorder app, so theoretically you can open up the recording app on your Pixel 8 Pro, ramble for 20 minutes about some thoughts, and then get a summary of it, or set your phone in the middle of a conference table during a meeting, record the whole meeting, and then get a summary of that meeting directly from your phone. And apparently a version of Gemini has already been in use inside of the Google Search Generative Experience. On December 13th, they're going to start making Gemini Pro
available via API for developers, and again, Gemini Ultra is coming soon.

Today Google also put out an article on their blog, Bard gets its biggest upgrade yet with Gemini, basically saying you get access to Gemini Pro inside of Bard right now. They did a collaboration with Mark Rober here. I'm not going to show the whole video, but they basically tasked Mark Rober with making the most efficient paper airplane and getting it through this ring of fire up here, and of course, using Bard with Gemini Pro, he managed to complete his task.

But let's jump into Bard itself. When I open up Bard now, over at bard.google.com, it's in English with Gemini Pro. It also says to look out for Bard Advanced with Gemini Ultra, coming early next year. And if we click on this little See update link, we get this update, 12/6/2023: Bard is getting its biggest upgrade yet with Gemini Pro. It says Bard is far more capable at things like understanding and summarizing, reasoning, coding, and planning. You can try out Bard with Gemini Pro for text-based prompts, with support for other modalities coming soon. So it doesn't seem like we can really use image-based prompts yet.

Another interesting thing of note is that when I logged into Bard today, I saw this message. I've never seen it in Bard before, and it says, your conversations are being processed by human reviewers to improve the technologies powering Bard; don't enter anything you wouldn't want reviewed or used. So don't enter any private data, because they are watching and reading the conversations at the moment.

Now, I do have the option here to upload an image, so I can do that, but I don't believe it's actually using Gemini when I upload this image. I'll ask it, what is this an image of? It says the image has been removed: sorry, I can't help with images of people yet. All right, let's try this image instead. The image you sent is of SpongeBob SquarePants holding a tray of buns. The text on the image says, hold my bun, and then it gives some explanation of the reference. So not bad. I'm just not sure, honestly, if this
is using Gemini or not. Based on what I just read, it's not; this is still PaLM 2, I guess, when you add images. I'm not totally sure on that.

Now I'm using one of their example prompts. I clicked complex topic here, and it suggests: briefly summarize this concept, urban planning. You are an expert on this concept and are able to explain it in a clear and concise manner. Use simple language and avoid jargon. Provide a brief definition of the term and then discuss the key aspects of the concept. Be concise. So it actually did all that extra prompt engineering for me, but this also shows that it's still expecting a little bit of prompt engineering to get really good results. The response was ultra fast, and it seemed to do a pretty good job of breaking down the explanation here, but again, I'm having a hard time really telling the difference between what Bard used to do and what Gemini is now doing instead. So far it seems to be working pretty well.

Let's see, let's try write Java code: write a Java function that takes a path as an input and creates a file called date.txt storing the current system date; consider edge cases. It's quite a bit slower with code generation, but it still did this. Here's a Java function that takes a path as an input and creates a file called date.txt storing the current system date, considering edge cases, and then it wrote this code. I don't really know Java, so if you know Java and you want to pause the screen and evaluate this code here, go for it. I don't really know what I'm looking at. It then explains what the function is doing. So one thing I am noticing is that the responses I'm getting out of Bard here are quite a bit longer than the responses I used to get, so I do think that this is using Gemini, just based on the length of these responses.

I'm going to use this prompt that they shared in their research paper here and test to see if it comes up with this same final answer. If it does it correctly, we should get this fraction here, which I don't even really know how to say. It seems to have come to a slightly different conclusion here, and I'm really bad at calculus, so I'm not quite sure if this answer is also correct, but it didn't come up with the same response that we got in their paper. This may have also been using Gemini Ultra and not the Gemini Pro model that we have access to here.

Unfortunately, it's really hard to test, because what we seem to have access to in Bard right now is Gemini Pro, which should be on par with GPT-3.5, or the free version of ChatGPT, and it's really Gemini Ultra that's going to give us all of these really, really exciting features that they showed off today. So while Bard seems to have gotten a lot better, the real exciting advancement is the one that we haven't been able to actually play around with ourselves yet. Again, we should have access to that sometime next year. We don't know how soon into next year, but sometime next year.

But again, Bard has improved a lot. I don't necessarily think it's a ChatGPT killer yet, but I do think when Gemini Ultra comes out it'll probably be even more advanced than what they announced right now, and possibly, probably, will be even better than what we get with GPT-4. That is assuming GPT-4 doesn't roll out some more amazing
advancements between now and then. We'll just have to wait and see.

But that's my breakdown of the Gemini news from today. That's kind of everything we know so far. It's exciting to see this progress, and I'm excited to get my hands on Gemini Ultra. For now I'm probably still going to mostly use Claude and GPT-4, but Bard is getting better and better, and I think it's only a matter of time before Google really catches up with OpenAI and Microsoft and what they're doing over there. So, exciting advancements; I can't wait to see more of it.

If you love nerding out about this kind of stuff, check out futuretools.io, where I curate all the coolest AI tools and all the latest AI news, and I even have a free newsletter where I will put all those cool tools and all the latest AI news directly in your inbox.

Also, this channel is extremely close to half a million subscribers. I've almost hit 500,000 subscribers, which is just blowing my mind right now. Thank you so much to everybody that subscribed. I want to do a special little giveaway: as soon as I hit 500,000 subscribers, I plan to give five random subscribers to this channel a pair of Meta AI Ray-Ban smart glasses. All you've got to do is subscribe to this channel, and as soon as I pass 500,000 I am going to pick five random subscribers. Whether you're a new subscriber or an old subscriber, you have a chance to win, so if you haven't already, click subscribe. And if you like this video and you want to see more of these types of videos in your YouTube feed, give this video a like, and I'll make sure that more videos like this, around AI news, AI tutorials, AI research, and all of that cool stuff, show up in your YouTube feed.

Once again, I really, really appreciate you, and I'm just blown away at how fast all of this AI stuff is moving. I love nerding out about it, and I'm glad I have people like you who love nerding out about it with me. So thank you once again for tuning in. I really appreciate you. I'll see you in the next video. Bye-bye.
Info
Channel: Matt Wolfe
Views: 343,125
Keywords: AI, Artificial Intelligence, FutureTools, Futurism, Machine Learning, Deep Learning, Future Tools, Matt Wolfe, AI News, AI Tools, google ai, deepmind, gemini, chatgpt alternative, deepmind ai, google deepmind, ai, artificial intelligence, chatgpt alternatives, google gemini, ai news, deepmind google, chatgpt, google bard, openai, open ai, google, chat gpt, ai tools, gpt 4, agi, chat gpt 4, generative ai, deep learning, bard, best ai tools
Id: lgBAS9CFYlE
Length: 24min 45sec (1485 seconds)
Published: Wed Dec 06 2023