Google Gemini 1.5 - Google Shocked Everyone with 1,000,000 Token Context Window

Video Statistics and Information

Captions
Google just introduced a brand new AI model called Gemini 1.5, and it's a massive leap over the previous model, Gemini 1.0. Remember, Google Bard got renamed to Gemini last week; that was a branding change, but the model that drove Bard and Gemini was called Gemini 1.0 Pro (or Gemini Pro 1.0). This new model, 1.5, has three major improvements over the previous one, and I'll break them down in this video. There's also still the Gemini Advanced tier, which is powered by Gemini Ultra, so that one isn't changing, but I believe Gemini 1.5 is getting rolled out to the standard version of Gemini, unless they introduce some kind of middle tier; we're not sure yet. They haven't talked about pricing for 1.5 or whether it will be free; right now it's rolling out in a free preview.

Here's the biggest update. Gemini 1.0 had a context window of 32,000 tokens, which was pretty typical of large language models. GPT-4 Turbo released one at 128K, much bigger, and Claude was the leader with 200K. Gemini 1.5 decided to go all the way to a million tokens, and Google is even testing a 10-million-token context window. So what does a one-million-token context window actually mean? It's basically a 700,000-word prompt, so you could give it entire books, entire trilogies. It could also take about 1 hour of video, which comes out close to 1 million tokens, or 11 hours of audio, or a codebase of 30,000 lines of code or more in the same context window. Previously I couldn't even get to 1,000 lines of code without something going wrong, so this could be really incredible. And again, they're testing something with 10 million tokens.

As far as availability goes: the 1-million-token context window is currently in a private preview through the API, so it's really made for developers. Gemini 1.5 Pro otherwise gets a standard 128,000-token context window, which is still a significant bump from the 32,000 available inside Gemini 1.0, and basically matches what GPT-4 Turbo has. That's already a pretty large context window, but once we get to a million tokens, around 700,000 words, that's bigger than anything I've ever worked with; I've created entire books and they were still only around 120,000 words. This is going to open up a whole world of possibilities for AI foundation models, and I'm assuming Claude, ChatGPT and the others are not going to just stand behind Google and let them get to a million without trying to increase their own context windows too.

Now let me show you a quick video that Google posted. It shows Gemini 1.5's real power, especially when it comes to the context window. Based on the studies I've seen, Gemini 1.5 beats Gemini 1.0 about 87% of the time, so it's a massive improvement, and its complex reasoning is at a whole different level. Google says it's comparable to Ultra 1.0, which I made a previous video about (and I'm doing a deep-dive video that's posting very soon). Ultra 1.0 is the best version they have, and they're saying Gemini 1.5 is comparable to it. The video I'm about to play used a 402-page PDF of the Apollo 11 mission transcript as the prompt for Gemini 1.5 Pro.
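Before the demo narration, here is a rough sense of what a long-document prompt like this could look like programmatically. This is a minimal sketch, assuming the google-generativeai Python SDK and its File API; the API key, file path, prompt, and model name are placeholders, and the demo itself was run in Google AI Studio rather than through code, so treat this as an illustration, not Google's own setup.

```python
# Minimal sketch: prompting Gemini 1.5 Pro with a long PDF via the
# google-generativeai SDK and its File API (placeholder key/path/model).
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumes you have preview access

# Upload the long document once; the File API returns a handle that can
# be passed as a prompt part alongside text.
transcript_pdf = genai.upload_file(path="apollo11_transcript.pdf")

model = genai.GenerativeModel("gemini-1.5-pro-latest")

# Optional: check how much of the context window the document uses.
token_count = model.count_tokens([transcript_pdf, "Find three comedic moments."])
print("Prompt tokens:", token_count.total_tokens)

# Ask a question that requires reading across the whole document.
response = model.generate_content(
    [transcript_pdf, "Find three comedic moments. List quotes from this transcript."]
)
print(response.text)
```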
This is a demo of long context understanding, an experimental feature in our newest model, Gemini 1.5 Pro. We'll walk through a screen recording of example prompts using a 402-page PDF of the Apollo 11 transcript, which comes out to almost 330,000 tokens. We started by uploading the Apollo PDF into Google AI Studio and asked: find three comedic moments, and list quotes from this transcript and emoji. This screen capture is sped up, and this timer shows exactly how long it took to process each prompt; keep in mind that processing times will vary. The model responded with three quotes, like this one from Michael Collins: "I'll bet you a cup of coffee on it." If we go back to the transcript, we can see the model found this exact quote and extracted the comedic moment accurately. Then we tested a multimodal prompt: we gave it this drawing of a scene we were thinking of and asked, what moment is this? The model correctly identified it as Neil's first steps on the moon. Notice how we didn't explain what was happening in the drawing; simple drawings like this are a good way to test whether the model can find something based on just a few abstract details. For the last prompt, we asked the model to cite the time code of this moment in the transcript. Like all generative models, responses like this won't always be perfect; they can sometimes be a digit or two off. But let's look at the model's response here, and when we find this moment in the transcript, we can see that this time code is correct. These are just a few examples of what's possible with a context window of up to 1 million multimodal tokens in Gemini 1.5 Pro.

Now remember, it's not just about text with Google. With Gemini and Ultra, they're really trying to take things to the next level in multimodality. GPT-4 is still the leader, best vision in class right now, but it looks like this version of Gemini is really going to give GPT-4 a run for its money. The next demo actually used a 44-minute silent film as the prompt. Multimodality means the model can be fed code, audio, video, images, and text, and use all of it as the prompt. So let's take a look. Again, these previews I'm showing you aren't yet available to the public outside of the developer preview, but just imagine where this is going as soon as it becomes accessible; obviously we'll cover that in deep-dive tutorials.

This is a demo of long context understanding, an experimental feature in our newest model, Gemini 1.5 Pro. We'll walk through a screen recording of example prompts using a 44-minute Buster Keaton film, which comes out to over 600,000 tokens. In Google AI Studio, we uploaded the video and asked: find the moment when a piece of paper is removed from the person's pocket, and tell me some key information on it with the time code. This screen capture is sped up, and this timer shows exactly how long it took to process each prompt; keep in mind that processing times will vary. The model gave us this response, explaining that the piece of paper is a pawn ticket from Goldman and Company Pawn Brokers, with the date and cost, and it gave us this time code: 12:01. When we pulled up that time code, we found it was correct; the model had found the exact moment the piece of paper is removed from the person's pocket, and it extracted the text accurately. Next, we gave it this drawing of a scene we were thinking of and asked, what is the time code when this happens? This is an example of a multimodal prompt, where we combine text and image in our input. The model returned this time code: 15:34. We pulled that up and found that it was the correct scene. Like all generative models, responses vary and won't always be perfect, but notice how we didn't have to explain what was happening in the drawing; simple drawings like this are a good way to test whether the model can find something based on just a few abstract details, like it did here. These are just a couple of examples of what's possible with a context window of up to 1 million multimodal tokens in Gemini 1.5 Pro.
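The video prompt in that second demo can be approximated through the same API. Below is a minimal sketch, again assuming the google-generativeai Python SDK and its File API; the API key, file name, prompt, and model name are placeholders. Uploaded videos are processed server-side before they can be referenced, which is why the sketch polls the file state first.

```python
# Minimal sketch: a multimodal video prompt via the google-generativeai SDK.
# Placeholder key, file name, and model name; not Google's demo code.
import time

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Upload the film, then wait for server-side processing to finish.
video = genai.upload_file(path="buster_keaton_film.mp4")  # hypothetical local file
while video.state.name == "PROCESSING":
    time.sleep(10)
    video = genai.get_file(video.name)  # refresh the file handle

model = genai.GenerativeModel("gemini-1.5-pro-latest")
response = model.generate_content(
    [
        video,
        "Find the moment when a piece of paper is removed from the person's "
        "pocket, tell me some key information on it, and give the time code.",
    ]
)
print(response.text)
```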
I'll put a link to this page so you can see the other videos, but Gemini 1.5 also has problem-solving skills it didn't have previously, especially for coding, so that one is worth a watch too. Now, I tested this with my version of Gemini. If you ask Gemini which model it's using, it almost never answers, so I can't really tell whether I have Gemini 1.5, but based on the context window, I don't. I tried something larger than the 32K-token context window to see if I have the 128K version, and I don't have it yet. So this is an announcement, not really a release; as soon as they release it and I have it, I'll obviously do a comparison video. I also made a comprehensive deep dive comparing GPT-4 with Ultra 1.0, and that video is getting released later this week (if you're watching this later, I'll post it here). It's the most comprehensive test I've ever done; I spent almost a week testing 10 different categories of prompting, so that's going to be released very soon. Thanks for watching this one; I'll see you next time.
Info
Channel: Skill Leap AI
Views: 17,254
Keywords: howfinity, skill leap ai, ai, Google Gemini 1.5, Google Gemini, Google ultra, chatGPT, GPT4, large language model, foundational AI
Id: PQc3lkVj99I
Length: 9min 11sec (551 seconds)
Published: Thu Feb 15 2024