Google I/O 2024 Keynote: Sundar Pichai opening remarks

Captions
[VIDEO PLAYBACK]

- Google's ambitions in artificial intelligence.
- Google launches Gemini.
- AI is rolling out to Workspace.
- And it's completely changing the way we work.
- You know, a lot has happened in a year. [DOG BARKING] There have been new beginnings. [MUSIC PLAYING] We found new ways to find new ideas and new solutions to age-old problems.
- Sorry about your shirt.
- We dreamt of things. Never too old for a treehouse. We trained for things--
- Let's go, go, go.
- --and learned about this thing. We found new paths, took the next step, and made the big leap. Cannonball! We filled days like they were weeks, and more happened in months than has happened in years. [CHICKENS CLUCKING] Free eggs. Things got bigger, like way bigger. And it wasn't all just for him, or for her. It was for everyone. [MUSIC PLAYING] [HORSE WHINNYING] And you know what? We're just getting started.

[END PLAYBACK]

[APPLAUSE]

SUNDAR PICHAI: Hi, everyone. Good morning. [APPLAUSE] Welcome to Google I/O. It's great to have all of you with us. We have a few thousand developers here today at Shoreline, and millions more are joining virtually around the world. Thanks to everyone for being here. For those of you who haven't seen I/O before, it's basically Google's version of the Eras Tour, but with fewer costume changes. [LAUGHTER]

At Google, though, we are fully in our Gemini era. You will hear a lot about that today. Before we get into it, I want to reflect on this moment we are in. We have been investing in AI for more than a decade, and innovating at every layer of the stack-- research, product, infrastructure. We are going to talk about it all today. Still, we are in the very early days of the AI platform shift. We see so much opportunity ahead for creators, for developers, for startups-- for everyone. Helping to drive those opportunities is what our Gemini era is all about. So let's get started.

A year ago on this stage, we first shared our plans for Gemini: a frontier model built to be natively multimodal from the very beginning, one that could reason across text, images, video, code, and more. It's a big step toward turning any input into any output-- an I/O for a new generation. Since then, we introduced the first Gemini models, our most capable yet. They demonstrated state-of-the-art performance on every multimodal benchmark, and that was just the beginning. Two months later, we introduced Gemini 1.5 Pro, delivering a big breakthrough in long context. It can run one million tokens in production consistently, more than any other large-scale foundation model yet.

We want everyone to benefit from what Gemini can do, so we have worked quickly to share these advances with all of you. Today, more than 1.5 million developers use Gemini models across our tools. You're using it to debug code, get new insights, and build the next generation of AI applications. We have also been bringing Gemini's breakthrough capabilities across our products in powerful ways. We'll show examples today across Search, Photos, Workspace, Android, and more. Today, all of our two-billion-user products use Gemini. And we have introduced new experiences too, including on mobile, where people can interact with Gemini directly through the app, now available on Android and iOS, and through Gemini Advanced, which provides access to our most capable models. Over one million people have signed up to try it in just three months, and it continues to show strong momentum. One of the most exciting transformations with Gemini has been in Google Search.
In the past year, we have answered billions of queries as part of our Search Generative Experience. People are using it to search in entirely new ways and to ask new types of questions: longer and more complex queries, even searches with photos, and they're getting back the best the web has to offer. We have been testing this experience outside of Labs, and we are encouraged to see not only an increase in Search usage, but also an increase in user satisfaction. I'm excited to announce that we'll begin launching this fully revamped experience, AI Overviews, to everyone in the US this week, and we'll bring it to more countries soon. [APPLAUSE]

There's so much innovation happening in Search. Thanks to Gemini, we can create much more powerful search experiences, including within our products. Let me show you an example in Google Photos. We launched Google Photos almost nine years ago. Since then, people have used it to organize their most important memories. Today, that amounts to more than six billion photos and videos uploaded every single day. And people love using Photos to search across their life. With Gemini, we're making that a whole lot easier.

Say you're at a parking station, ready to pay, but you can't recall your license plate number. Before, you could search Photos for keywords and then scroll through years' worth of photos looking for the right one. Now you can simply ask Photos. It knows the cars that appear often, it triangulates which one is yours, and it just tells you the license plate number. [APPLAUSE]

And Ask Photos can also help you search your memories in a deeper way. For example, you might be reminiscing about your daughter Lucia's early milestones. You can ask Photos: when did Lucia learn to swim? You can even follow up with something more complex: show me how Lucia's swimming has progressed. Here, Gemini goes beyond a simple search, recognizing different contexts, from doing laps in the pool to snorkeling in the ocean, to the text and dates on her swimming certificates. And Photos packages it all up together in a summary, so you can really take it all in and relive amazing memories all over again. We are rolling out Ask Photos this summer, with more capabilities to come. [APPLAUSE]

Unlocking knowledge across formats is why we built Gemini to be multimodal from the ground up. It's one model, with all the modalities built in, so not only does it understand each type of input, it finds connections between them. Multimodality radically expands the questions we can ask and the answers we will get back. Long context takes this a step further, enabling us to bring in even more information: hundreds of pages of text, hours of audio, a full hour of video, or entire code repos. Or, if you want, roughly 96 Cheesecake Factory menus. [LAUGHTER] For that many menus, you need a one-million-token context window, now possible with Gemini 1.5 Pro. Developers have been using it in super interesting ways. Let's take a look.

[VIDEO PLAYBACK]

[MUSIC PLAYING]

- I remember the announcement of the one million token context window, and my first reaction was, there's no way they were able to achieve this.
- I wanted to test its technical skills, so I uploaded a line chart. It showed temperatures in Tokyo and Berlin and how they vary across the 12 months of the year.
- So I got in there, and I threw in the Python library that I was really struggling with, and I just asked it a simple question. And it nailed it. It could find specific references to comments in the code, and specific requests that people had made, and other issues that people had had, and then suggest a fix that related to what I was working on.
- I immediately tried to crash it, so I took four or five research papers I had on my desktop. And it's a mind-blowing experience when you add so much text and then you see the amount of tokens you've added, and it's not even at half the capacity.
- It felt a little bit like Christmas, because you saw things peppered up to the top of your feed like, oh, wow, I built this thing, or, oh, it's doing this, and I would never have expected it.
- Can I shoot a video of my possessions and turn that into a searchable database? So I ran to my bookshelf, and I shot a video, just panning my camera along the bookshelf, and I fed the video into the model. It gave me the titles and authors of the books, even though the authors weren't visible on those book spines. And on the bookshelf, there was a squirrel nutcracker sitting in front of one book, truncating the title. You could just see the word "site-see," and it still guessed the correct book. The range of things you can do with that is almost unlimited.
- And so at that point, for me, it just clicked: this is it.
- I thought I had a superpower in my hands.
- It was poetry. It was beautiful. I was so happy. This is going to be amazing. This is going to help people.
- This is where the future of language models is going: personalized to you, not because you trained it to be personal to you, but personal to you because you can give it such a vast understanding of who you are.

[END PLAYBACK]

[APPLAUSE]

SUNDAR PICHAI: We have been rolling out Gemini 1.5 Pro with long context in preview over the last few months, and we have made a series of quality improvements across translation, coding, and reasoning. You will see these updates reflected in the model starting today. I'm excited to announce that we are bringing this improved version of Gemini 1.5 Pro to all developers globally. [APPLAUSE] In addition, today Gemini 1.5 Pro with the one-million-token context window is now directly available to consumers in Gemini Advanced, and it can be used across 35 languages.

One million tokens is opening up entirely new possibilities. It's exciting, but I think we can push ourselves even further. So today, we are expanding the context window to two million tokens. [APPLAUSE] We are making it available to developers in private preview. It's amazing to look back and see just how much progress we have made in a few months. This represents the next step on our journey toward the ultimate goal of infinite context.

So far, we have talked about two technical advances: multimodality and long context. Each is powerful on its own, but together, they unlock deeper capabilities and more intelligence. Let's see how this comes to life with Google Workspace.

People are always searching their emails in Gmail, and we are working to make that much more powerful with Gemini. Let's look at how. As a parent, you want to know everything that's going on with your child's school-- OK, maybe not everything. But you want to stay informed. Gemini can help you keep up. Now we can ask Gemini to summarize all recent emails from the school. In the background, it's identifying relevant emails, even analyzing attachments like PDFs, and you get a summary of the key points and action items-- so helpful. Maybe you were traveling this week, and you couldn't make the PTA meeting.
The recording of the meeting is an hour long, but if it's from Google Meet, you can ask Gemini to give you the highlights. [APPLAUSE] There's a parents' group looking for volunteers, and you're free that day. Of course, Gemini can draft a reply. There are countless other examples of how this can make life easier. Gemini 1.5 Pro is available today in Workspace Labs, and Aparna will share more later on. [APPLAUSE]

We just looked at an example with text outputs, but with a multimodal model we can do so much more. To show you an early demo of audio output in NotebookLM, here's Josh. [APPLAUSE]

JOSH WOODWARD: Hey, everyone. Last year at I/O, we introduced NotebookLM, a research and writing tool grounded in the information you give it. Since then, we've seen a lot of momentum, with students and teachers using it. And today, Gemini 1.5 Pro is coming to NotebookLM, and it's great. Let me show you.

So here we are in NotebookLM. You can load it up with all the materials here on the left. In this notebook, I've been using it with my younger son, and I've added some of his science worksheets, a few slide decks from his teacher, and even an open-source textbook full of charts and diagrams. With 1.5 Pro, it instantly creates this notebook guide with a helpful summary, and it can generate a study guide, an FAQ, or even quizzes. But my son Jimmy really learns best when he can listen to something, so we've prototyped a new feature with Gemini called Audio Overviews. NotebookLM takes all the materials on the left as input and turns them into a lively science discussion personalized for him. Let's take a listen.

[VIDEO PLAYBACK]

- So let's dive into physics. What's on deck for today?
- Well, we're starting with the basics: force and motion. And that, of course, means we have to talk about Sir Isaac Newton and his three laws of motion.
- Ah, yes, the foundation for understanding how objects move and interact.

[END PLAYBACK]

JOSH WOODWARD: This is where multimodal really shines. It generated this audio discussion based on that text material. And what's amazing is that my son and I can join the conversation and steer it in whichever direction we want when I tap Join.

DIGITAL VOICE: Hold on. We have a question. What's up, Josh?

JOSH WOODWARD: Yeah. Can you give my son Jimmy a basketball example?

DIGITAL VOICE: Hey, Jimmy! That's a fantastic idea. Basketball is actually a great way to visualize force and motion. Let's break it down.

DIGITAL VOICE: OK, so first, imagine a basketball just sitting there on the court. It's not moving, right? That's because all the forces acting on it are balanced: the downward pull of gravity--

JOSH WOODWARD: Pretty cool, right? [APPLAUSE] I've got to say, the first time my son heard this, you should have seen how big his eyes got, because he was gripped. They were talking to him. He was learning science through the example of basketball, his favorite sport. Now, what's interesting is that, under the hood, you saw that Gemini had used some of the concepts of gravity and Sir Isaac Newton, but nothing in the source material was about basketball. It connected the dots and created that age-appropriate example for him. And this is what's becoming possible with the power of Gemini: you can give it lots of information in any format, and it can be transformed in a way that's personalized and interactive for you. Back to you, Sundar. [APPLAUSE]

SUNDAR PICHAI: Thanks, Josh. The demo shows the real opportunity with multimodality.
Soon you'll be able to mix and match inputs and outputs. This is what we mean when we say it's an I/O for a new generation. And I can see you all out there thinking about the possibilities. But what if it could go even further?

That's one of the opportunities we see with AI agents. Let me take a step back and explain what I mean by that. I think about them as intelligent systems that show reasoning, planning, and memory; are able to think multiple steps ahead; and work across software and systems, all to get something done on your behalf and, most importantly, under your supervision. We are still in the early days, and you will see glimpses of our approach throughout the day. But let me show you the kinds of use cases we are working hard to solve.

Let's start with shopping. It's pretty fun to shop for shoes, and a lot less fun to return them when they don't fit. Imagine if Gemini could do all the steps for you: searching your inbox for the receipt, locating the order number from your email, filling out a return form, and even scheduling a pickup. That's much easier, right? [APPLAUSE]

Let's take another example that's a bit more complex. Say you just moved to Chicago. You can imagine Gemini and Chrome working together to help you do a number of things to get ready: organizing, reasoning, and synthesizing on your behalf. For example, you will want to explore the city and find services nearby, from dry cleaners to dog walkers. You'll also have to update your new address across dozens of websites. Gemini can work across these tasks and will prompt you for more information when needed, so you're always in control. That part is really important as we prototype these experiences. We are thinking hard about how to do it in a way that's private, secure, and works for everyone.

These are simple use cases, but they give you a good sense of the types of problems we want to solve by building intelligent systems that think ahead, reason, and plan, all on your behalf. The power of Gemini, with multimodality, long context, and agents, brings us closer to our ultimate goal: making AI helpful for everyone. We see this as how we will make the most progress against our mission: organizing the world's information across every input, making it accessible via any output, and combining the world's information with the information in your world in a way that's truly useful for you. To fully realize the benefits of AI, we'll continue to break new ground. Google DeepMind is hard at work on this. To share more, please welcome, for the first time on the I/O stage, Sir Demis Hassabis.
Info
Channel: Google
Views: 12,705
Keywords: AI, Tech, AI News, Artificial Intelligence, Machine Learning, AI Tools, Prompt Engineering, Technology, Generative AI, Deep Learning, AI Explained, Google, Google I/O, 2024, google accessibility features, google product development, google updates, google product updates, gemini ai, gemini, google gemini, gemini google, google ai
Id: uFroTufv6es
Length: 20min 57sec (1257 seconds)
Published: Thu May 16 2024