[VIDEO PLAYBACK] - Google's ambitions in
artificial intelligence. - Google launches Gemini. - AI is rolling
out to Workspace. - And it's completely
changing the way we work. - You know, a lot has
happened in a year. [DOG BARKING] There have been new beginnings. [MUSIC PLAYING] We found new ways to find
new ideas and new solutions to age-old problems. - Sorry about your shirt. - We dreamt of things. Never too old for a treehouse. We trained for things-- - Let's go, go, go. - --and learned
about this thing. We found new paths,
took the next step, and made the big leap. Cannonball! We filled days like
they were weeks, and more happened in months
than has happened in years. [CHICKENS CLUCKING] Free eggs. Things got bigger,
like way bigger. And it wasn't all just
for him, or for her. It was for everyone. [MUSIC PLAYING] [HORSE WHINNYING] And you know what? We're just getting started. [END PLAYBACK] [APPLAUSE] SUNDAR PICHAI: Hi, everyone. Good morning. [APPLAUSE] Welcome to Google I/O. It's
great to have all of you with us. We have a few thousand
developers with us here today at Shoreline. Millions more are joining
virtually around the world. Thanks to everyone
for being here. For those of you who
haven't seen I/O before, it's basically Google's
version of the Eras Tour, but with fewer costume changes. [LAUGHTER] At Google, though, we are
fully in our Gemini era. You will hear a lot
about that today. Before we get into
it, I want to reflect on this moment we are in. We have been investing in
AI for more than a decade, and innovating at every
layer of the stack-- research, product,
infrastructure. We are going to talk
about it all today. Still, we are in the very early
days of the AI platform shift. We see so much opportunity ahead
for creators, for developers, for startups-- for everyone. Helping to drive
those opportunities is what our Gemini
era is all about. So let's get started. A year ago on this stage,
we first shared our plans for Gemini, a
frontier model built to be natively multimodal
from the very beginning that could reason across text,
images, video, code, and more. It's a big step in turning
any input into any output-- an I/O for a new generation. Since then, we introduced
the first Gemini models, our most capable yet. They demonstrated
state-of-the-art performance on every multimodal benchmark,
and that was just the beginning. Two months later, we
introduced Gemini 1.5 Pro, delivering a big
breakthrough in long context. It can run one million
tokens in production consistently, more than any
other large-scale foundation model yet. We want everyone to benefit
from what Gemini can do. So we have worked quickly
to share these advances with all of you. Today, more than 1.5 million
developers use Gemini models across our tools. You're using it to debug
code, get new insights, and build the next
generation of AI applications. We have also been bringing
Gemini's breakthrough capabilities across our
products in powerful ways. We'll show examples today across
Search, Photos, Workspace, Android, and more. Today, all of our two-billion-user products use Gemini. And we have introduced
new experiences too, including on mobile, where
people can interact with Gemini directly through the app, now
available on Android and iOS, and through Gemini Advanced,
which provides access to our most capable models. Over one million
people have signed up to try it in just three
months, and it continues to show strong momentum. One of the most exciting
transformations with Gemini has been in Google Search. In the past year, we have
answered billions of queries as part of our Search Generative Experience. People are using it to
search in entirely new ways and asking new
types of questions, longer and more complex queries,
even searching with photos, and getting back the best
the web has to offer. We have been testing this
experience outside of Labs, and we are encouraged
to see not only an increase in Search usage,
but also an increase in user satisfaction. I'm excited to announce that
we'll begin launching this fully revamped experience, AI
Overviews, to everyone in the US this week, and we'll bring
it to more countries soon. [APPLAUSE] There's so much innovation
happening in Search. Thanks to Gemini, we can create
much more powerful search experiences, including
within our products. Let me show you an
example in Google Photos. We launched Google Photos
almost nine years ago. Since then, people have
used it to organize their most important memories. Today, that amounts to more than
six billion photos and videos uploaded every single day. And people love using Photos
to search across their life. With Gemini, we're making
that a whole lot easier. Say you're at a parking
station ready to pay, but you can't recall your
license plate number. Before, you could search
Photos for keywords and then scroll through
years worth of photos looking for the right one. Now you can simply ask Photos. It knows the cars
that appear often. It triangulates which one
is yours and just tells you the license plate number. [APPLAUSE] And Ask Photos can also help
you search your memories in a deeper way. For example, you might be
reminiscing about your daughter Lucia's early milestones. You can ask Photos, when
did Lucia learn to swim? You can even follow up with
something more complex-- show me how Lucia's
swimming has progressed. Here, Gemini goes
beyond a simple search, recognizing different
contexts, from doing laps in the pool to
snorkeling in the ocean, to the text and dates on
her swimming certificates. And Photos packages it all together in a summary. You can really take it all in
and relive amazing memories all over again. We are rolling out
Ask Photos this summer with more capabilities to come. [APPLAUSE] Unlocking knowledge
across formats is why we built Gemini to be
multimodal from the ground up. It's one model with all
the modalities built in. So not only does it
understand each type of input, it finds connections
between them. Multimodality radically expands
the questions we can ask and the answers
we will get back. Long context takes
this a step further, enabling us to bring in
even more information-- hundreds of pages of text, hours
of audio, a full hour of video, or entire code repos. Or if you want, roughly 96
Cheesecake Factory menus. [LAUGHTER] For that many menus, you need a one-million-token context window, now possible with Gemini 1.5 Pro. Developers have been using
it in super interesting ways. Let's take a look. [VIDEO PLAYBACK] [MUSIC PLAYING] - I remember the announcement,
the one million token context window-- and my
first reaction was, there's no way they were
able to achieve this. - I wanted to test
its technical skills, so I uploaded a line chart. It was temperatures
between Tokyo and Berlin and how they vary across
the 12 months of the year. - So I got in there, and I
threw in the Python library that I was really
struggling with, and I just asked it
a simple question. And it nailed it. It could find
specific references to comments in the code
and specific requests that people had made and other
issues that people had had, but then suggest a fix
for it that related to what I was working on. - I immediately
tried to crash it, so I took four or five research
papers I had on my desktop. And it's a mind-blowing experience when you add so much text and then you see the amount of tokens you add-- it's not even at half the capacity. - It felt a little
bit like Christmas, because you saw
things peppered up to the top of your feed
about like, oh, wow, I built this thing,
or oh, it's doing this and I would have never expected. - Can I shoot a video
of my possessions and turn that into a
searchable database? So I ran to my bookshelf,
and I shot a video just panning my camera
along the bookshelf, and I fed the video
into the model. It gave me the titles
and authors of the books, even though the authors weren't
visible on those book spines. And on the bookshelf, there
was a squirrel nutcracker sat in front of the book,
truncating the title. You could just see
the word "sightsee," and it still guessed
the correct book. The range of things you can do
with that is almost unlimited. - And so at that point, for
me, it was just like a click, like this is it. - I thought I had a
superpower in my hands. - It was poetry. It was beautiful. I was so happy. It just-- this is
going to be amazing. This is going to help people. - This is where the future of
language models is going-- personalized to
you not because you trained it to be personal
to you, but personal to you because you can give it such
a vast understanding of who you are. [END PLAYBACK] [APPLAUSE] SUNDAR PICHAI: We have been
rolling out Gemini 1.5 Pro with long context in preview
over the last few months. We made a series of
quality improvements across translation,
coding, and reasoning. You will see these
updates reflected in the model starting today. I'm excited to announce that
we are bringing this improved version of Gemini 1.5 Pro
to all developers globally. [APPLAUSE] In addition, today Gemini 1.5
Pro with a one-million-token context window is now directly available for consumers in Gemini Advanced and can be used across 35 languages.
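For developers, here is a minimal sketch of what a long-context call can look like with the google-generativeai Python SDK; the model name is real, but the file name and prompt are illustrative assumptions, not part of the announcement:

    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")
    model = genai.GenerativeModel("gemini-1.5-pro")

    # Hypothetical dump of an entire code repo as a single text file.
    with open("repo_dump.txt") as f:
        corpus = f.read()

    # Confirm the whole corpus fits inside the one-million-token window.
    print(model.count_tokens(corpus))

    # Send the full corpus and a question in a single request.
    response = model.generate_content(
        [corpus, "Summarize the main modules and how they interact."]
    )
    print(response.text)

One million tokens is opening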
up entirely new possibilities. It's exciting, but I think we
can push ourselves even further. So today, we are expanding
the context window to two million tokens. [APPLAUSE] We are making it available for
developers in private preview. It's amazing to look back and
see just how much progress we have made in a few months. This represents the
next step on our journey towards the ultimate
goal of infinite context. So far, we have talked about
two technical advances, multimodality and long context. Each is powerful on its
own, but together, they unlock deeper capabilities
and more intelligence. Let's see how this comes to
life with Google Workspace. People are always searching
their emails in Gmail. We are working to make it much
more powerful with Gemini. Let's look at how. As a parent, you want to know
everything that's going on with your child's school-- OK, maybe not everything. But you want to stay informed. Gemini can help you keep up. Now we can ask Gemini to
summarize all recent emails from the school. In the background, it's
identifying relevant emails, even analyzing
attachments like PDFs, and you get a summary of the
key points and action items-- so helpful. Maybe you were
traveling this week, and you couldn't
make the PTA meeting. The recording of the
meeting is an hour long. If it's from Google
Meet, you can ask Gemini to give you the highlights. [APPLAUSE] There's a parents group
looking for volunteers. You're free that day. Of course, Gemini
can draft a reply. There are countless
other examples of how this can
make life easier. Gemini 1.5 Pro is available
today in Workspace Labs, and Aparna will
share more later on. [APPLAUSE] We just looked at an
example with text outputs. But with a multimodal model,
we can do so much more. To show you an early demo of
an audio output in NotebookLM, here's Josh. [APPLAUSE] JOSH WOODWARD: Hey, everyone. Last year at I/O, we
introduced NotebookLM, a research and
writing tool grounded in the information you give it. Since then, we've seen
a lot of momentum, with students and
teachers using it. And today, Gemini 1.5 Pro
is coming to NotebookLM, and it's great. Let me show you. So here we are in NotebookLM. You can load it up with all
the materials here on the left. In this notebook, I've been using it
with my younger son, and I've added some of
his science worksheets, a few slide decks
from his teacher, and even an open source textbook
full of charts and diagrams. With 1.5 Pro, it instantly
creates this notebook guide with a helpful summary and can
generate a study guide, an FAQ, or even quizzes. But for my son Jimmy, he
really learns best when he can listen to something. So we've prototyped a
new feature with Gemini, and it's called Audio Overviews. NotebookLM is going to
take all the materials on the left as input
and output them into a lively science
discussion personalized for him. Let's take a listen. [VIDEO PLAYBACK] - So let's dive into physics. What's on deck for today? - Well, we're starting with
the basics, force and motion. And that, of course,
means we have to talk about Sir Isaac Newton
and his three laws of motion. - Ah, yes, the foundation
for understanding how objects move and interact. [END PLAYBACK] JOSH WOODWARD: Ah, yes. This is where multimodal
really shines. Now, it generated
this audio discussion based on that text material. And what's amazing
is that my son and I can join into the conversation
and steer it whichever direction we want when I tap Join. DIGITAL VOICE: Hold on. We have a question. What's up, Josh? JOSH WOODWARD: Yeah. Can you give my son Jimmy
a basketball example? DIGITAL VOICE: Hey, Jimmy! That's a fantastic idea. Basketball is
actually a great way to visualize force and motion. Let's break it down. DIGITAL VOICE: OK, so first,
imagine a basketball just sitting there on the court. It's not moving, right? That's because all the forces
acting on it are balanced. The downward pull of gravity-- JOSH WOODWARD:
Pretty cool, right? [APPLAUSE] I got to say, the
first time my son heard this, you should have
seen how big his eyes got, because he was gripped. They were talking to him. He was learning science through
the example of basketball, his favorite sport. Now, what's interesting
is, under the hood, you saw that Gemini had used
some of the concepts of gravity, Sir Isaac Newton, but nothing
in there was about basketball. It connected the
dots and created that age-appropriate
example for him. And this is what's becoming
possible with the power of Gemini. You can give it lots of
information in any format, and it can be transformed
in a way that's personalized and
interactive for you. Back to you, Sundar. [APPLAUSE] SUNDAR PICHAI: Thanks, Josh. The demo shows the real
opportunity with multimodality. Soon you'll be able to mix
and match inputs and outputs. This is what we mean
when we say it's an I/O for a new generation. And I can see you all
out there thinking about the possibilities. But what if it could
go even further? That's one of the opportunities
we see with AI agents. Let me take a step back and
explain what I mean by that. I think about them as
intelligent systems that show reasoning,
planning, and memory; are able to think
multiple steps ahead; work across software
and systems, all to get something
done on your behalf, and most importantly,
under your supervision. We are still in the
early days, and you will see glimpses of our
approach throughout the day. But let me show you
the kinds of use cases we are working hard to solve. Let's start with shopping. It's pretty fun to shop for
shoes, and a lot less fun to return them when
they don't fit. Imagine if Gemini could
do all the steps for you-- searching your inbox
for the receipt, locating the order
number from your email, filling out a return form,
and even scheduling a pickup. That's much easier, right? [APPLAUSE]
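There is no public agent API behind this today, but the supervision pattern described above is easy to picture. Here is a minimal, purely illustrative Python sketch in which every planned step runs only after you approve it:

    from dataclasses import dataclass

    @dataclass
    class Step:
        description: str

        def run(self) -> None:
            print(f"Executing: {self.description}")

    def run_agent(plan: list[Step]) -> None:
        for step in plan:
            # Keep the user in control: confirm before acting on their behalf.
            if input(f"{step.description}? [y/n] ").strip().lower() == "y":
                step.run()
            else:
                print("Skipped; waiting for more information from you.")

    run_agent([
        Step("Search your inbox for the shoe receipt"),
        Step("Locate the order number in the email"),
        Step("Fill out the return form"),
        Step("Schedule a pickup"),
    ])

Let's take another example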
that's a bit more complex. Say you just moved to Chicago. You can imagine Gemini and
Chrome working together to help you do a number
of things to get ready-- organizing, reasoning,
synthesizing on your behalf. For example, you will
want to explore the city and find services nearby, from
dry cleaners to dog walkers. You'll have to update
your new address across dozens of websites. Gemini can work
across these tasks and will prompt you
for more information when needed so you're
always in control. That part is really important as
we prototype these experiences. We are thinking hard
about how to do it in a way that's private,
secure, and works for everyone. These are simple use
cases, but they give you a good sense of the
types of problems we want to solve by building
intelligent systems that think ahead, reason, and
plan, all on your behalf. The power of Gemini with
multimodality, long context, and agents brings us closer
to our ultimate goal-- making AI helpful for everyone. We see this as how
we will make the most progress against our mission-- organizing the world's
information across every input, making it accessible
via any output, and combining the
world's information with the information
in your world in a way that's
truly useful for you. To fully realize
the benefits of AI, we'll continue to
break new ground. Google DeepMind is hard at work. To share more, please
welcome for the first time on the I/O stage Sir Demis.