[Gabriela Surita speaking]
We built Gemini from the ground up to be natively multimodal,
including something quite important for both
of us: programming code. Gemini is able to consistently understand, explain and generate code that is correct and well-written in most
programming languages. That includes Python, Java, C++ and Go. It substantially improves coding abilities over previous PaLM 2 models. On a benchmark of around 200 programming functions in Python, it consistently solves about 75% of them on the first try, versus around 45% for PaLM 2. If you allow Gemini to check
and repair its own answers, this number jumps to over 90%,
which is a huge step forward. It can help you create and
prototype new ideas in seconds. Let's give it a try. I really like trains, and if I wanted to create a
train-spotting location web app, I can simply ask and
get a working prototype in less than a minute.
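A first draft in that spirit might look something like the sketch below. To be clear, this is purely illustrative and not the actual output from the demo: the page layout is invented, the spotting locations are made up, and it uses only the Python standard library so it runs as-is.

```python
# A minimal train-spotting location web app, standard library only.
# Illustrative sketch: the locations listed here are invented.
from http.server import BaseHTTPRequestHandler, HTTPServer

SPOTS = [
    {"name": "King's Cross footbridge", "lines": "East Coast Main Line"},
    {"name": "Clapham Junction platform 17", "lines": "South Western Main Line"},
]

def render_page() -> str:
    # Build a simple HTML list of spotting locations.
    items = "".join(
        f"<li><b>{s['name']}</b>: {s['lines']}</li>" for s in SPOTS
    )
    return (
        "<html><body><h1>Train-Spotting Locations</h1>"
        f"<ul>{items}</ul></body></html>"
    )

class SpotHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = render_page().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Serve the page at http://localhost:8000/
    HTTPServer(("localhost", 8000), SpotHandler).serve_forever()
```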
[light clicking] While the code isn't perfect, it's really helpful to have a first draft. Gemini, on its own, has
the ability to transform software development as we understand it, but it can also be
deployed as a key component of more sophisticated systems. [Rémi Leblond speaking]
Gemini is great at coding, but we've been able to
take it even further, creating a specialized version that performs remarkably well
at competitive programming. Now, why do we care about
competitive programming? Well, it is one of the
ultimate litmus tests of algorithmic coding abilities. So, you have thousands
of talented programmers from all over the world that come together to compete and try to solve
incredibly complex problems that require not only coding,
but also math and reasoning. Two years ago, we presented AlphaCode, and it was the first AI system that could compete roughly at the level of the average human competitor. Today, I'm delighted to
introduce AlphaCode2, a new and enhanced system with massively improved
performance, powered by Gemini. When we evaluate AlphaCode2
on the same platform as the original AlphaCode, we solve almost twice as many problems. While AlphaCode broke into the top half of human competitors on average, we estimate that
AlphaCode2 performs better than 85% of
competition participants. Let's have a look at our model in action on one of the hardest
problems that we faced, and I say "hard" because
in the original contest in which the problem appeared, less than 0.2% of participants
actually solved it. The problem is quite difficult. It's very abstract, so I can't
get into too many details, but the basic gist of
it is that we are tasked with computing aggregate statistics that account for what appears to be an impossibly large
number of random arrays. The really cool thing is that to solve it, AlphaCode2 makes use
of dynamic programming. Dynamic programming is an
advanced algorithmic technique, which basically simplifies
a complicated problem by breaking it down into easier sub-problems again and again, and what's really impressive is that AlphaCode2 not only knows how to properly implement this strategy, but also when and where to use it. What the example shows us is
that competitive programming is not just about implementation, it's also about understanding, maths, computer science and indeed coding, and that makes it an
extremely hard reasoning task. So, it's not very
surprising that up 'til now, generally available large language models have scored very poorly on this benchmark. These models are really, really good at following instructions, but AlphaCode needs to do more than that. It needs to show some
level of understanding, some level of reasoning,
designing of code solutions, before it can get
to the actual implementation that solves the problem, and it does all that on problems it has never seen before. Another thing that is
great about AlphaCode2 is that it performs even better when it collaborates with human coders who can provide grounding. Basically, developers
can specify properties that the code samples have
to obey, and when we do that, we see performance increase significantly. We think of this kind of interaction
between programmers and AIs, as the future of programming, where coders will not
just give instructions, but actually collaborate
with highly capable AI models that can reason about their problems, that can propose code designs and that can even help with
the actual implementation. AlphaCode2 was built for
competitive programming, but we're already working on bringing some of its unique capabilities right into the general Gemini models, as a first step towards making this new programming paradigm
available for everyone.
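The dynamic-programming idea Rémi describes can be shown in miniature. The toy problem below is purely illustrative and unrelated to the contest problem itself: count the ways to climb n stairs taking 1 or 2 steps at a time. The naive recursion re-solves the same sub-problems exponentially often; the dynamic-programming version breaks the problem into sub-problems and solves each exactly once.

```python
# Dynamic programming in miniature (toy example, not the contest problem).

def climb_naive(n: int) -> int:
    # Plain recursion: recomputes overlapping sub-problems,
    # so its running time grows exponentially with n.
    if n <= 1:
        return 1
    return climb_naive(n - 1) + climb_naive(n - 2)

def climb_dp(n: int) -> int:
    # Dynamic programming: ways[i] is the number of ways to reach
    # step i, and each sub-problem is solved exactly once (linear time).
    if n <= 1:
        return 1
    ways = [0] * (n + 1)
    ways[0] = ways[1] = 1
    for i in range(2, n + 1):
        ways[i] = ways[i - 1] + ways[i - 2]
    return ways[n]
```

Both functions agree on every input; the difference is only in how many times each sub-problem gets solved, which is exactly the simplification the technique buys you.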