[MUSIC PLAYING] JOSH GORDON: All right. [CHEERS AND APPLAUSE] Hi, everyone. Thank you so much for coming. So my name is Josh Gordon. I work on the Gemini API,
and I have an amazing team that builds a lot of the
examples and the SDKs that you'll see today. JOANA CARRASQUEIRA:
Hi, everyone, and I'm Joana Carrasqueira, a Senior
Manager in Developer Relations for AI here at Google. And I'm really excited
to be with you all today. LAURENCE MORONEY: Woo, Hi,
everybody, and I'm Laurence. I just do a whole bunch of AI
stuff here at Google, so thank you all for being here today. And we hope we have a
great session for you. JOSH GORDON: So I
think we might end-- LAURENCE MORONEY: Maybe having-- JOSH GORDON: In addition
to the three of us, we have a special guest. JOANA CARRASQUEIRA: Oh, do
we have a special guest? JOSH GORDON: Yes, we
have a special guest. LAURENCE MORONEY: Apparently. JOANA CARRASQUEIRA: Oh,
that's very interesting. I thought that we have a
very special audience today. [CHEERS AND APPLAUSE] LAURENCE MORONEY:
Hey, Sundar, welcome. SUNDAR PICHAI: How are you? JOSH GORDON: How are you. JOANA CARRASQUEIRA:
Welcome, Sundar. JOSH GORDON: Thank you
so much for joining, hi. SUNDAR PICHAI: All right. JOANA CARRASQUEIRA:
Thank you for joining us. SUNDAR PICHAI: I get
to crash the party. JOSH GORDON: Yeah. SUNDAR PICHAI: You
guys had fun today? AUDIENCE: Yeah. SUNDAR PICHAI: It's an
exciting time in the field. Yeah, there's a
lot going on, yeah. JOSH GORDON: Thank you
so much for joining. SUNDAR PICHAI: Yeah, well. JOSH GORDON: So to kick things
off, I have a question for you. SUNDAR PICHAI: All right. JOSH GORDON: So AI has
changed a lot recently, and it's becoming so
much more accessible. So what kinds of cool
things can developers do today with Gemini that
a few years ago, would have taken an engineering
team or a research team or tons of work? SUNDAR PICHAI: I think you
all are doing it already. I've seen crazy examples online. It always blows me away when people think about stuff before we do, but I think it's going to take a while for us to internalize what multi-modality means. The fact that
anything in any input can come out as any
output, and you can mix and match and do things. I think it's a
powerful new thing. All of us are just putting the plumbing in there, but I think you get to use it and think it through in a big way. Obviously, long context is something we're making easier and easier to use, with the caching API coming soon. As we bring the
latency and cost down, I think that's another dimension
on which it's pretty powerful. And finally, as things
become more agentic, I think, it's an extraordinary
opportunity to push the boundaries on. But I'm always blown
away by what people do, rather than me telling
them what to do. LAURENCE MORONEY: So we have a
lot of developers in the room. I think. [CHEERS] So we got a few. So speaking of
developers, Sundar, how do you see
the developer role changing in this new AI world? SUNDAR PICHAI: Well,
the pace is pretty fast. But you're also getting
new tools to go with it. So, embracing AI in your
workflows more natively is going to be
important, I think. But also, I would say, you
have to challenge the existing assumptions across
everything you do. When I spoke about
multimodality, that's what I meant. You're just really
internalizing it. I think it takes
time to internalize that you can actually
go from any input to any output in a deep way. So I would just
say, internalizing what it means to be AI native. I think it's going
to take some time. When mobile came, we all
went through the same thing. Most of us took what was on the web, and we kind of shoved
it into mobile. And then people started
doing really mobile native applications. I think we are in the
same phase with AI. We're kind of adding
some AI capabilities to existing applications. But really stepping back and rethinking it from the ground up, I think, is the most important thing to do. JOANA CARRASQUEIRA:
That's really good. And I would like to follow up on
something that you've just said. It feels like the pace of
change is really fast in AI. So if we take a moment to
pause, what opportunities are you excited about
in the longer term? SUNDAR PICHAI: We are obviously
talking about the technology horizontally. But I think there's a chance for people to take it vertical by vertical and create applications within each one. You see examples of it,
like in health care: what we are doing with AlphaFold, and, as we introduce the Med-Gemini models, what's possible. You see it with
learning, with Learn LM. But the chance to
take each vertical and go deep and solve
problems using AI, I think, that's the opportunity ahead. Kind of horizontal
applications are harder. But I think, the chance to do it
on a vertical-by-vertical basis and go deep and solve
a problem, I think, there's a lot of problems to
be solved and a lot of value to be created. JOANA CARRASQUEIRA:
Thank you so much. LAURENCE MORONEY: Thank
you, and before Sundar goes, do you mind if we take
a quick selfie with you and several hundred of our close friends? SUNDAR PICHAI: All
right, awesome. [CHEERS] LAURENCE MORONEY:
Or is it an ussie? JOSH GORDON: Yeah. LAURENCE MORONEY: Josh,
we can't see you, buddy. OK, one, two, three. All right, thank you. JOSH GORDON: Thank you so much. JOANA CARRASQUEIRA:
Thank you, Sundar. LAURENCE MORONEY: Thanks a lot. JOANA CARRASQUEIRA:
Thank you, everybody. Thank you, Sundar. LAURENCE MORONEY: We also
have a little gift for you. This is the Indian copy
of my book, Josh, and-- JOANA CARRASQUEIRA: It
really is an exciting time to be a developer. And that's why we call today. SUNDAR PICHAI: Take care. JOSH GORDON: Thank you so much. JOANA CARRASQUEIRA:
Thank you, Sundar. LAURENCE MORONEY: Thanks. [APPLAUSE] JOSH GORDON: OK, awesome. So thanks again for coming. So to kick it off, I will speak
about the Gemini API and Google AI Studio, and then Joana will
speak about AI frameworks, and Laurence will speak
about Google AI Edge. And so the Gemini API
and Google AI Studio. And I have a lot of cool
examples to show you. I'm going to move
a little bit quick. You've heard a lot about
Gemini 1.5 Pro today. The thing that I would like
to say, to kick this off, is I've been working in AI
now for the last 20 years. And at no point in
my life did I expect to see a model as
cool as Gemini. There's two things
that make it special. One is it's
multimodal, which means it can work with images, text,
code, and video, and audio right off the bat. The other thing is
the long context. So in a single prompt, you
can include one hour of video, 9.5 hours of audio, about 1,000
pages of text or about 3600 images. And this means that you can
reason across a huge amount of data in a single prompt. And it's just absolutely
extraordinary. So you've heard a
lot about this model, but what I want to show you, is
how easy it is to get started with just a few lines of code. And I have a couple quick
examples to show you, most of which we just pushed to
GitHub this morning, so you can try it out right off the bat. First, I'll show
you a quick example with videos, one with PDFs
demonstrating the long context, and then one with audio. Well, we'll get
there in one second. And then I have one
more example using code. This doesn't use
the long context, but it's super cool
just for developers, and I think you'll have
a lot of fun with it. So this is a clip from a
30-minute video of the American Museum of Natural History. And what it is, is it's someone
walking around with a camcorder. And they go through about
a third of the museum. And they have these
beautiful videos of all these great exhibits that
have tons and tons of detail. And I just want to show you,
I'm not showing this live, but you can do something
extremely similar on GitHub with code that
I'll show you in a sec. And so now that we have
this video, how can we work with Gemini 1.5 Pro? And we have two tools
that you can use. The first is our user interface. This is Google AI Studio. And it's a place where you
can prototype your prompts. And it's much more
than a playground. You can also tune models
and do cool stuff like that. But basically, in Google AI
Studio, just in your browser, you can insert this video. And what's cool, this
is a 30-minute video. I mentioned the context
length is a million tokens. This takes about
half the context. So we can do about an hour. And with the models that
you heard about today, you can go up to
two million tokens when you're off the waitlist. So something simple you can
do, just to start, is you can do something like
video summarization. And this is a really
powerful and important thing, but it's not the cool thing
that I want to show you. So you can summarize the
video, and Gemini will give you a quick sentence
or sometimes it'll give you a paragraph talking
about the different exhibits that it saw. You can do much more
powerful things too. And so this is a prompt where
you upload a map of the museum. And so now you're working
with a video and a map. And you can ask Gemini,
name something on the map that we didn't see on the tour. And like all large
language models, this won't always
work perfectly, but it works well
a lot of the time, and this is just such
a cool capability. So it can reason
across the map and say that we didn't see the Rose
Center for Earth and Space. Other things you
can do, and this is starting to work pretty
well, is you can actually upload a drawing like this. And so this is a drawing a
friend of mine made of a geode. And we can upload the drawing into Google AI Studio, and say, where did we see
something on the tour that looks like this? And Gemini says at
29:40 in the video. And if you flip to
that in the video, you can actually see
that it found the geode. And so this is just an awesome,
awesome, awesome thing. It's super, super cool. So how does one build
something like this? And the good news is it takes
about six lines of code. So I have two links for you. One, we have a
Gemini API Cookbook. And this is basically
just a GitHub repo that has a bunch of
examples that you can run with a couple of clicks. And then on ai.google.dev, we've
got really great developer docs that explain things in
more detail for you. So I'm not going
to show you like-- well, this is most of the code. I'm not going to read it to you,
but basically, the Gemini API is a REST API, but we have
SDKs for Python, Node, and a bunch of other
great languages. Here we're installing
the Python SDK. Then you import
google.generativeai, you configure it
with your API key, and you can get an
API key in Google AI Studio with a single click. In the cookbook, there are step-by-step instructions that will help you get started. And then it's just
a few lines of code. And this is actually almost
our most complicated code because we're
working with videos. So it's one line of code
to upload your video, just because it's kind of
ridiculous to try and send that whole thing
over an HTTP request. So instead we upload
the video first. There's some pre-processing
that happens in the back end, so we sleep until
the video is ready. Create a model, which
is Gemini 1.5 Pro, and now we have a prompt,
which is summarize the video. And what's really
cool, is we've tried to make it really easy for
you to work with multimedia. So now if you wanted to
do something with the map, basically, it's
the same pattern. Except what you do in
your prompt, is you're passing a list. So there's text, there's
video, and there's a map. And that's basically all the
code you need to get rolling.
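To make that concrete, here is a minimal sketch of the pattern just described, using the Python SDK. The file names, the model name string, and the simple polling loop are illustrative assumptions rather than the exact cookbook code.

```python
# A rough sketch of the video + map prompt described above.
# File names and the polling loop are illustrative assumptions.
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # get a key in Google AI Studio

# Upload large files first rather than sending them inline over HTTP.
video_file = genai.upload_file(path="museum_tour.mp4")
map_image = genai.upload_file(path="museum_map.jpg")

# The backend pre-processes video, so wait until the file is ready.
while video_file.state.name == "PROCESSING":
    time.sleep(5)
    video_file = genai.get_file(video_file.name)

model = genai.GenerativeModel("gemini-1.5-pro-latest")

# A multimodal prompt is just a list that mixes text and uploaded files.
response = model.generate_content([
    "Name something on the map that we didn't see on the tour.",
    video_file,
    map_image,
])
print(response.text)
```

I also want to quickly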
show you how easy it is to work with really long
files, text files, and PDFs. So you can try this example
end to end in the cookbook. It works pretty
well out of the box. Basically, what we're looking
at here, this is a 400-page PDF. And it's a transcript of
the Apollo 11 mission, and that's from the first time
humans landed on the moon. And basically, we're uploading
the PDF to AI Studio. And we write a little prompt,
find four lighthearted moments in this PDF. And what's happening
here is, Gemini has read the PDF, which
again, is 400 pages. And it's finding some
humor in the transcript. And one thing that's
really cool is, I noticed that in this
example, it actually cited the page in the PDF
where it got this example. And then what I did is,
I flipped to the PDF. And you can see that Gemini
found exactly the sentence, and it cited it correctly
where it appeared. So here Michael Collins
is making a joke about munching sandwiches. So you can work with
huge text files. And this has endless
applications. We can do a whole talk on it. Especially for
search, and there's tons of great stuff
in healthcare. And what I wanted to show
you, really, really quickly, is the code basically
looks exactly the same. So again, it's just a few lines. What we're doing is we're
uploading our text file, creating a model,
writing a prompt, and we're just pass a list with
our text and our text file. And of course, you can
include images and audio too.
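Sketched out, that long-context call looks something like this; the transcript file name and the exact prompt wording are illustrative assumptions on my part.

```python
# Same upload-then-prompt pattern, now with a very long text file.
# The file name is an illustrative assumption.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

transcript = genai.upload_file(path="apollo_11_transcript.txt")
model = genai.GenerativeModel("gemini-1.5-pro-latest")

response = model.generate_content([
    "Find four lighthearted moments in this transcript, "
    "and cite the page where each one appears.",
    transcript,
])
print(response.text)
```

Another example, we have in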
the cookbook that you can try. We have a 44-minute
speech from JFK. And this is the 1961
State of the Union. And this works pretty
well out of the box. Here we can do
audio summarization. And there's some cool new
examples in the cookbook too, for things like voice
memos and stuff like that. But anyway, you can upload the
file to Gemini and Google AI Studio, and here we're just
doing a quick summarization to kick it off. And it will summarize
this speech. And again, the code looks,
basically, exactly the same. So create the model,
upload your audio file, and then you can go
ahead and prompt with it.
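And a minimal sketch of the audio version, with an assumed file name, is just as short:

```python
# The same upload-then-prompt pattern, applied to a long audio file.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

speech = genai.upload_file(path="jfk_state_of_the_union_1961.mp3")
model = genai.GenerativeModel("gemini-1.5-pro-latest")

response = model.generate_content(["Summarize this speech.", speech])
print(response.text)
```

One last example to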
show you really quickly, and this is more for developers. So this one is a little
bit more technical. So the idea here is that
you have some Python code and say you're doing code for
something like home automation and maybe you have
something like lightbot, so you can control your lights. We have three Python functions. We've got one to turn them on,
we've got one to turn them off, and we have one
to set the color. And in this case,
we intentionally wrote this in a little
bit of a complicated way. So to set the color,
as a parameter, it takes this RGB hex value. And now what's cool is we
want to see if we can call these functions using Gemini. And so what we're
going to do, is we're creating a
list of functions. So before we were
working with media files, but now we're creating
a list of functions. And we're writing a
system instruction to configure the model, to
tell it that it's a lighting bot, and there are different things you can do with the lights. And now, when we create
our model in code, we're including a list of
the functions and the system instruction. And now, if we send
Gemini a message, if we say, light this
place up, the output is the function that
it would like to call. So it understands the functions
and can map natural language to the function. And what's really
cool, too, if we say something more complicated,
like, make this place purple, Gemini will actually figure
out, not only does it have to call the function
to set the light color, but it figures out the hex
value that it should pass to make the lights purple. And in the Python
SDK, we actually have a parameter you can set
called automatic function calling. And so in addition
to just seeing what Gemini would like to
do, Python will actually execute that function for you. And you can use this for a whole
bunch of awesome home automation stuff for anything you can
imagine working with code.
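Here is a rough sketch of that lighting-bot setup with the Python SDK. The function bodies are placeholder print statements, and the exact wording of the system instruction is an assumption on my part.

```python
# A sketch of function calling with a simple lighting bot.
# The print statements stand in for real light-control code.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

def turn_lights_on():
    """Turn the lights on."""
    print("Lights on")

def turn_lights_off():
    """Turn the lights off."""
    print("Lights off")

def set_light_color(rgb_hex: str):
    """Set the light color to an RGB hex value, for example '800080'."""
    print(f"Lights set to #{rgb_hex}")

model = genai.GenerativeModel(
    "gemini-1.5-pro-latest",
    tools=[turn_lights_on, turn_lights_off, set_light_color],
    system_instruction="You are a helpful lighting bot. You can turn "
                       "the lights on and off and set their color.",
)

# With automatic function calling enabled, the SDK executes the function
# Gemini chooses, instead of only reporting what it would like to call.
chat = model.start_chat(enable_automatic_function_calling=True)
chat.send_message("Make this place purple!")
```

So that's a really quick, rapid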
tour of a couple new examples we have. You can find most of
these in the cookbook. You can get started with
just a couple of clicks. The cookbook is
for Python, but we have great SDKS, which
you should check out for Node, Go, Dart,
Flutter, Android and Swift. And yeah, thank you so much for
my really fast talk, listening. And I hope you
have fun with this. Rock and roll. [APPLAUSE] And now, Joana. JOANA CARRASQUEIRA:
Thank you so much. Josh. Well, and as you could see,
with prompt-driven development, AI is literally becoming
more accessible to everyone, regardless of
technical background. And it's been amazing to see how
this technology has been gaining momentum over the years and
how so many of these models, they power some of
the amazing products that we have here at Google. And more models have emerged
very quickly, from T5 to LaMDA to PaLM, PaLM 2, Gemini and more
recently, the Gemma open models family. So really, [APPLAUSE] It looks like we have
some Gemma models fans. OK, there's a
Gemma talk tomorrow that you have to attend. So like I was saying, when
we think about large models, it's not just about a
single architecture. There are advancements
across software, hardware, and the people behind
these technologies that we really have
to think about when we think about the
holistic development of AI. So when we think about these
advancements, about beyond architecture, there are four
main things that we always have to consider. And they are computational
power, machine learning techniques, training data,
and access to innovation. So we want to empower
every developer to be more productive with AI. We want to take it even further. And we want to
make sure that you are productive in
your day-to-day life by providing you with, just to mention a few, tools for debugging, code generation, auto-assist, and testing for vulnerabilities. But really, Gemini, as you could
see during Josh's presentation, is your best partner. And as we saw, Gemini
1.5 Pro, really offers the world's largest
context window and allows you to process
vast amounts of information in one single stream. So as developers
become more productive and as the models evolve,
we are able to unlock amazing new scientific
discoveries and innovations. And that's why I'm really
excited to bring you AlphaFold 3. AlphaFold 3 is a model that
is a combination of innovation and research from Google DeepMind and Isomorphic Labs. And this model is really special
because it's a state-of-the-art model that allows you to
predict structures in all life's molecules, from
proteins to DNA to RNA. And we believe that
we're going to unlock so many scientific advancements
in drug development, drug design, but ultimately,
this will bring so, so much positive-- so many positive health outcomes
for patients and communities around the world. AI is also a catalyst
for innovation. And you can really be more
collaborative, as a team, by using AI. You can take it
from a simple idea. You can make it better. You can brainstorm, iterate,
refine it, and really start with something
really simple and make it more complex
and better over time. So there are two tools that
I'm really excited about. One is AI Flutter
Code Generator, which allows you to riff on UI designs with your developer and design teams, just from one single text prompt. The other is Data Agent, which allows non-coders to also have a better understanding of data and of the relationships within it, and a very deep understanding of how it works. Now, you might be thinking,
pressure as a developer to really build better,
faster, smarter apps. So how do I transform all these
challenges into opportunities? And this is a mindset
change, and that's why I really want you to
think about the opportunities that you have to build new
cool innovations with AI. So I'm going to break
down these opportunities into two main themes:
one about creating models, and another one
about consuming models. So let's start by talking
about creating models. And the key to succeed, in any
complex task, and let's face it, there are not many
tasks that are more complex than
understanding your data, training your model
with that data, and also using that intelligence
to build apps with a very good user experience. But like I was saying, the key
to succeed in a complex task, is really by having a
stack of technologies with clear separation of layers
by functionality and optionality between those layers. So when it comes to
building with AI, we've been working tirelessly, I can assure you, to ensure that you have the right stack to build with. So let's start at
the top, shall we? If you are a developer,
in the age of AI, there are probably
three main things that you're trying to
do with your models. First one, you might be trying
to create a model from scratch. And you will need
a consistent API if you want something that
is solid, easy to learn, but also easy to
maintain, whether it's a deep neural network
or a wide LSTM. Maybe you'll be training your
model using parts of an existing model, and that can
save you a lot of time and help you build
a better model. Or what's becoming more
common, maybe you're trying to fine tune a
generative model with techniques like LoRA. And
in just about a moment, Laurence is going to
do a demo for you. However, regardless
of what you're trying to use your model for,
Keras is your best friend. But your job really
only starts at coding. And once you've defined your
neural network architecture, and once you've chosen
the layers to fine tune, something still has to
do the heavy lifting of machine learning. And optionality is
really important here, and that's why you've been hearing about optionality so much today. But maybe you have access
to state-of-the-art hardware infrastructure, or maybe you're
sharing a GPU with your team. What is really important, is
that you optimize your hardware. So that's where Keras,
with optional backends like TensorFlow, Jax, and
Pytorch, is really important and will make your
life a lot easier. We're all familiar and
we all love TensorFlow. We have a very big
TensorFlow community. Developers still
use it quite a lot. But I also want to call
out Jax because Jax is a framework for
accelerated computing. And when combined with
Keras on the top end, you can really take advantage
of this acceleration without having to
change your code. So whether you're in a
compute-rich environment or not, you can experiment with it
and see if it really trains faster with Jax.
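To make that concrete, here is a minimal sketch of what backend optionality looks like in Keras 3; the tiny model below is purely illustrative and not from the talk.

```python
# Backend selection in Keras 3: the same model code runs on TensorFlow,
# Jax, or PyTorch. Set the backend before importing keras.
import os
os.environ["KERAS_BACKEND"] = "jax"  # or "tensorflow" or "torch"

import keras

model = keras.Sequential([
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
# model.fit(x_train, y_train, epochs=5)  # identical regardless of backend
```

And of course, you will need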
a place to run your models. We need to put it
in people's hands. So with that in
mind, the ecosystem of runtimes that we've
been working on for years, will work with your
models on this stack, no matter the backend. Everything from the smallest
microcontroller, to mobile, to the browser, to web servers,
to accelerated infrastructure has the ability to
execute the models that you train on this stack. And this really provides an
amazing level of optionality. So investing your
coding time in Keras, it really opens so many
opportunities for you as a developer. Lastly, I just wanted to
mention PaliGemma is here and is designed for world-class,
fine-tuned performance on a wide range of
vision-language tasks. We are so excited
about what you're going to build with
the Gemma open family. And so we look forward
to seeing what you are going to create with it. But like I promised, Laurence
is going to join us on stage. And he's going to demonstrate
what I've just talked about. LAURENCE MORONEY:
Thanks, Joana. [APPLAUSE] Hello, everybody. I'm going to first explain
a little bit about what I'm going to do. Now, how many of us here in
this room, out of interest, do startups, or you're
forming your own startup, all of this kind of thing. Great, it's maybe 40%,
as far as I can see. Now, think about when you
are starting a startup, and you're creating
this whole new thing. And a lot of the
time, you really want to just test
out your ideas, you want to validate your ideas. You want to get it
in front of people so that when you go to
investors with a pitch, you've really tested
and validated that. And that's one of
the areas that, with generative AI, where
I'm particularly excited. Because so many ideas
that are out there, you can just start kind
of kicking the tires on these ideas using synthetic
data that has been created by something like Gemini. And then you can
start building an app and start building an
app idea around that. And if you were at the
developer keynote earlier on, Sharbani and I were kidding
around about one of the ideas we had. And that was, well,
hey, we have kids. Our kids love to read
books, but sometimes it's very difficult for us to find
the right books for children to read. You can go on to the
sites where you buy them and maybe read reviews. You can go on to
sites like Goodreads and maybe read reviews. But that's a massive
cognitive load that you have. And with that cognitive load, I
mean, it just makes it harder. Wouldn't it be nice if
you could go to a chatbot and just ask a chatbot about it? So then if we were a startup
like publishing books or providing an
app like this, hey, that would be really cool to
do that to allow people to have a chat application where they
could ask about the books, and then maybe use RAG with
details about their own family, that they could pass
through this application, keeping it all
private and on device, so they could get even
more intelligent reasoning around that. And we thought, that'd be an
amazing idea for a startup. But how would somebody
get started with that? You need data. So I just got signed out of my
laptop, so let me sign back in. And if we can switch to
the laptop screen, please. So then the idea is, just
with something like Gemini-- let me zoom in on that, so
you can see it a little better --I could start
just doing things like I put in a simple
prompt in AI Studio, where I asked it for--
oh, here's me just saying, give me 100 [INAUDIBLE]. Like, please help me create
a synthetic data set. Output it as a CSV
containing details on family-friendly books. Output the title, the genre, the
theme, and a simple synopsis. Create about 100 books. And then what Gemini
did for me was, it started creating books like
"The Big Friendly Giants," "The Paper Bag Princess"
and stuff like that. So now I'm actually
starting to put together a catalog of books
using synthetic data, and AI Studio just allowed me
to do this as a simple prompt and get that as a CSV. So now I have a starting basis. I have a database of books
that I can start working with, but it gave me a
very simple synopsis. For example, take "Tiki Trouble." I'm just reading here: a mischievous imp causes chaos for a young girl named Ella. I can't really tell a lot
about that book from that. So maybe I could
go back to Gemini and have Gemini create
detailed synopses of the books. So that's what I
ended up doing using the generative AI that Josh
was showing earlier on. And I'm just going to
show it very briefly here. But what I could do is-- and I
can share this Colab with you later if you like. But what I could do is, using
the generative AI model-- sorry, the generative
AI API that Josh was showing earlier
on, I could then create a new prompt
that's like, please write a detailed synopsis of a plot for a book called [title], reading the title from that CSV, in the genre of [genre], which has a theme of [theme], and with this basic plot: [synopsis]. And now Gemini is going to write
200-word summaries of the books for me.
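A rough sketch of that loop is below, assuming the CSV from the earlier prompt has title, genre, theme, and synopsis columns; the file name and prompt wording are my own illustrative choices.

```python
# Expand each synthetic book's one-line synopsis into a detailed one.
# The CSV file name and column names are illustrative assumptions.
import csv
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro-latest")

with open("family_books.csv", newline="") as f:
    for row in csv.DictReader(f):
        prompt = (
            f"Please write a detailed, roughly 200-word synopsis of a plot "
            f"for a book called {row['title']}, in the genre of "
            f"{row['genre']}, which has a theme of {row['theme']}, "
            f"and this basic plot: {row['synopsis']}."
        )
        response = model.generate_content(prompt)
        print(response.text)
```

So now I'm getting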
lots and lots of data around these synthetic books. So now as a startup who's
looking to get into this area, I now have all of this
data that I can work from. I didn't need to buy a database. And so I can start proving
out this concept and building something, again, that I
could bring to investors. Now I have the data,
what do I do with it? Well, tell LoRA we love her. So with LoRA, low-rank adaptation, we can now take a model like Gemma. And we can start fine tuning
Gemma with instruction tuning on that data. And I'm going to
just show, like-- any Keras TensorFlow
developers in the house? OK, quite a few of you. So all of those skills that
you've had with using that, it's just the same thing that
you're going to be doing here. So when I come down to this part
where I'm fine tuning Gemma, hopefully, my VM is still
running, I'm running it. And we'll see right here,
is a good old-fashioned-- If I zoom in, we'll see a
good old-fashioned model.fit. So all I've done, is I've taken
the data, my synthetic data. I did a little bit of
code that turns that into instruction tune format. If you're not familiar
with instruction tuning, the idea is that when you're
working with a large language model, like a Gemini, you just
give it the data in the way that you would expect it
to give you back data. So if you give it a prompt,
and you expect it back in a certain way, then
you just create that, and you create lots
of instances of that. For example, the
prompt could be, is this book suitable
for a 10-year-old? And the answer is, yes, it is
suitable for a 10-year-old. You put those prompts together
for those specific books, create all of that data, which
I've done synthetically here, and then train a model on that.
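For reference, a minimal sketch of that fine-tuning step with KerasNLP follows. The preset name, LoRA rank, sequence length, epoch count, and the shape of the instruction-tuned strings are assumptions based on the standard Gemma LoRA workflow rather than the exact demo notebook, and access to the Gemma weights (for example via Kaggle) is assumed.

```python
# A sketch of LoRA fine-tuning Gemma on the synthetic book data.
# Preset name, rank, and hyperparameters are illustrative assumptions.
import keras
import keras_nlp

gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")
gemma_lm.backbone.enable_lora(rank=4)      # train small adapter weights only
gemma_lm.preprocessor.sequence_length = 256

# Instruction-tuned examples: pair each prompt with the answer we expect.
data = [
    "Instruction:\nIs 'Tiki Trouble' suitable for a 10-year-old?\n\n"
    "Response:\nYes, 'Tiki Trouble' is a lighthearted story that is "
    "suitable for a 10-year-old.",
    # ... many more examples built from the synthetic synopses ...
]

gemma_lm.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.AdamW(learning_rate=5e-5, weight_decay=0.01),
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)
gemma_lm.fit(data, epochs=20, batch_size=1)  # a good old-fashioned model.fit
```

And we can see here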
now it's, like, it's going to take a
little while to train. The first epoch was done
in about 43 seconds. The subsequent epochs in Keras
are usually a bit faster, but 20 of these are probably
going to take about 10, 15 minutes. So while that's training,
and the learning leprechauns are doing their thing, can
we switch back to the slides, please? So we've been
learning, and we've been hearing about the
word AI all day today. I'm trying to avoid
using the word AI because everybody
else uses it a lot. But we've been hearing
a lot about models that you can create today. But one of the biggest
things that developers, when we speak to them,
get confused about, is where do I use
my models, which ones do I do, where do I put
them, do I use hosted Gemini, do I fine tune my own
model, do I use something like an open-source model like Llama, where do I start? This is all really confusing. So I always like to think about it in terms of the number
your models are going to execute is a spectrum. From one end of
the spectrum, I'm going to be talking in
detail about in a moment, but that is a model
that you completely own that's on your own device. Be that a server, be that a
mobile, be that a desktop. At the other end
of the spectrum, is a model that you do not own
and is hosted by somebody else. And that's like a Gemini
being hosted by Google. That's like a GPT
being hosted by OpenAI. And you access that via an API. Increasingly
important, and you're going to be seeing
this as a major growth area over the next 12
months, is in the middle of this, where it's a model
that you own, that you create, that you fine tune,
that then gets hosted, potentially, by
somebody else on your behalf. It could be hosted by Google on Vertex AI, and there's a whole crop
of startups starting up to actually do that
hosting for you. I think, personally,
this is going to be the most exciting
area for us as developers because the idea is, if we can
start training our own models or fine tuning existing
open models to be able to solve our
business problems, now we have a place
where we can host them. So today, you're seeing
a lot about Gemini being hosted in the Cloud for you. You're seeing a lot
about Google AI Edge that I'm going to be
talking about in a moment. But as a developer, also think
about investing your skills in that area and
think about what it takes to fine tune models. And as we announced in
the keynote this morning, Gemma 2, part of
the idea of that is it's going to be a 27 to
30 billion parameter model. But it's going to
fit on a single TPU. And I think that's one of
the important parts of it. So now if you're going to be
hosting this for your business, that single TPU, the
cost of that single TPU to be able to host a model that
you have fine tuned explicitly for your use case, then becomes
a really powerful thing. But I'm going to switch to
the mobile side of things for a moment. And like the whole idea
is that when you want to run a model on a device, it's not easy. So devices are
generally lower powered. You don't have all
of the hardware that you have in a data center. And all of that kind of stuff. So typically, and
for years, many of us have been doing this using a
technology called TensorFlow Lite. You build your
model, you convert it into the TensorFlow
Lite Flatbuffer, and then you run it
on TensorFlow Lite. In the last couple of years,
at I/O and other events, we've been talking a lot
about the MediaPipe Framework and MediaPipe Tasks. The idea behind those two, if
I show them on the next slide, is really they abstract a lot
of the complexity of having a model on your device. Now, if you're an Android
developer or an iOS developer or a web developer,
you probably think in terms of strings
and bitmaps and jpegs. If you're an AI
developer, you generally tend to think in
terms of tensors. Models are tensors
in and tensors out, and it becomes a difficult task, sometimes, if you are not a TensorFlow developer (and that's where the name TensorFlow came from), to start thinking about
your data structures, how your app does things, and
to be able to take advantage of a trained model, to be able
to make your app artificially intelligent. And you end up with complex
workflows like these ones. The one on the left is
just a single model. The ones on the right are
for many, many scenarios like segmentation, object
detection, that kind of thing. You've got multiple models. And you end up having to
write all of this code for doing things like
preprocessing, post-processing, on each of these models
and managing it all. And it's really,
really difficult. And that was the idea of what
MediaPipe and the MediaPipe Framework were giving you. The idea was that we abstract
a lot of that complexity. You write your code to just
deal in strings and bitmaps and whatnot. But we did get a lot of
feedback from the community that it's still very confusing. When do I use TensorFlow Lite? When do I use MediaPipe? When do I use Gemini? When do I use Gemma? All of these kind
of things, and we understand that it's confusing. So, as a mobile developer,
if you're using Gemini, you're going to be using an
API to call the hosted Gemini, or you could be
using an API to call a Gemma model that runs
on device, that I'll show in a moment. Or you could be
creating your own model and running on TensorFlow Lite. Or you could be
creating your own model and running on MediaPipe. And there's all of
this kind of confusion. So what the folks have been working hard on is to create this
overall umbrella that we call Google AI Edge. So you'll have a one-stop shop
for all of the innovations that are happening
on the Edge in mobile to try and reduce the confusion,
particularly around getting started in that stuff. And as part of that, there's
a few new things that have actually been released. LLMs on device, which I'm
going to show in a moment with my fine-tuned Gemma,
where the idea was, again, to abstract all of that complexity. Because when you start
dealing with an LLM, it's not as simple as
you give it a prompt, and it gives you an answer. Your prompt needs
to be tokenized. The answers coming out of the
model are generally serialized, and then they need
to be detokenized. And there's a lot of work that
needs to be done to do that. But the MediaPipe folks have
actually built wrappers for LLMs like Gemma to make
that work a lot easier. I'm going to show
that in a moment. We've heard so much from the
community about PyTorch support. So now if you build
models in PyTorch, they can be converted into
the Flatbuffer format that will run on Android or
iOS or even the web using our TensorFlow Lite runtime. And then finally,
something you'll be seeing a lot of in
the AI Edge session, is something called
the Model Explorer. And one of the things we
always hear from developers is when you get a model, and
the model is doing something, and it's doing some
kind of reasoning, or it's doing some
kind of an inference, how can I trace what it's doing? How do I understand what
it is in that model that gives me this answer
instead of this answer? And that's the idea
behind the Model Explorer. It's freshly baked. It's hot off the presses. It was showing it to the public
for the first time today. You should be seeing it
in some of the demos. And it's going to be available
under Google AI Edge. And I hope it solves
many of your use cases when it comes to understanding
and communicating how your models are
actually going to work. So I'm going to switch
back to mobile for a moment if we can go back
to the demo machine. And I cheated a little bit. I have a version of the
model that I created earlier because we're short on time. And I'm just going to
run it on my phone. It's a live demo on my phone. So here it's now loading--
oops, the app actually crashed, so let me do that again. So I'm just going
to load up the app. And you'll see the app is just a
very basic chat-type app, where what I've done is I've
fine tuned a Gemma model on to become an expert system
in children's books. And now we can see. And I'm sorry if
it's a little small. And now I can just go to it. Now, for example, would "The
Time-Traveling Catacorn," which is a book that we made
up completely for this demo, but I think I'm
going to write it, be a suitable book
for my family? And. If you look carefully--
oh, It crashed, sorry. Oh, you gotta love live demos. So let me try that again. Good thing Sundar is not here. [LAUGHTER] Would "The
Time-Traveling Catacorn" be a good book for my family? OK, everybody,
cross your fingers. There we go, OK. So if you look carefully,
you'll see the prompt. I have a system
prompt where I have the instruction
where it's saying, below is an instruction
that describes the task. You're a helpful bot. You want to understand
family books. The RAG part of
it is the context, where I have data
that's on my device that I don't want anybody
else to know about. And it could be personal
details about this is what my daughter
likes or my son likes. In this case, I just said my
family really enjoys books with interesting plots. And then, finally,
is the actual prompt, Would "The Time-Traveling
Catacorn" be good for my family? And it looks like the model
actually froze in its output. Trust me, it does work. But the idea here is that this-- Gemma will give me that output. Let me try again. I'll just say, tell me more. Tell me more, tell me more. I'm sorry. You know what? I'm going to skip
back to the slides. I'll demo this later. Sorry. It's the kind of thing that
works always in dry run, but then when you do
it on stage, it fails. Can we go back to
the slides, please? OK, so to wrap up,
there's a few things that I do want to
reemphasize here. First of all, AI
is human centric. I hear a lot of
feedback from people saying that AI is going to take
away our jobs as developers. When we see code
generation, for example, it's going to take
away our jobs. I'm just going to say,
completely untrue. This is a greater
opportunity for developers than there ever has been. It's a greater opportunity
for any kind of creators than there ever has been. The ability for you to be
able to expand your horizons, as a developer, is
greater now than ever. I really think there has
never been a better time. One of the next trends
that you're going to see is agentic workflows. I promised my friends I
wouldn't say the word agentic, but I just said it. But what you're going to start
seeing, in the next 12 months, as a developer, is
the ability for you to create apps that use
LLMs that self-correct, that self-identify,
and are able to start interacting with humans or with
each other as part of workflows. So this is one of the
areas that I would strongly recommend to skill
up in, to start looking at using things like the
Gemini API to be able to build these types of things. I don't know if you saw the
keynote this morning where they demonstrated Chip, where
chip was the idea was, it's a helpful assistant that
works within your Google Docs and works within
your Google Chat. And I you find this is a great
example of an agentic workflow. Another thing that we recommend
is to join the Kaggle community. As you create your own
fine-tuned Gemma models, hopefully ones that
are better than my one, you're actually going to be able
to publish them to the Kaggle community, and to share
them with other people so that they can
use them themselves. And there's just lots of great
stuff happening in that space. Finally, the last one I would
like to say is AI is accessible. We've been working really hard
to widen access to everybody, to make AI as easy as
possible so that everybody can jump on the AI train and
be able to build better apps and be able to build better
sites and services with it. Also, just with the
last minute I have, just to encourage everybody to
remember that responsibility in AI is very, very important. It's a very difficult
thing to do. One of the things we've
been stressing at Google is to try to be a thought leader in responsible AI by publishing responsible AI principles, and
I'd encourage you to read them and follow
them, or adapt them, really, for your own use. And then, finally, I'd
just like to leave you with this: there's
the opportunity like never before to change the world
with your work as a developer, not with AI, like
this slide says, it's with your work
as a developer. There are
opportunities out there to make the world a better
place for other people. I encourage you
to go and take it and just want to say
thank you so much. [APPLAUSE] [MUSIC PLAYING]