[MUSIC PLAYING] MARK MCDONALD: Hi, everybody. Welcome to our workshop. AUDIENCE: Woo! MARK MCDONALD: Woo! Thank you. Love that energy. [APPLAUSE] Nice. After lunch, too. I would be in a coma. So my name is Mark,
and this is Wei. We're on the AI DevRel or
Gemini DevRel teams at Google. Joined here with us
today with our TA, Kara. She's on the
Workspace DevRel team. This is a recorded
workshop, so you won't be able to ask questions
interactively as we go. But if you do need a hand,
you can raise your hand up, and Kara will be able
to come and help out. So the format today is
a Codelab that we've written up for this talk. And if you load this
up on your machines, you'll be able to follow along. The most important part here
is being able to copy and paste all of the code. There are some blocks where
we kind of move through fairly quickly 10 or 20 lines. And so if you can just copy and
paste, you'll be able to stay-- keep the pace. Or if we're going
too fast or too slow, you can move ahead or
behind as you need to. OK, no more phones up -- so I think everybody's
probably familiar with what Gemini is by this point
of the conference. It's our family of large language models -- multimodal large language models, I should say. And today, we're going
to be using the Gemini API to work with Workspace
tasks in Apps Script to automate some
common workflows. So for today's
talk, the main thing we need to know about
Gemini is that at a sort of functional level, the model
works, even below the API, as a next
token predictor. So you give it some text that's
encoded as tokens in a sequence, and it will give you
probabilities on what the next token should be. So you can exploit
this functionality to do things like
put in a question and have it kind of autocomplete
the answer to the question, or put in documents and tasks
to summarize the document. You can brainstorm ideas. And these models are
multi-modal as well. So as well as just
using text, we can put in image
data, audio, video and perform the
same kind of tasks. So today, we'll be analyzing
some charts in a spreadsheet, but you can also use the
multimodal capabilities to do things like take photos of
things in the real world and map them into spreadsheets
or documents if you need to. So for today, the
format is going to be I'll show you a couple
of things in AI Studio, and we'll test out our key. We'll move across
and start writing some code in Apps
Script that will start to connect to Gemini
and to the Gemini API. And then we'll start
building some tools on top of that to do some more
advanced things within Apps Script and Workspace. And so by the end
of this session, we'll have a fully automated
chat bot, or agent, you could say, that takes
open-ended user text and performs some task in
what appears to the user to be like a single,
fully automated step. So the first place we want
to start is AI Studio. Open this up now. We're going to grab an API
key as the first thing. What you would normally do when
working with the Gemini API is start with AI Studio. In AI Studio, which is the
web interface to the API, you can practice writing
prompts, explore different prompts, and see how wording
changes affect the output. You can save your
prompts to build up a library of different
tasks, explore different models and
different model settings, and see how they work and what
works for your applications. So here's one we
prepared earlier. So if you open up AI Studio--
if this is your first time, you'll see a welcome dialog. You'll need to
agree to some terms. I've already done this,
so I'm good to go. When you're prompted or
if you're on this screen, click Get API key. That's fine. And you'll be able to create
an API key on this step. So we hit Create API key. If you already have a
Google Cloud project setup, you'll be able to choose
one from this list here. If this is your first time with
this account creating a Google Cloud project, you'll have
another button just here where you can select create a new
API key in a new project. So either way, create a key. Should only take a second. You can hit Copy. Keep this tab open
because we'll need to come back here a couple of times. But for now, we'll get
on to testing our key. So I'm going to switch
over to a command line here and use a
program called curl. If you have a command line,
and you have curl already, and you know how to use it,
feel free to follow along. All I'm doing here is
just testing out the key. So this is just a test just
to make sure everything works. If you aren't familiar with
curl, or don't have it set up, don't go out of your
way to install it because it'll take too long. So what we're going to do is
set up our key in an environment variable, and paste it in
here, and then just copy this command here. What this does is makes an
HTTP request to our Gemini API endpoint-- specifically, the
/models endpoint. And when we make a
GET request here, this will just list all of the
models that are available to us with our API key.
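If you're following along, the test looks roughly like this (with your own key pasted in; the variable name is just a convention):

```sh
# Stash the key copied from AI Studio in an environment variable.
export GOOGLE_API_KEY="YOUR_API_KEY"

# A GET on the /models endpoint lists the models available to this key.
curl "https://generativelanguage.googleapis.com/v1beta/models?key=$GOOGLE_API_KEY"
```

And that was super quick. We see we get a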
JSON response back, and it contains
a list of objects that describe all of the
models that are available. So we've got things like
Gemini 1.0 Pro Vision. We get a display name, we get a
detailed description, as well as some information
about the features that the model supports -- input and output token limits, temperature, and other default parameters. So if you see this,
it means it worked. But in practice,
what we want to do is start making
an actual request to generate some content. So if we scroll down a
little bit in the Codelab, you'll see a request
here on the left. This is what a generateContent request looks like. So you can see there
are a few parts here. The first item in the
object is contents. So this is a list of
turns in a conversation. So in a simple text in,
text out kind of situation like we're doing here, there's
only one entry in this list. It's just a single turn. If you have a chat conversation
or a multi-turn conversation, this is where you would have each turn as an entry in this list. Then inside here, we have parts. So each of these parts is like
a chunk in the conversation. Here, we just have
a simple text part. When you're adding multimedia
to your conversations, to your prompts, you would
add extra entries in here. So you'd have a text, you'd
have images, you'd have audio, and any other entries in there. So because this is
more than one line, we're going to just put
it all into a text file. So I like to use cat because
it's pretty straightforward -- presentation.txt. Paste that in, and that's done.
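If you don't have the Codelab open, the file contains a single-turn request shaped like this (the prompt text here is just a placeholder):

```sh
# Write the request body to presentation.txt; any prompt text works.
cat > presentation.txt << 'EOF'
{
  "contents": [{
    "parts": [{ "text": "Give me three tips for a great conference talk." }]
  }]
}
EOF
```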
So now, we're going to make a similar request. So this time, we're using
curl to make a JSON request. It's a POST. And we're using this file we
just created, presentation.txt. Now, this is the same
endpoint as before, except this time
we're specifying models/gemini-1.0-pro-latest. So for this Codelab, we're
using the 1.0 series of models. We have a whole bunch of 1.5
models that are available and Flash that was
announced yesterday. 1.0 is still very capable
and has really generous quota and usage tiers. So we're going to use that
today just because it's simple, it's quick, and
it's cheap to use. So on this model,
we're going to call generateContent and, obviously, specify our API key.
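The full command is roughly this -- the same base URL as before, with the model and method in the path, and the request body read from our file:

```sh
# POST the JSON body in presentation.txt to the generateContent method.
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.0-pro-latest:generateContent?key=$GOOGLE_API_KEY" \
  -H 'Content-Type: application/json' \
  -X POST \
  -d @presentation.txt
```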
So let's press enter and see what happens. OK, so we've got
a response back. Here, you can see
in the response, we've got a list of candidates,
contents, and parts. And then the text is the actual
response to what we put in. So we've got a markdown
formatted string here. Looks great. We get some extra fields
back in the response. So here, we get a finish reason, for example. This will tell us
whether it just stopped generating
because it had finished or whether there was something
else that happened here. So if we hit a safety filter,
it would specify that here. Or if we ran out of tokens compared to the number we had requested, that would be specified here. We get some extra
information back, like the safety classification
for our response. So if something was
a little bit spicy, it would be captured
here, and you'd be able to see why that was. And we get some information
about the tokens that we used in generating
this request and response pair. So this is useful when you want
to track how much you're using, especially if you
have billing enabled and you're paying
for each request. But for a test, this is done. We know this works. If you want to see what the
response looked like formatted in rich text, that's it here. But now, we're going to
switch across to Apps Script. Copy this again. Now, to start editing
in Apps Script, there's a handy shortcut URL. You can just type in
script.new and load it up, and this will load up the
web editor for Apps Script. So here, we get a file
that's been created. We've got a little function
stub where we can start writing. But what we want to do
is we're going to start writing up a utility library. So the first thing
we're going to do is name our project. So we'll go IO 2024. And then we're going to rename
the file from code to utils. And we're going to put
our own code in here, so we'll just delete
the entry in here. Make sure that's visible. I can make this a
little bit bigger. Yeah, that's good. So the first thing we
want to do in the code is to specify our API
key and our endpoint. We don't want to put the
key in our code, though. We want to be able to ship
our code to colleagues. We want to be able to pass
the source code around without having to worry
about our key being used amongst everybody, or it
being shared publicly, or anything like that. It's a secret, and we
want to keep it as such. And so the way we do
this in Apps Script is through project properties. So in here, in Project Settings
on the left with the cog, and down the very bottom,
we have Script Properties. So we can click Add script
property, type in Google API key just like before, and paste
in-- oops, that's not the key. Refresh here, copy,
paste that in, and save. And now, we've got a key
specified in the properties, and we can get to
these through the code. So we'll copy across
our first code snippet. Paste that in, and I'll show
you what we've got here. So these first two lines
load up the property service and pull out this key
that we just specified. So if a colleague has
a copy, and they've updated it with their own key,
this will pull theirs out. And then all we're doing
is specifying the endpoint that we're going to use here. This is gemini-1.0-pro-latest,
and we plug our API key in here.
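The snippet looks roughly like this (assuming the script property was saved under the name GOOGLE_API_KEY):

```javascript
// Pull the API key out of Script Properties so it never lives in the code.
const properties = PropertiesService.getScriptProperties();
const geminiApiKey = properties.getProperty('GOOGLE_API_KEY');

// The gemini-1.0-pro-latest text endpoint, with the key as a query parameter.
const geminiEndpoint = `https://generativelanguage.googleapis.com/v1beta/models/gemini-1.0-pro-latest:generateContent?key=${geminiApiKey}`;
```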
We'll use that in our request coming up. Now, the first thing we
want to do for this project is to set up a function
to call Gemini itself. I'm going to copy and paste
this function, callGemini. And this takes two arguments. We've got a prompt, like a
text input, that comes in, and we can optionally
specify a temperature. The temperature
defines how creative or how variable
the output can be. So a temperature of 0 is going
to be consistent and give you the same results every
time, but higher numbers will make it more variable. So this payload is
exactly the same as what we just saw with curl. We've got the contents,
parts and text. We just wire up the prompt from
the function into this object. And the same with
the temperature. We specify that in
the generation config. As with curl, we have POST,
JSON, and we put our payload in. And then we use
the UrlFetchApp-- it's part of Apps Script--
to make that HTTP request. And as a bit of a
shortcut helper, we unwrap the content that comes back: we take the first candidate, the first part, and the text field, and return that to the user. This makes our callGemini function a simple text in, text out function.
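Put together, callGemini looks something like this -- a sketch matching the description here, so the Codelab's exact code may differ slightly:

```javascript
// Simple text-in, text-out call to the Gemini API.
function callGemini(prompt, temperature = 0) {
  const payload = {
    contents: [{ parts: [{ text: prompt }] }],
    generationConfig: { temperature: temperature },
  };

  const options = {
    method: 'post',
    contentType: 'application/json',
    payload: JSON.stringify(payload),
  };

  // UrlFetchApp is Apps Script's built-in HTTP client.
  const response = UrlFetchApp.fetch(geminiEndpoint, options);
  const data = JSON.parse(response.getContentText());

  // Shortcut: return just the text of the first candidate's first part.
  return data.candidates[0].content.parts[0].text;
}
```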
Now, to test this in Apps Script, we need a function that
doesn't have any arguments. So we paste in this
test Gemini function. This takes nothing. We've just kind of hard
coded the prompt in here, but this will allow us to
test it in the UI directly. So with the prompt-- the best thing about
sliced bread is-- and then we call Gemini and log
both the prompt and the output.
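Something like this -- a hard-coded prompt and no arguments, so it's runnable straight from the editor:

```javascript
// No-argument wrapper so we can run callGemini from the Apps Script editor.
function testGemini() {
  const prompt = 'The best thing about sliced bread is ';
  const output = callGemini(prompt);
  console.log(prompt, output);
}
```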
So to test this, we want to hit Save. And now, we've got the function available in the dropdown here -- choose testGemini and press Run. Now, because we have
started connecting to an external service-- this is the UrlFetchApp-- we will need to authorize. So any time you add new
permissions or external services that might access the
internet or might access private data within
Workspace, your app needs permission to do that. So you need to
authorize it, so that's what we're going to do here. And we'll need to do it each
time we add a new service as we go through this. We see this kind
of scary warning screen because this
is just in dev mode, and we haven't set up OAuth
yet to get rid of this. We can set up OAuth
in the Cloud Console. There's an extra setup
flow you can go through. But when you publish your app
into the Workspace Marketplace, that'll take care
of this as well. So because we're
just in dev mode, and we're writing
this code ourself, we can just click to proceed. We trust ourselves, right? And now it'll run. So you can see the
response is the best thing since sliced
bread is pocketed bread. Sure. So that was text. We're going to try images now. So with the Gemini 1.0
models, images -- vision, sorry -- use a
different endpoint. So we're going to go
up the top of the file and add in this Gemini
Pro Vision endpoint. Same as the previous one,
except we say Pro Vision. And to use that, we're going
to build a Gemini Pro Vision utility function. Scroll, copy, drop that in. So this is very similar
to the previous function. We've got a callGeminiProVision. We give it a prompt and
temperature like before, but now we're also passing
in an image object. And this will form
part of the prompt. Now, in order to pass an
image to the Gemini API, it's a text HTTP
request, so we need to encode it using
Base64 encoding. This works perfectly fine
for requests that are up to about four megabytes in size. You can go beyond that limit -- the API supports larger inputs -- but you need to use the Files API. You upload your
content first, so for video, or for longer audio
files, or numerous images. And then you can use
them in the prompt. But for everything we're
doing today, single images, it all fits in, and
this is the simplest. So you can see here, we
have parts, text part, and this is our image part. So everything else
is the same here -- we post the JSON request and return the text field from the first candidate. So that should do images.
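As a sketch, the vision helper looks roughly like this (assuming a geminiProVisionEndpoint constant defined like the text one, pointing at the gemini-1.0-pro-vision-latest model):

```javascript
// Text plus image in, text out. The image argument is a blob.
function callGeminiProVision(prompt, image, temperature = 0) {
  const imagePart = {
    inline_data: {
      mime_type: 'image/png',
      // Base64-encode the raw bytes so they fit in a JSON request.
      data: Utilities.base64Encode(image.getBytes()),
    },
  };

  const payload = {
    contents: [{ parts: [{ text: prompt }, imagePart] }],
    generationConfig: { temperature: temperature },
  };

  const options = {
    method: 'post',
    contentType: 'application/json',
    payload: JSON.stringify(payload),
  };

  const response = UrlFetchApp.fetch(geminiProVisionEndpoint, options);
  return JSON.parse(response.getContentText()).candidates[0].content.parts[0].text;
}
```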
Let's test it out. So here, we have a prompt -- provide a fun fact about this object. And then here, we're using
the UrlFetchApp to download this image, and then we'll
encode it and send it in the request. So let's take a look
at what this is. It's a pipe organ. This should be fun. So we call the Gemini
Pro Vision and then log the input and output. Let's save it,
testGeminiVision, and press Run. So here, it's
downloading the image and then it's encoding it
and sending it to Gemini. So here we go. Provide a fun fact
about the object-- largest pipe organ in the world
is located in Atlantic City. Wow, very fun. Cool. So that's text and images. The next kind of power feature
we're going to add in here is using Gemini with tools. So specifying tools allows
us to provide access to APIs or functions
that we have available on our
side of the system. And we tell the
model about them. And if it needs to
use these functions, it's able to-- instead
of returning text to you, it'll say, hey, call out
this function for me. You call it, return
the response, and then the model
will continue on with the conversation or
request that you initially had. So this allows you to
do things like calling your own private internal APIs. You don't need to
expose them to Google. You can do it
indirectly yourself. You can connect
this to public APIs, or you don't even need to
call a function at all. You can use this to generate
kind of structured data from the API. Let's plug it in. We'll take the
callGeminiWithTools function, paste that in. Same pattern as
everything else so far. We've got the prompt
and temperature. This time, we're
specifying a tools object. So this goes into the
top level of the request. So at any point in
the conversation, the model can kind of
call out to these tools. Everything else is the same. We use the text endpoint. Except down here, instead
of returning the text field from the conversation--
from the returned part, we return this
functionCall object. So this defines the
function that the model has asked us to call, and then
we'll return it and continue on.
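In sketch form (same caveat as before -- the Codelab's exact code may differ):

```javascript
// Like callGemini, but with a top-level tools field, and we return the
// model's functionCall part instead of text.
function callGeminiWithTools(prompt, tools, temperature = 0) {
  const payload = {
    contents: [{ parts: [{ text: prompt }] }],
    tools: tools,
    generationConfig: { temperature: temperature },
  };

  const options = {
    method: 'post',
    contentType: 'application/json',
    payload: JSON.stringify(payload),
  };

  const response = UrlFetchApp.fetch(geminiEndpoint, options);
  return JSON.parse(response.getContentText()).candidates[0].content.parts[0].functionCall;
}
```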
To give you an idea of what this looks like, the test function
has one built in. So with this prompt,
the request is to tell me how many days
there are left in this month. So this is something that the
model can't answer immediately without other information. So the large language
models are trained at a specific point in time, and their knowledge only runs up to that point. They don't really know or learn anything beyond that unless we give them extra, explicit information. So that's what function calling -- what these tools -- can help us do. And here, we're passing in
a function called datetime. And so this just returns the
current date and time, like now, as a formatted string. So hopefully, the
model will say, OK, I need to call this before
I can answer your question and continue on. So this is the
OpenAPI-style specification. So we've given it a name,
description, and parameters. This one only has a return
type that's a string. But if there were
other arguments that the model needed to
provide when calling it, they would be
specified here as well.
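Roughly, the declaration looks like this (an illustrative sketch of the shape, not the Codelab verbatim):

```javascript
// The description is what the model reads to decide when to call the tool.
const datetimeTool = {
  function_declarations: [{
    name: 'datetime',
    description: 'Returns the current date and time as a formatted string.',
    // No input parameters -- the model calls it with no arguments.
  }],
};
```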
Much like writing code -- I'm sure you guys all write great docstrings with your code. It's really important that we do this with large language models and tool calling, because this text description is how the model decides whether to
actually invoke this function. So here, we'll just
call that, and we'll log the output like before. Let's save. Run, choose, and run. And you can see here
this is our prompt, and it's returned back an
object saying call datetime. So in a practical
situation-- and in a second, you'll do this-- you would then call the function
and provide the result back and continue on. This is just a test. This is proving that it works. So we'll connect the real
function in a second. And I think with
that, I'll hand over to Wei to help you through that. WEI WEI: All right, thanks, Mark. So now you understand how
function calling works. And then we can actually
integrate the Gemini API against different products. So in the rest of
this workshop, we're going to build three
integrations against the Google Workspace products, including
Google Calendar, Gmail, and Google Slides. So here's a high-level diagram. If you scroll down in the
Codelab into section seven, then here's the
diagram that describes the high-level architecture
of what we're working on. So at the end of
this workshop, we're going to have a system
that takes in a user query. This user query could
be in plain English. For example, set up a meeting at
9:00 AM tomorrow with someone. And then we're going to have
this little routing mechanism or dispatching mechanism which
uses function calling, which you just learned about, and
then dispatch that user query to three separate tools. The first tool is to set up
a meeting automatically using the Gemini API,
and the second tool will draft an email
automatically in Gmail. And then lastly, the third one
will create a skeleton deck automatically. So that's the high-level idea. And to make this workshop even
more interesting for you guys, within each tool, we're going to
make a second Gemini API call. So for example,
in the first tool, we are going to call the Gemini
1.0 Pro model to summarize a blog and then put that summary
into a meeting description. And in the second
use case, we're going to use the
Gemini Pro Vision API to analyze a chart
embedded inside a spreadsheet. And then we're going
to ask the Gemini model to compose an email based
on its analysis of the chart. And then lastly,
we're going to use the Gemini API to help us
brainstorm ideas about a given topic. And then we're going
to put all those bullet points-- all those talking
points into a skeleton deck. So if you think
about it, this is more like chaining two
Gemini calls in a sequence. First, we call function
calling, and then second, depending on which tool we use,
then we'll make a second call. And in terms of coding,
specifically, for each tool, we need to do three things. We need to have an if-else block
as our dispatching mechanism. And we'll dispatch the user
query to one of those tools based on the function calling response. Second, we need to
actually write code to implement the
functionality of each tool. And then lastly, when we
invoke function calling, we have to declare the list
of tools with the Gemini model so that it knows the
existence of the tools and can return the right function call. So that's the quick description
of what we're going to build. And now, let's move
into actual coding. So the first use
case, as I mentioned, is to use plain
English, and then that will use our system to
automatically set up a meeting. And then while setting
up the meeting, the system will automatically
summarize a blog and then attach the summary
into our meeting description. So in order to
demonstrate that, go ahead and click this download. This is a text file link. So this is the text copy of
the Gemini 1.5 launch blog. It has a lot of
text, so we don't want to put everything in your
meeting description, which is why we're going to use
Gemini API to summarize it. So go ahead and click
Download, and you will go ahead and download that. Once that's downloaded
on your computer, switch to your Google
Drive, and then either drag that file into one
of your folders in Google Drive, or you can click New here
and then choose file upload. It will ask you to pick the file
and then finish the uploading. So I already have the
file in my folder, so I won't be doing that. But in your case, you
should finish the upload before moving forward. So that's the first step. We have now a
sample-- a text file. And then the next step is to
implement our dispatching logic. So go ahead and copy the code in
step number three in section 8. Now, we can switch back
to our code editor. So we are going to create
a new file called -- let's call it main. And this time, we remove the boilerplate code and paste the main function in there. And as you can see, there are a few different parts here. The first line
is basically a mock user query. So we're going to use this
mock user query as a test. We're asking our system to
set up a meeting at 10:00 AM tomorrow with Helen to discuss
the news in the Gemini 1.5 blog file. So that's our mock user query. And then we'll use the Gemini API's function calling feature to dispatch the query. So in this case, we're sending
this mock user query along with a list of Workspace tools,
which we'll define in a minute. And lastly, we have this if-else
block as our dispatching logic. If the function
calling response asks to call this
setupMeeting tool then we'll go ahead and call this
setupMeeting function, which takes in the meeting time,
the recipient or participant, and also a file name, which
is our blog file name. So that's our dispatching logic.
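In outline, the dispatch looks like this (a sketch -- the argument names are illustrative, and WORKSPACE_TOOLS is the tool list we declare in a later step):

```javascript
function main() {
  // Mock user query, in plain English.
  const query = 'Set up a meeting at 10AM tomorrow with Helen to discuss the news in the Gemini 1.5 blog file.';

  // First Gemini call: ask which tool should handle the query.
  const functionCall = callGeminiWithTools(query, WORKSPACE_TOOLS);

  if (functionCall && functionCall.name === 'setupMeeting') {
    const args = functionCall.args;
    setupMeeting(args.time, args.recipient, args.filename);
  }
  // The else-if branches for the other tools slot in here.
}
```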
And now, we have this code there already. Now, the second
step is to implement the actual functionality
of our setupMeeting tool. Before doing that, we need
to set up the Apps Script configuration. So we need to add a new advanced
Google Calendar service. So go ahead and click this
little plus sign by Services tab and find Google
Calendar in the list. And it's right there. And then you can add that. Very straightforward. This is needed
because we're going to use some advanced
functionality for Google Calendar API. Now, moving on to step number
five, copy over this code. Change over to utils
file and scroll to the bottom of the file. This is our actual tool
functionality implementation. And as you can
see, this function takes in meeting time,
recipients, and file name. So first, we're going
to use Google Drive API to find the text file you
just uploaded into your Google drive. And then we'll
extract all the text from that file
using this function. And then next, we're going to
ask the Gemini API to summarize all the text in that file using
this really simple prompt. Basically, we're
asking the model just to give a summary and the title. But there's one
thing here, which is we're prompting the model
to return a JSON to us. This is needed because
we're using Gemini 1.0. If you use newer models-- for example, Gemini 1.5 Pro-- then there's a new
JSON mode you can use, which will directly
return a JSON, and you don't have
to explicitly do this kind of prompting anymore. Once we get the response
back, we're going to do some post-processing, and then
we'll extract the title and the summary from
the Gemini response. And now, we have the
title of the file and also the
summary of the file. Then it will go ahead and
create the calendar event in my Google Calendar. But in case you haven't used Google Calendar much, here's a quick UI demo -- this is what it looks like in the UI. And there are a lot of
time slots you can choose. You can pretty much
pick any time you want, and then you can change
the meeting title. You can add guests
or participants. And you can add a more
detailed description. You can attach a file
to your meeting as well. So usually, you would type
everything up manually, and then it will be set up once you hit Save. In this case, we're going to use the API to do all of that for us. We're calling CalendarApp and asking it to create an event -- in this case, a meeting. And then we're going to say,
meet somebody at what time and to discuss a certain topic. And the Google Calendar app
will go ahead and do that. And next, we're going to change
the meeting description using the file summary returned
from the Gemini API. And then lastly, we're going
to call attachFileToMeeting, basically, to just attach the
text file in your Google Drive to that meeting.
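Condensed, setupMeeting does something like this -- a sketch where the prompt wording and post-processing are illustrative, and attachFileToMeeting is the helper described next:

```javascript
function setupMeeting(time, recipient, filename) {
  // Find the uploaded blog file in Drive and pull out its text.
  const file = DriveApp.getFilesByName(filename).next();
  const blogText = file.getBlob().getDataAsString();

  // Second Gemini call: ask for a title and summary as JSON (prompt-enforced,
  // since Gemini 1.0 has no JSON mode).
  const prompt = 'Generate a JSON with "title" and "summary" fields for this text: ' + blogText;
  const parsed = JSON.parse(callGemini(prompt).replace(/```json|```/g, ''));

  // Create the event and fill in the description with the summary.
  const event = CalendarApp.getDefaultCalendar()
      .createEventFromDescription('meet ' + recipient + ' ' + time + ' to discuss "' + parsed.title + '"');
  event.setDescription(parsed.summary);

  // Attach the Drive file via the advanced Calendar service.
  attachFileToMeeting(event, file);
}
```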
This attachFileToMeeting function is already implemented -- if you scroll up a little bit, it's right here. We won't go into the details of all these API calls, but basically, it just attaches the file to the meeting
using advanced Google Calendar service, which we just
enabled a few minutes ago. And also this
function was actually implemented by our TA,
Kara, so thanks for that. So that's the second
step to implement that. And one last thing. Remember, I said
three things we need to do to implement
the tool, which is to declare a list
of available tools when we invoke function calling. So go ahead and copy the
code block in step number 6. And then scroll up in
your utils file to the top and then paste that in. So let's look at this code. Basically, it just defines a
list of functions or tools. In this case, our first
tool is called setupMeeting. And here's a really
short description. And it takes in
three parameters-- a meeting time, recipients,
and also a file name. And all of them are required.
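Roughly, the declaration list looks like this (descriptions and parameter names are illustrative -- they're what the model reads when dispatching):

```javascript
const WORKSPACE_TOOLS = {
  function_declarations: [{
    name: 'setupMeeting',
    description: 'Sets up a meeting at the given time with the given recipient, attaching the named file.',
    parameters: {
      type: 'OBJECT',
      properties: {
        time: { type: 'STRING', description: 'The meeting time, e.g. 10AM tomorrow.' },
        recipient: { type: 'STRING', description: 'Name of the meeting participant.' },
        filename: { type: 'STRING', description: 'Name of the file to summarize and attach.' },
      },
      required: ['time', 'recipient', 'filename'],
    },
  }],
};
```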
So that's step number three, and we have finished all of them. Now, we're ready
to run the test. Go ahead and switch back to
main.gs file and then click Run. It will ask me to authorize
the permission again. Go ahead and do that and allow
access to my Gmail account. So it will go ahead and trigger
two separate Gemini API calls. It will take a few seconds
before it comes back, so let's give it a few seconds. So in the log, you can see we
printed out a message that says, your meeting has been set up. So at this point, you can
switch to your Google Calendar. And as you can see,
there's a meeting set up on my Google Calendar. If you click that, you can
see we have a title there. We have a detailed
description, which is a summary of the blog
file we just uploaded. And we have the
attachments, which is the text file from
your Google Drive. So that's our first use case. As you can see, from
the user standpoint, we only provide one simple
English sentence to set up the meeting, and then everything
is done automatically for us. So that's our first use case. Now, let's move on
to the second use case, which is to draft an
email for us automatically. And the email body will be
based on the Gemini Pro Vision's analysis of our charts. So to demonstrate that, we
have created a spreadsheet. So this is a spreadsheet
that I created. It's about college expenses. I basically made up these numbers. So these numbers are definitely
just for demonstration purposes. They are not real numbers. But for our demo purposes,
I think it's sufficient. So in this case,
let's say I'm doing some kind of a data analysis. And then I create a chart
based on these numbers. And we're going to send
the image of this chart to the Gemini Pro Vision
API and ask the model to do analysis for us and
then Compose an email based on that analysis. So go ahead and save a
copy of this spreadsheet into your Google Drive. When you do that, make sure to remove "Copy of" from the file name, and then go ahead and make the copy. You should have a spreadsheet
in your Google Drive. I already have a copy there,
so I won't be doing that. And so that's our
demo spreadsheet with the chart embedded. And now, we're coming back to
implement the three steps that I mentioned at the beginning. So first, we need to add
the dispatching logic again because this is a
separate, new tool. So go ahead and copy
this else-if block, and come back to your
main.gs file in the editor, and then paste that
in your if-else block. So in this case, if the
function calling response asks us to call the
draft email tool, we'll go ahead and call that
function using the spreadsheet name and also a
recipient because we are drafting an email. And so that's our
step number one. And then second, we're
going to actually implement the functionality of this tool. In this case, copy the code
block in step number four. And now you can change
over to utils file, and scroll down to the
bottom of this file, and then paste that in. So now, let's look at what
this function is doing. So basically, we are using a
prompt that asks the Gemini Pro Vision model to compose
an email body for someone based on the model's analysis
for this particular chart embedded inside our spreadsheet. And then we added
some additional prompts to help the model output
some valuable information. And then again, we're going to
use Google Drive app to find the spreadsheet
file, and we're going to identify the chart embedded
in that spreadsheet file. And now, we have the chart, but
the chart is actually an object, not an image. So we have to save the chart into your Google Drive as a PNG file. We'll use this PNG file in a minute. So now, we are finally ready
to call the Gemini Pro Vision API using this prompt
we defined up here. And then we send the
chart in to the model. So we're not sending
any of the numbers or texts inside that
spreadsheet, just a pure image to the model. And once we get back the
email body, the drafted email body composed by the
model, then we're going to use the Gmail app
to create the email draft. And in this case, we're
adding the recipients. And also, of course, I'm using
a demo email provider domain. And there's an email title. This one is hardcoded here, but
you can usually change that. There's an email body from
the Gemini API response. And lastly, we're
attaching the chart image. Remember, we just
saved the chart image in our Google Drive right here. And we are attaching that file
from Google Drive to the email.
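Condensed, the draftEmail tool looks something like this (a sketch -- the helper names, subject line, and prompt wording are illustrative):

```javascript
function draftEmail(sheetName, recipient) {
  // Find the spreadsheet in Drive and grab its embedded chart.
  const file = DriveApp.getFilesByName(sheetName).next();
  const sheet = SpreadsheetApp.openById(file.getId()).getSheets()[0];
  const chart = sheet.getCharts()[0];

  // Charts are objects, not images -- save a PNG copy to Drive first.
  const chartFile = DriveApp.createFile(chart.getAs('image/png')).setName('chart.png');

  // Second Gemini call: send just the image, none of the sheet data.
  const prompt = 'Compose the body of an email to ' + recipient + ' with insights from the attached chart.';
  const emailBody = callGeminiProVision(prompt, chartFile.getBlob());

  // Create the Gmail draft with the chart attached.
  GmailApp.createDraft(recipient, 'Insights from the expenses chart', emailBody, {
    attachments: [chartFile.getBlob()],
  });
}
```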
And that's our step number two, which is to implement the
tool functionality. Now, we are doing
the final step, which is to declare the tool
when we invoke function calling. So go ahead and copy the
code in step number five. And then scroll up to the top
of your utils.gs file and paste that file in right after the
comment, "add your tools here." So I probably missed something
right there, but let me-- yes, you just need this code,
not the entire code block. OK, so there's no
more error anymore. So let's look at this
part of the code. It basically declares a new
tool called draftEmail. And this function will take in two parameters -- the spreadsheet name and also the email recipient. And so that's our
step number three. And with that, we are
ready to do a test. If you scroll up to
step number three, we need to change the mock user
query before we actually run it. So go ahead and swap out the previous user query and paste in this one. So this one just asks our system
to draft an email for Mary with insights from the
chart in that spreadsheet. And it's pretty
straightforward English. And now, we're ready to run it. Go ahead and click Run. Again, it will ask
me to authorize this, and then we'll allow
access to my Gmail account. Then it will kick
off running, and it takes a few seconds to finish. And let's give it a second. And this time-- remember, this
time, we are sending an image, so it's usually bigger
than your text prompt. Looking at the log, it seems the system has successfully created an email draft for us. Now, if you open up
your Gmail account and go to your Drafts
folder and click Refresh, then we have a nicely
written email all written by the Gemini API. We didn't type anything there. You can see the email
address, the title of the email, the recipient. And this whole email body is written by Gemini. And all you need to do
is to review the copy and then make some
minor changes. The image, the chart image, is
also automatically attached. And it saves a lot
of time for you. So that's the second integration
we want to demonstrate to you. And there's a
third one, which is to use the Gemini API
to brainstorm on a topic and then come up with
interesting bullet points and talking points for you. And at the end, you will
have a basic slide deck. So this is a quick GIF demo of the deck that it would create for you. But in the interest of
time, since this workshop is only 45 minutes,
we are not going to go through this part of
the integration for you. But please feel free to check
it out after you get home. It's actually easier
than the two integrations we have just built. Now, in this workshop,
we have demonstrated a couple of
integrations for you, but there are, of course, a
lot of ideas and possibilities you can try. And for example, you can
easily build a chat bot for Google Chat. Google Chat is a messaging
app, if you haven't used it. Large language models are great at being chat bots, so you can easily do that. Another thing you can try is some advanced
techniques, like RAG-- Retrieval-Augmented
Generation-- with your files in Google Drive or
even Google Keep. And the reason this
technique is needed is when you have a
lot of stuff, you just can't pack everything
into a single prompt. Even with the announcement of
our two-million token context window model,
sometimes, still you want to just send the most
relevant context to the model when you work with
the Gemini model. So that's when you want to
use something called a RAG. But we're not going to
do a deep dive on RAG. There are dedicated
session at I/O this year, so you can
check them out and learn about the techniques
behind that. Another thing you can try is
to use the multi-turn function calling feature. So the integrations we have
demoed only uses a single turn. We asked the model to return a
response for function calling, and then that's it. We're not using function
calling anymore. But in reality, you can do
this in a multi-turn fashion. We're not going through
the details of this, but here's a link. You can read all about
it in our documentation. And then lastly,
in this workshop, we have only
demonstrated integrating with Google Workspace. But basically, there's
really nothing special about Google Workspace. You can pretty much
build integrations against any public or even
private products or services, and that just opens the
doors to so many things. And hopefully, with
our little workshop today, you can start
thinking about what you can build with the Gemini API. So finally, we have made it to the end. And just to quickly
recap, we walked through how to use
the Gemini API, how to leverage the
multi-modality feature and function calling
feature, and we showed how to build a
couple of integrations against Google Workspace. And I want to thank you
very much for attending this workshop. And hopefully, you all
find this workshop useful. And we can't wait to see what
you build in the real world. And thanks again. And enjoy the rest
of I/O. Thank you. [MUSIC PLAYING]