[MUSIC PLAYING] SACHIN KOTWANI: Hi, everyone. So as I was getting
ready to travel to come here, my daughter
asked me, where are you going? And I said, I'm going to be
on stage at I/O. And she said, are you going to sing? So lucky for you, I'm
not going to sing. Sorry. Hi, everyone. I'm Sachin, a leader in our
central AI and ML team, thinking about machine learning running on the edge. And that includes tooling,
runtimes, and infrastructure. But to be clear,
that's just my day job. At night, I'm your standard nerd
who likes to build mobile apps, backends for those
apps, scripts that run on my multiple Raspberry
Pis at home, and more. It's a lot of fun. With that, I'm
extremely grateful and humbled to have
the opportunity to be in front of you today
and to talk about a space that I'm so passionate about. So something that's
really had me excited lately, I'm sure,
as most of you, is large models and the power
and creativity that they can unlock. I've used them to help
summarize and consume large amounts of content, help
with writing and structuring random ideas. It's been a lot of fun. But you know what can take these
experiences to the next level? Having this intelligence on the
compute device most available to you, which includes benefits
around latency, better privacy, scalability, and
offline availability. Now with that, you can build
something really incredible. And you wouldn't
be the only one. Developers have been using ML
on devices for some time now. Here are some numbers. There are over 100,000 apps
just on Android using our stack. Those are running on over
2.7 billion devices, which are generating over a trillion interpreter invocations per day. So why use ML directly
on edge devices? Well, you're here to listen
to a talk on this topic, so hopefully you're already
convinced and excited about the benefits and
prospects of this technology. But just to level
set, let's talk in more detail about why edge AI is useful in the first place. Customers often need apps
where low latency is critical. For example, when every
frame in a video stream needs to be processed, or where
offline connectivity would be great to have,
giving them capability to work even while on
a plane, for example. Running ML on the edge also has additional benefits for privacy, since the data doesn't leave the device. And as a developer,
you reduce or eliminate the need to deal with server-side maintenance, capacity constraints, or costs for ML inference. Now, ML models are known for
being large, computationally intensive, and data intensive, which means
that traditionally, you needed powerful servers to run them. However, two trends have made
it possible to run an increasing number of ML models
on edge devices. The first one is
advancements in ML research, including new model
architectures and techniques such as distillation
and quantization, which have made models more
efficient and smaller in size. The second one is more powerful
devices with GPUs and dedicated NPUs for ML processing. One example is our third
generation Google Tensor G3 chip on the Pixel
8 and Pixel 8 Pro, which is powerful enough to run
Gemini Nano fully on the device. This is an extremely
powerful combination that positions edge AI to put
even more power into your hands as developers in
the next few years. Let's walk through
some examples of what this looks like in real life. The Google Photos app uses
MediaPipe and TensorFlow Lite for the popular photo
unblur feature, which adds sharpness and brightness
to picture-imperfect moments with just a tap. The entire feature
runs on device, taking advantage of GPU and TPU
accelerators where available. YouTube Shorts offers an
array of effects for creators, making it easier to create fun
and engaging content, again, all running on device. At Google, we are committed to
our mission of widening access to edge AI. And that is why you
have at your disposal a wide range of
tools and solutions that take you all the
way from high level APIs to pipelines, model
inference, and hooks into the different
hardware processors available on a
device, enabling you to run these models as fast
and efficiently as possible. If you're looking for ready-to-use self-serve APIs, we offer a range of
high level solutions across various domains. Many of them include the
ability to customize, evaluate, and deploy models in pipelines based on your own data, all in just a few lines of code.
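To give a concrete sense of what "a few lines of code" means here, this is a rough sketch using the MediaPipe Tasks Python API for image classification; the model and image paths are placeholders for your own files.

```python
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# Point the task at a bundled classification model (placeholder path).
base_options = python.BaseOptions(model_asset_path="classifier.tflite")
options = vision.ImageClassifierOptions(base_options=base_options, max_results=3)
classifier = vision.ImageClassifier.create_from_options(options)

# One call runs the whole pipeline: pre-processing, inference, post-processing.
image = mp.Image.create_from_file("photo.jpg")
result = classifier.classify(image)
print([c.category_name for c in result.classifications[0].categories])
```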
Or if you'd like to bring your own ML models, you'll find we provide
powerful and highly performant runtime and
pipelining solutions, like TensorFlow
Lite and MediaPipe. What you're looking
at here on the left is what the flow of
running a single model looks like, including ML and
non-ML pre- and post-processing, and running an inference. And then on the right is
a more complex pipeline orchestrating various
models, and in both cases, giving you the advantage
of hardware acceleration with maximum control. And now, we are bringing
all of these technologies under a single umbrella called
Google AI Edge, where you'll find all the runtimes, tools,
trainings, samples, support, and documentation all in one
place to make your life easier. This is now the
primary destination where you will
find a set of tools that will enable you to bring on-device AI to your applications across mobile, web,
and embedded devices. But of course, in
this new destination, you'll also find all
the new APIs and tools that we are launching today. And I know you can't
wait to hear about those. All right? So let's jump right in. We are announcing some new
tools that you can go and try today that are focused on
framework optionality and GenAI. First, earlier this
year, we announced the experimental launch
of the LLM inference API with support for a set of
the most popular models, including Gemma,
to run on the edge. The API includes support
for Android, iOS, and web. And this API enables
you as a developer to run an LLM
fully on the device and also customize
and experiment with different models. Beyond these models
that are already available for you to use, which
are the most popular ones, we also want to enable you
to bring your own GenAI model architectures if you're
inclined to do so. So today, we are excited to give
you a sneak preview of our AI Edge Torch generative API, which
will allow you to do just that. Beyond just GenAI,
Google AI Edge builds on our work
with TensorFlow Lite to have the fastest
on-device runtime. And now we know
that ML innovation comes from a variety of sources
and diversity of frameworks. And we at Google want to help
advance innovation in edge AI, regardless of where
it comes from. You can already take
advantage of support for various model formats. So whether you're using
TensorFlow, Keras, or Jax, you get the best
performance possible. And today, we are
expanding that list by adding beta
support for PyTorch with full support
coming later this year. And all of this is on the same
infrastructure and runtime that you already know. In the next section, Cormac will
tell you more about both of these. And we'll also show you how
a few of our early access partners, including
Shopify, Niantic, and Adobe, are using this functionality. And while you're
converting those models, you may need some help
debugging and visualizing them. So Aaron will later tell
you about Model Explorer, a powerful graph visualization
tool that helps you understand, debug, and optimize your ML
models for edge deployment. There are many teams inside of
Google using this tool already. And today, we are making it
available for all of you to use. Our goal with all
of these launches, ranging from easy-to-use APIs
to tooling and infrastructure for more advanced use cases, is
to help support open innovation and enable you to build amazing
experiences for your users. So let's talk a little
more about GenAI. Over the past year or
so, you have no doubt seen a lot of excitement in the
community around this space, both on the consumer
side and, of course, on the developer side. Our vision for the
future is one where every app you interact
with has a fully fluid UI controlled by natural language
and completely personalized to you. That'd be pretty cool? Yeah? Who wants that? All right, louder. All right, all right. OK, well, so we're
not quite there yet, but we are getting closer. And I have something I'd
like to show you so you can see where we are headed. So what I'm going to show you is
a quick demo of Gemma 2B running fully on the browser
in a Chrome extension. But there are two other
interesting things happening here. First, we are going to use
Retrieval Augmented Generation, or RAG, to access
information that's outside the model's knowledge,
particularly because this is a smaller model, just 2B. And then we're going to feed
that information into the model, so it can help with our
request and answer questions about that content. And second, we are
going to use function calling to have the LLM call
other APIs on our behalf. All right, so imagine
I'm having friends over for brunch tomorrow. And I find this great pancake
recipe in an online recipe book. I need to know which
ingredients to buy. And I also need a reminder
to go to the grocery store to buy them. So how can a Chrome
extension running an LLM help me with that? Let's take a look. What we want to do first is ask it about the ingredients in the pancake recipe. After that, we are asking
it to create a calendar entry with our shopping list. What's magical about
this is that we are not telling it which
API calls to make or which arguments to pass in. Instead, we fine-tuned Gemma to understand how to use these APIs, so we can interact with it using only natural language. Pretty cool?
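To make the shape of that demo concrete, here is a minimal conceptual sketch in Python. It is not the extension's actual code: retrieve, create_calendar_event, and the JSON tool-call format are illustrative placeholders, and generate stands in for whatever on-device LLM call you use (for example, the LLM Inference API in the browser).

```python
import json
from typing import Callable, List

def retrieve(query: str, chunks: List[str], top_k: int = 3) -> List[str]:
    # Toy retrieval: rank recipe-book chunks by word overlap with the query.
    words = set(query.lower().split())
    ranked = sorted(chunks, key=lambda c: -len(words & set(c.lower().split())))
    return ranked[:top_k]

def create_calendar_event(title: str, date: str) -> str:
    # Stand-in for the real calendar API the extension calls on our behalf.
    return f"Created event '{title}' on {date}"

def answer(query: str, chunks: List[str], generate: Callable[[str], str]) -> str:
    # 1) RAG: retrieve context the small model doesn't know and add it to the prompt.
    # 2) Function calling: tell the model how to request a tool, then dispatch it.
    prompt = (
        "Context:\n" + "\n".join(retrieve(query, chunks)) + "\n\n"
        "If the user wants a reminder, reply only with JSON like "
        '{"tool": "create_calendar_event", "args": {"title": "...", "date": "..."}}.\n\n'
        "User: " + query
    )
    reply = generate(prompt)
    try:
        call = json.loads(reply)
        if call.get("tool") == "create_calendar_event":
            return create_calendar_event(**call["args"])
    except (ValueError, AttributeError, TypeError, KeyError):
        pass  # plain-text answer, no tool call
    return reply
```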
All right, so now, this is just a simple example. But it shows the way that
on-device language models, retrieval augmented generation,
and function calling can, together, make for
incredibly powerful interactions for end users. Next, I'm excited to hand it off
to Cormac, who will dive deeper into some of our new
products and APIs that made this kind
of demo possible. Cormac, please come on stage. [APPLAUSE] CORMAC BRICK: OK,
thanks, Sachin. That was great. I'm Cormac Brick. I'm an engineer working
on core machine learning. And this is where we get
to roll up our sleeves and look at some
code and some demos. And first up, we're going
to have a look at running LLMs on edge devices. OK, so now from Sachin,
you've gotten excited about the future
of on-device LLMs. Let's look a bit deeper at
the different ways you can access powerful LLMs on device. So first up, we
have Gemini Nano. And this is the built-in
GenAI on Android and Chrome. On Android, you may
have heard yesterday that this is already available
on the most capable Android phones. And this is amazing
because Gemini is already loaded on your phone
and optimized to run on hardware acceleration. And the Android team has a
talk about this called Android On-Device AI Under The Hood,
which you should go check out for more detail. There, you'll
hear more about availability and see some great
examples about how this is running in
production today for a number of Google features. Gemini Nano is also
coming soon to Chrome, as you might have heard
from Jason in our last talk. And this is starting on desktop. And developers soon will be
able to use powerful APIs to do things like summarization,
translation, and much more. And they also have a talk
called Practical On-Device AI that you should check out to learn a lot more
about Gemini on Chrome. Now, for devices and platforms
that don't have Gemini built in, that's what I'm going to
cover today in this session. And we're going to look
at two different ways where you can bring
your own model and run it on device
with Google AI Edge. First up, we're going to look at
the MediaPipe LLM Inference API. And these are pre-optimized
LLMs that work really well on multiple platforms. Then we're going to
cover our generative API. This enables you to
build your own generative models that use on-device
compute with great performance. OK, so in March, we released
the MediaPipe LLM Inference API. And this runs language
models entirely on the device with all of the scale, privacy,
and reliability advantages that Sachin covered earlier. And this is a really
easy-to-use API that covers web, iOS, and Android. And it's fast. You're going to be able
to see this for yourselves in a little bit in
some demo examples that are all real-time recordings. And the way this works is we
provide highly optimized models and easy access
to public weights. And you can also bring
your own weights, maybe a ready-made variant from
the Hugging Face ecosystem, or perhaps something that
you fine tuned yourself. The choice is yours. And it's still a little early. So we're calling this
an experimental release. But let's have a look at some
code and then some demos. OK, so first up, here's
a few lines of code that you can integrate with your
application to get MediaPipe LLM Inference API working locally. This is showing an
Android example. And we also have
similar APIs to get you started on iOS
and also on web, which looks a bit like this. And here, we're showing Gemma
2B running entirely locally in the browser. Now, you might be
noticing all of our demos seem to be food related. I'm not sure what that's
about, but there's definitely a bit of a theme there. Now, this is all running
fully locally in the browser. And it's fast. And that's because it's
accelerated on the computer's GPU through WebGPU. And that makes it fast enough
to build pretty compelling fully local web applications. And here it is running
Gemma 2B on Android. And again, this is all real
time, running on real devices. And you can get the source
code for this demo app on our AI Edge docs page,
which we'll link at the end, and try this out
for yourself today. We also have it outside
in our demo booth. And this also runs on iOS. And this is the power of
the MediaPipe LLM libraries. It's the same model, same
weights, pretty similar APIs, and multiple platforms. Now, all of this
has been available for the last couple of months. And since then, the
team has been busy. And we've got some new features
we're happy to share today. So first up, we're really happy
to announce larger model support on the web. Our latest release enables
larger models like Gemma 7B, which helps you prototype even
more powerful web applications. [APPLAUSE] And today, we're also excited
to announce LoRA support. So what's LoRA? You might have heard
earlier in Gus's Gemma talk about how to fine tune
Gemma in Keras with LoRA. Or maybe you've already heard
that fine tuning a model is a great way to improve
quality in a way that's specific for your application. So LoRA is a fine tuning method
that's really easy to use, because firstly, compared
to full fine tuning, it uses way less compute,
which makes it more affordable. And you can also get
great results fine tuning with a relatively
small data set. And the resulting
LoRA fine tuned files are also pretty small. They're only a few megabytes
compared to the base models that are often several gigabytes. And today, the LLM Inference API supports LoRA. So this means you can
use several small LoRAs on top of the same
larger base model, which can make it easy to ship
multiple compelling features in a single application, all
sharing the same base model.
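If you missed that Gemma talk, the Keras side of producing such a LoRA looks roughly like this; the training strings are placeholders, and treat the preset name and hyperparameters as assumptions to adapt for your own data.

```python
import keras
import keras_nlp

# Load Gemma 2B and enable LoRA adapters on the backbone (rank is tunable).
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")
gemma_lm.backbone.enable_lora(rank=4)
gemma_lm.preprocessor.sequence_length = 256

# Placeholder fine-tuning data: instruction/response strings for your feature.
data = [
    "Instruction:\nList the ingredients in this pancake recipe.\n\n"
    "Response:\nFlour, eggs, milk, baking powder, and a pinch of salt."
]

gemma_lm.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(learning_rate=5e-5),
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)
gemma_lm.fit(data, epochs=1, batch_size=1)
```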
Now, the LLM Inference API is great. That's a great place to start. And you're probably wondering
what models it works with and how you get those models. So we support lots of
the popular open models that you can see here today. And you're also going to
see this list continue to expand over time. Then you bring weights
compatible with any of these architectures. And again, you can
use your own weights, find something on Hugging
Face, or do your own fine-tune. Then you use our converter, and you have a model that's ready to run on device.
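As a rough sketch of that converter step, here is what it can look like in Python. I'm recalling the call and parameter names from the MediaPipe docs, so treat them as assumptions and check the current model-conversion guide for your architecture and backend; all paths and values below are placeholders.

```python
from mediapipe.tasks.python.genai import converter

# Convert downloaded checkpoint weights into a bundle the LLM Inference API
# can load on device. Paths, model_type, and backend values are placeholders.
config = converter.ConversionConfig(
    input_ckpt="gemma-2b-it/",           # checkpoint directory (e.g. from Hugging Face)
    ckpt_format="safetensors",
    model_type="GEMMA_2B",
    backend="gpu",
    output_dir="/tmp/intermediate/",
    vocab_model_file="gemma-2b-it/",
    output_tflite_file="/tmp/gemma_2b_it_gpu.bin",
)
converter.convert_checkpoint(config)
```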
Now, we expect most of you to use pre-optimized models like these today. However, some of
you may also want to bring other architectures. For example, maybe you want
to use a smaller architecture that's not listed here. Or what if you
have a model that's proprietary to your company? Well, that's why I'm excited to
announce the Torch generative API that we're
introducing today. This helps you bring your
own LLM architectures to devices in a way
that's both easy and fast. So right from PyTorch,
you can re-author your LLM using our optimized components. And we found it's really
helpful to stay in the Torch environment, as it's easy
to run evals there and work with other models. And then you use the Google
AI Edge Torch Converter to bring it to TFLite. This gets you a model you can
run with the existing TFLite runtime. And the resulting
model, it's pretty fast. We found it gets within 10% of a
handwritten C++ implementation. And we're also going to cover
the AI Edge Torch in more detail a little later. So today, we're releasing an
initial alpha version of this. This is an early
release, but we're really excited about this direction. And we'd love to
hear your feedback. So now, let's jump in and
have a look at some code. So here's what this looks
like to a developer. And first off, I'm
going to say, you don't need to read all
of this code immediately. I just wanted to give
you a feel for what a typical implementation
looks like. And the way this works
is you use a combination of some native PyTorch layers
that you can see up top there in the first couple of sections. And then you also can use
some of our optimized layers that come in the Torch
Generative AI library, like that you see at the bottom. Then you run all of this through
the AI Edge Torch Converter. And you get a file that you
can take and run on the device, which you can then use to build some apps.
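As a minimal sketch of that pattern, here's a toy PyTorch decoder converted with the AI Edge Torch converter. A real re-authored LLM would mix in the optimized transformer building blocks from the generative library; their exact module layout is left out here rather than guessed at, so this only shows the native-PyTorch-plus-converter half of the flow.

```python
import torch
import torch.nn as nn
import ai_edge_torch

# A toy "re-authored" decoder built from plain PyTorch layers.
class TinyDecoder(nn.Module):
    def __init__(self, vocab_size: int = 32000, dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.lm_head = nn.Linear(dim, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        x = self.embed(tokens)
        return self.lm_head(x + self.ffn(x))

model = TinyDecoder().eval()
sample_tokens = torch.zeros((1, 16), dtype=torch.long)

# Convert with the AI Edge Torch converter and export a TFLite flatbuffer
# that the on-device runtime can load.
edge_model = ai_edge_torch.convert(model, (sample_tokens,))
edge_model.export("tiny_decoder.tflite")
```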
And here's an example using TinyLlama, where we ask for paraphrasing options for a short message. And as you can see, GenAI has lots of potential
use cases on device. And it turns out
you can even use it for features that
have nothing to do with food, which came as a
surprise to some of our team. And here it is helping us
rephrase a message from a user having fun at Google I/O. Now, models created using
the Torch Generative API are compatible with
both our high level MediaPipe LLM
Inference API, which is easy to use in
Android applications, like the one you just saw. And you can also use it with
our lower level TFLite runtime for greater control, which is
the same runtime you may already be using for other models. So today with our
initial launch, we support CPU. And you can expect GPU, quantization, and NPU support coming along later this year, as well
as an equivalent API for Jax. Now, let's take a minute
to summarize what we've covered for on-device LLMs. So first up, built-in
Gemini is awesome. And it's available with
limited access on Android. And it's coming soon to Chrome. Secondly, we have
MediaPipe LLM Tasks. And these are highly optimized
versions of popular open models that work cross-platform. And finally, you
can go fully custom with the Torch Generative API,
which is fast and works also with the TFLite runtime. Now, I'm sure many of you
would like to start digging into the code and maybe trying
some of this out for yourselves. So you can. Just follow the link on screen. And there you'll also find
Colabs and end-to-end examples to help you get started. OK, so that's it for LLMs. And next up, I'd
like to cover how we're going to help you build
great AI-powered apps, no matter which framework
you're starting from. So we launched
TensorFlow Lite in 2017 with a mission to make it easy
to innovate and bring machine learning to mobile
and edge platforms. And that's been really
successful with hundreds of thousands of apps using
TFLite across Android, IoT, and iOS. And our mission hasn't changed. So to stay true to that
mission of enabling innovation, we're excited today to announce
official support for PyTorch and Jax in addition
to TensorFlow. And this allows you
to bring the best ideas from any of these
frameworks and run them on-device with the same great TFLite runtime. [APPLAUSE] And because TFLite powers ML
inference for the entire Google Edge AI stack, that means
you get framework optionality throughout that entire stack
that Sachin showed us earlier. So you can find
off-the-shelf models or trained models in the
framework of your choice, convert your models to
TFLite in a single step. And then you can run them all
on a single runtime bundled with your app across
Android, web, and iOS. And for anyone here already
using TFLite or MediaPipe today, you can use new models
from PyTorch or Jax with your existing packages. Just update to the
latest version. No need to change any
of your existing models, or your build dependencies,
or anything like that. So let's start by spending
a minute talking about Jax. Jax is a framework we use
extensively internally. All of the generative
models that you've heard about today and yesterday,
like Gemini and Gemma, are trained in Jax. And we also use Jax for lots
and lots of on-device use cases internally. And recently, we've seen
that lots of users and top AI companies also choose to use Jax
for flexibility and efficiency. So we've updated support
and documentation to make this path easier
for the wider community to bring Jax models on device. And this is what it
looks like in code. With just a few lines of code, you can take your Jax model, wrap it in a module, and then export it to a TFLite file.
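Here is a minimal sketch of that path, assuming the experimental Jax entry point on the TFLite converter and a toy model; file names are placeholders.

```python
import jax.numpy as jnp
import numpy as np
import tensorflow as tf

# A toy Jax model: a pure function from an input array to predictions.
def predict(x):
    return jnp.tanh(x @ jnp.ones((4, 2), dtype=jnp.float32))

sample_input = np.zeros((1, 4), dtype=np.float32)

# Convert the Jax function to TFLite with the experimental Jax converter,
# then write the flatbuffer to disk.
converter = tf.lite.TFLiteConverter.experimental_from_jax(
    [predict], [[("x", sample_input)]])
tflite_model = converter.convert()

with open("jax_model.tflite", "wb") as f:
    f.write(tflite_model)
```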
And now for PyTorch. We're particularly excited about this one because PyTorch support is by
far and away the most frequent feature request we get from both
enterprise teams and community developers. So we're here to say,
we have heard you. We know that many
of you love PyTorch. And many of you love TFLite. But the path between
them hasn't always been easy or well supported. We've seen a few
community projects that have kind of filled that gap. Maybe going from
PyTorch to ONNX and ONNX to TensorFlow and
TensorFlow to TFLite. And that was too many steps,
each one brittle, and a place where things could go wrong. So we knew that converting
PyTorch to TFLite could be much easier. Well, we're happy to say
with our new Python package, it now is. So directly from your
PyTorch environment, you import AI Edge Torch,
initialize your model, call our converter. You can test the output
right there in Python. And then export to a TFLite file
that's ready to use on device. It really is that easy.
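Here's roughly what those steps look like with an off-the-shelf torchvision model; the model and file names are just examples.

```python
import torch
import torchvision
import ai_edge_torch

# Initialize a PyTorch model and matching sample inputs.
resnet18 = torchvision.models.resnet18(
    weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1).eval()
sample_inputs = (torch.randn(1, 3, 224, 224),)

# Convert with AI Edge Torch and sanity-check the output right in Python.
edge_model = ai_edge_torch.convert(resnet18, sample_inputs)
edge_output = edge_model(*sample_inputs)

# Export a .tflite file that's ready to use on device.
edge_model.export("resnet18.tflite")
```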
And PyTorch support for TFLite is publicly available today in beta. And it's also on GitHub
at AI Edge Torch. Or you can follow
the link on screen to check it out for yourselves. And we've tested AI Edge Torch
with over 70 popular PyTorch models. And we've been blown away
by the ease of conversion and by the performance. And we've built
our PyTorch support using many PyTorch native
features, including things like Torch Export as a
consistent way to export models, PT2E for quantization, and the Core ATen opset for operator expression in PyTorch. And you can read our blog
post that came out today for more details on
performance and our underlying implementation. Now, if you're running PyTorch
models on Android already, either via a community
provided conversion to TFLite or by
another ML framework, we strongly recommend
you come try this out. We can confidently
say from our testing, if your models are
supported with our beta, you'll likely see a significant
performance improvement. Now, we've also worked with
lots of partners who've given us invaluable
feedback while developing multi-framework
support in TFLite, including those listed here. And to everyone who's helped
us with testing, feedback, community advice, including
the companies listed here, thank you so much. We really appreciate it. And a great example
of this recently has been our work with Shopify. We're confident you'll find
the new PyTorch support useful because we've tested
it in production apps over the last few months
with partners like Mustafa and the team at Shopify. And they found it
great for creating mobile-ready PyTorch models. And as you can see
on the right, it's already being used by his team
to perform on-device background removal for product images. And this will be
available in an upcoming release of the Shopify app. And we're really passionate
about helping developers build great applications that
use on-device AI like this one. So we're really happy to see
new features like this go out. So in the last few years, we've
also seen great innovation in the Android
hardware ecosystem, with AI improvements
in CPUs and GPUs and even new specialized
hardware accelerators, sometimes called NPUs, that
offer exciting potential for even faster AI. And that's why we're really
excited to be bringing a world where Jax, PyTorch,
TensorFlow models get to take advantage of all
of the specialized acceleration you can find in
Android devices today. And we're working with
leading technology partners like those shown
here to help make that happen. And coverage of
neural accelerators will expand over this year, some
of which I can talk about today, and others we'll talk more about in the future. But today, we're
thrilled to co-announce Qualcomm's new TFLite delegate. A delegate is an
add-on to TFLite that enables
accelerated compute. And the Qualcomm
delegate supports most models in our
PyTorch test set and most TensorFlow
and Jax models as well. And it's compatible also
with leading Qualcomm silicon products that have been
released in the last five years. And as you can see, it gives
really great performance on a wide set of models. So the QNN delegate is openly available today.
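To show how a delegate plugs in from Python, here's a hedged sketch using TFLite's external-delegate loader; the shared-library and model file names are placeholders for whatever the delegate package you install actually ships. On Android you would attach the delegate through the TFLite Java/Kotlin or NDK APIs instead, but the idea is the same.

```python
import numpy as np
import tensorflow as tf

# Load an external TFLite delegate from its shared library (placeholder name).
delegate = tf.lite.experimental.load_delegate("libQnnTFLiteDelegate.so")

interpreter = tf.lite.Interpreter(
    model_path="model.tflite",
    experimental_delegates=[delegate])
interpreter.allocate_tensors()

# Run one inference with delegate-accelerated compute.
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()
output = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])
```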
You can check out our blog post about AI Edge Torch for more
details and availability. Additionally, Qualcomm recently
announced the Qualcomm AI Hub. And this is a cloud service
that lets you upload a model and then test it against a
wide pool of Android devices. And this gives you
the chance to see the performance of your model
using accelerated compute on different Qualcomm
enabled hardware without needing to set
up a complex device lab. This is great, as you can
explore how to accelerate your AI in your own app. So to try all of this
out for yourself, go check out the link on screen. And we have lots of great
code samples, documentation, and Colabs available for
each of these frameworks. You'll also find source code
for Android, iOS, and web apps. So you can see everything
running end to end. Now, that's a wrap for
this section and for me. And next up, we
have Aaron, who's going to share more details
about an exciting new tool for working with large models. Thanks. [APPLAUSE] AARON KARP: Hello. Thank you so much, Cormac. I'm Aaron. And like Sachin and Cormac,
I work on Google's AI Edge platform. Now, like many of you who work
with machine learning every day, Google researchers
and engineers need the best possible
understanding of what's happening inside the models that
they're developing and deploying to production. Now, for the reasons that Sachin
discussed earlier, most of us haven't been working with
large models for all that long, especially on device. But suddenly,
that's all changing. And our tools need to keep up. That's why we're excited to
announce a new tool under our AI Edge umbrella called
Model Explorer. And we built it
from the ground up to solve some of the
most common problems that we all encounter
when working with large models
running on device. Model Explorer gives you better
visibility into model behavior. And better visibility
means you can work faster, more accurately,
and more collaboratively. Let's take a look at
three common use cases when working with edge devices. First, conversion. Often, after you convert a model
from one format to another, say PyTorch to TensorFlow
Lite, it's really useful to validate that the
architecture looks the way you expect and data is
flowing between nodes correctly. Or maybe you're looking for
ways to quantize a model. Quantization is a
process that makes models smaller and often faster. And a first step can be to look
for computationally expensive nodes.
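As a quick aside, the most basic form of that is post-training dynamic-range quantization with the TFLite converter, which looks roughly like this (paths are placeholders):

```python
import tensorflow as tf

# Post-training dynamic-range quantization: shrinks weights to 8-bit and
# often speeds up CPU inference.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_model = converter.convert()

with open("model_quant.tflite", "wb") as f:
    f.write(quantized_model)
```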
Or maybe you're optimizing the performance and accuracy of your model and you
want to better understand the output from your
benchmarking and quality debugging tools. Model Explorer is an ideal
tool for these situations. Google engineers
across the company use Model Explorer every day. And this week, we're
so excited that we're making it publicly available. Because the ML universe
is expanding and evolving so rapidly, we built
it from the ground up to work with pretty much
any type of neural network, ranging from small segmentation
models to large, complex LLMs. For example, right now
you're looking at Gemma 2B, which is a model with
almost 2,000 nodes. And we've tested Model Explorer
with graphs containing up to 50,000 nodes. And it still runs
buttery smooth. Here's how Google teams
described their experience. The Waymo team says, "Model
Explorer is a daily essential for Waymo's ML infra team
and model building teams." And the Google
Silicon team said, "Model Explorer's accelerated
workflow allows us to swiftly address bottlenecks, leading
to the successful launch of multiple image, speech,
and LLM use cases, especially Gemini Nano
on Pixel devices." OK, now for the fun part. You guys want to
see a live demo? [CHEER] OK, so let's look at
a scenario you might encounter in the real world. I'm going to show
you how you can bring your own node-by-node data, and
then overlay it on the model graph for easy visualization. Let's say I'm the developer of
an app for bird lovers called "It's Your Bird Day." And I want to add a new feature
that classifies bird photos. By the way, Gemini didn't
just do that illustration, it also came up
with the app name. So if you ask me, this
whole GenAI thing, it's worth it, even
if just for the puns. Anyway, it's critical to me
that my app works well offline because my users
might be out in nature without a mobile
data connection. So naturally, on-device
AI is the way to go. To do the
classification, I'm going to use a popular
lightweight computer vision model called MobileNet V3. But I'm worried
about performance. I've heard that XNNPACK, which
is a highly optimized library for CPU, might speed things up. So as my next step, I'm
going to benchmark my model both with and
without XNNPACK ops, and then visualize the
results in Model Explorer. So let's head over to Colab. OK, all right, so first,
we need to do some imports. Then we need to run our
benchmarking utility twice. But that's kind of
the boring part. So I've already taken
care of all that.
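If you want to reproduce that part yourself, a rough way to time a model with and without TFLite's default XNNPACK delegate from Python looks like this; treat the resolver flag as an assumption to double-check against the TFLite docs, and the model path is a placeholder.

```python
import time
import numpy as np
import tensorflow as tf

def avg_ms(interpreter, runs=50):
    # Average per-inference latency in milliseconds on a zero-filled input.
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
    start = time.perf_counter()
    for _ in range(runs):
        interpreter.invoke()
    return (time.perf_counter() - start) / runs * 1000

# Default CPU path: XNNPACK is applied automatically in recent TFLite builds.
with_xnnpack = tf.lite.Interpreter(model_path="mobilenet_v3.tflite")

# Baseline without default delegates (resolver flag recalled from memory).
without_xnnpack = tf.lite.Interpreter(
    model_path="mobilenet_v3.tflite",
    experimental_op_resolver_type=(
        tf.lite.experimental.OpResolverType.BUILTIN_WITHOUT_DEFAULT_DELEGATES))

print("with XNNPACK:", avg_ms(with_xnnpack), "ms")
print("without XNNPACK:", avg_ms(without_xnnpack), "ms")
```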
Let's skip down to Model Explorer. All right, we're going
to import Model Explorer. We're going to pass
in the model path. And we're going to pass in
our two benchmarking outputs. We click Visualize. And here we have it.
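For reference, that cell looks roughly like the sketch below. model_explorer.visualize is the simple entry point; the config-based calls for overlaying per-node data are recalled from the docs, so treat those names as assumptions and verify them there. All file names are placeholders.

```python
import model_explorer

# Open the interactive viewer for a local model file.
model_explorer.visualize("mobilenet_v3.tflite")

# Overlaying the two per-node benchmark JSON files uses the config-based
# entry point; verify these call names against the Model Explorer docs.
config = model_explorer.config()
config.add_model_from_path("mobilenet_v3.tflite")
config.add_node_data_from_path("benchmark_no_xnnpack.json")
config.add_node_data_from_path("benchmark_with_xnnpack.json")
model_explorer.visualize_from_config(config)
```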
So on the left, of course, is the model graph. Now, it's collapsed
by default. That makes it really easy to get
your bearings when you initially load a model. On the right, we have the
output from our benchmark runs. Because I ran it twice, I have two sets of data. And
right off the bat, we can see that without
XNNPACK on the left, it takes about 20 milliseconds
to run inference on this model. With XNNPACK, it takes
barely over 2 milliseconds. So clearly, huge benefits. But where Model
Explorer really shines is letting you dig in
per node and seeing where those nodes are in your graph. So let's go down to
the per node data here. I'm going to sort
by execution time. And I can see the most
problematic nodes right here. I have these 2D convolution
nodes right at the top that are showing up in red. If I click them, it zooms right
to that spot within the graph. And you might be saying,
Aaron, these only take 1 millisecond to run
or 2 milliseconds to run. But imagine you're running
inference over a stack of 1,000 photos that your user has
taken out in the wild, that really adds up. So the ability to optimize
this is really important. Model Explorer also comes with
many of the convenience features that you would expect. For example, I can bookmark this
view and return to it later. I can export this view as a PNG. I can flatten all of
the layers in the graph, which can be particularly useful
for sequential models like this. I can overlay all
sorts of node data, including tensor
shape on the edges. And I can also search
for specific nodes and see the data
about those nodes. So that's just one high
level look at a specific way to use Model Explorer. But I'm sure you can
already think of many more. So let's leave
Colab now and talk about how to get it
running for your models. So Model Explorer is
available as a Python package. And it's designed to be
used in either of two ways. One option, as you just
saw, is inside Colab. We know that so many
of you love using Colab as your starting point
when working with ML. So we made it super
simple to plug Model Explorer into your workflow. Another option is
to simply run it as a standalone tool
on your local machine. Just install it from pip,
run the startup command, and Model Explorer
opens in your browser. This approach lets you work
quickly and easily with models on your local file system. And you heard from Cormac about
Google's growing commitment to a multi-framework ecosystem. So I'm pleased to say that
Model Explorer supports a range of model formats
that are generated by a variety of frameworks. Whether you start with
TensorFlow, PyTorch, Jax, or any other
framework, as long as you can export
your model into one of the formats up on
here on the screen, you can view it
in Model Explorer. We see this as
just the beginning. And we're excited to add more
comprehensive benchmarking and debugging features
in the future. So we are so grateful
that you've chosen to spend this time with us. To recap, Sachin kicked
us off by talking about the power of on-device ML
to unlock low latency, highly private, offline use cases
without server costs. Cormac then discussed how Google
is bringing the power of LLMs on device to Android,
Chrome, and iOS via simple, yet powerful APIs. Then he talked about
how Google is continuing its investment in
TensorFlow and Jax while embracing PyTorch as well. Finally, I showed you
how Model Explorer makes converting, quantizing,
and optimizing on-device models easier and more enjoyable. All of this is now available
as part of Google AI Edge. Be sure to visit the
link on the screen, where you can find more information
about all of these launches. And finally, let
me close by saying, we are thrilled that you're
on this journey with us. And we're excited to build
this future with you together. Thank you. [MUSIC PLAYING]