Generative AI on mobile and web with Google AI Edge

Captions
[MUSIC PLAYING] SACHIN KOTWANI: Hi, everyone. So as I was getting ready to travel to come here, my daughter asked me, where are you going? And I said, I'm going to be on stage at I/O. And she said, are you going to sing? So lucky for you, I'm not going to sing. Sorry.

Hi, everyone. I'm Sachin, a leader in our central AI and ML team, thinking about machine learning running on the edge. And that includes tooling, runtimes, and infrastructure. But to be clear, that's just my day job. At night, I'm your standard nerd who likes to build mobile apps, backends for those apps, scripts that run on my multiple Raspberry Pis at home, and more. It's a lot of fun. With that, I'm extremely grateful and humbled to have the opportunity to be in front of you today and to talk about a space that I'm so passionate about.

So something that's really had me excited lately, as I'm sure it has most of you, is large models and the power and creativity that they can unlock. I've used them to help summarize and consume large amounts of content and to help with writing and structuring random ideas. It's been a lot of fun. But you know what can take these experiences to the next level? Having this intelligence on the compute device most available to you, which brings benefits around latency, better privacy, scalability, and offline availability. Now with that, you can build something really incredible. And you wouldn't be the only one. Developers have been using ML on devices for some time now. Here are some numbers. There are over 100,000 apps just on Android using our stack. Those are running on over 2.7 billion devices, which generate over a trillion interpreter invocations per day.

So why use ML directly on edge devices? Well, you're here to listen to a talk on this topic, so hopefully you're already convinced and excited about the benefits and prospects of this technology. But just to level set, let's talk in more detail about why edge AI is useful in the first place. Customers often need apps where low latency is critical, for example, when every frame in a video stream needs to be processed, or where offline capability would be great to have, letting them work even while on a plane. Running ML on the edge also has additional benefits for privacy, since the data doesn't leave the device. And as a developer, you reduce or eliminate the need to deal with server-side maintenance, capacity constraints, or cost for ML inference.

Now, ML models are known for being large, computationally intensive, and data intensive, which means that traditionally you needed powerful servers to run them. However, two trends have made it possible to run an increasing number of ML models on edge devices. The first one is advancements in ML research, including new model architectures and techniques such as distillation and quantization, which have made models more efficient and smaller in size. The second one is more powerful devices with GPUs and dedicated NPUs for ML processing. One example is our third-generation Google Tensor G3 chip on the Pixel 8 and Pixel 8 Pro, which is powerful enough to run Gemini Nano fully on the device. This is an extremely powerful combination that positions edge AI to put even more power into your hands as developers in the next few years.

Let's walk through some examples of what this looks like in real life. The Google Photos app uses MediaPipe and TensorFlow Lite for the popular Photo Unblur feature, which adds sharpness and brightness to picture-imperfect moments with just a tap.
The entire feature runs on device, taking advantage of GPU and TPU accelerators where available. YouTube Shorts offers an array of effects for creators, making it easier to create fun and engaging content, again, all running on device.

At Google, we are committed to our mission of widening access to edge AI. And that is why you have at your disposal a wide range of tools and solutions that take you all the way from high-level APIs to pipelines, model inference, and hooks into the different hardware processors available on a device, enabling you to run these models as fast and efficiently as possible. If you're looking for ready-to-use, self-serve APIs, we offer a range of high-level solutions across various domains. Many of them include the ability to customize, evaluate, and deploy models in pipelines based on your own data, all in just a few lines of code. Or if you'd like to bring your own ML models, you'll find we provide powerful and highly performant runtime and pipelining solutions, like TensorFlow Lite and MediaPipe. What you're looking at here on the left is what the flow of running a single model looks like, including ML and non-ML pre- and post-processing, and running an inference. And then on the right is a more complex pipeline orchestrating various models, in both cases giving you the advantage of hardware acceleration with maximum control.

And now, we are bringing all of these technologies under a single umbrella called Google AI Edge, where you'll find all the runtimes, tools, training materials, samples, support, and documentation in one place to make your life easier. This is now the primary destination where you will find a set of tools that will enable you to bring on-device AI to your applications across mobile, web, and embedded devices. But of course, in this new destination, you'll also find all the new APIs and tools that we are launching today. And I know you can't wait to hear about those. All right? So let's jump right in.

We are announcing some new tools that you can go and try today that are focused on framework optionality and GenAI. First, earlier this year, we announced the experimental launch of the LLM Inference API with support for a set of the most popular models, including Gemma, to run on the edge. The API includes support for Android, iOS, and web. And this API enables you as a developer to run an LLM fully on the device and also customize and experiment with different models. Beyond these models that are already available for you to use, which are the most popular ones, we also want to enable you to bring your own GenAI model architectures if you're inclined to do so. So today, we are excited to give you a sneak preview of our AI Edge Torch Generative API, which will allow you to do just that.

Beyond just GenAI, Google AI Edge builds on our work with TensorFlow Lite to have the fastest on-device runtime. And we know that ML innovation comes from a variety of sources and a diversity of frameworks. And we at Google want to help advance innovation in edge AI, regardless of where it comes from. You can already take advantage of support for various model formats. So whether you're using TensorFlow, Keras, or Jax, you get the best performance possible. And today, we are expanding that list by adding beta support for PyTorch, with full support coming later this year. And all of this is on the same infrastructure and runtime that you already know. In the next section, Cormac will tell you more about both of these.
And we'll also show you how a few of our early access partners, including Shopify, Niantic, and Adobe, are using this functionality. And while you're converting those models, you may need some help debugging and visualizing them. So Aaron will later tell you about Model Explorer, a powerful graph visualization tool that helps you understand, debug, and optimize your ML models for edge deployment. There are many teams inside of Google using this tool already. And today, we are making it available for all of you to use. Our goal with all of these launches, ranging from easy-to-use APIs to tooling and infrastructure for more advanced use cases, is to help support open innovation and enable you to build amazing experiences for your users.

So let's talk a little more about GenAI. Over the past year or so, you have no doubt seen a lot of excitement in the community around this space, both on the consumer side and, of course, on the developer side. Our vision for the future is one where every app you interact with has a fully fluid UI controlled by natural language and completely personalized to you. That'd be pretty cool, yeah? Who wants that? All right, louder. All right, all right. OK, well, we're not quite there yet, but we are getting closer. And I have something I'd like to show you so you can see where we are headed.

So what I'm going to show you is a quick demo of Gemma 2B running fully in the browser in a Chrome extension. But there are two other interesting things happening here. First, we are going to use Retrieval Augmented Generation, or RAG, to access information that's outside the model's knowledge, particularly because this is a smaller model, just 2B. And then we're going to feed that information into the model so it can help with our request and answer questions about that content. And second, we are going to use function calling to have the LLM call other APIs on our behalf.

All right, so imagine I'm having friends over for brunch tomorrow. And I find this great pancake recipe in an online recipe book. I need to know which ingredients to buy. And I also need a reminder to go to the grocery store to buy them. So how can a Chrome extension running an LLM help me with that? Let's take a look. What we want to do first is ask it about the ingredients in the pancake recipe only. After that, we are asking it to create a calendar entry with our shopping list. What's magical about this is that we are not telling it which API calls to make or which arguments to pass in. Instead, we fine-tuned Gemma to understand how to use these APIs, so we can interact with it using only natural language. Pretty cool? All right, so now, this is just a simple example. But it shows the way that on-device language models, retrieval augmented generation, and function calling can, together, make for incredibly powerful interactions for end users.

Next, I'm excited to hand it off to Cormac, who will dive deeper into some of our new products and APIs that made this kind of demo possible. Cormac, please come on stage. [APPLAUSE]

CORMAC BRICK: OK, thanks, Sachin. That was great. I'm Cormac Brick. I'm an engineer working on core machine learning. And this is where we get to roll up our sleeves and look at some code and some demos. And first up, we're going to have a look at running LLMs on edge devices. OK, so now from Sachin, you've gotten excited about the future of on-device LLMs. Let's look a bit deeper at the different ways you can access powerful LLMs on device. So first up, we have Gemini Nano.
And this is the built-in GenAI on Android and Chrome. On Android, you may have heard yesterday that this is already available on the most capable Android phones. And this is amazing because Gemini is already loaded on your phone and optimized to run on hardware acceleration. And the Android team has a talk about this called Android On-Device AI Under The Hood, which you should go check out for more detail. There, you'll hear more about availability and see some great examples of how this is running in production today for a number of Google features. Gemini Nano is also coming soon to Chrome, as you might have heard from Jason in our last talk. And this is starting on desktop. And developers will soon be able to use powerful APIs to do things like summarization, translation, and much more. They also have a talk called Practical On-Device AI that you should check out to learn a lot more about Gemini in Chrome.

Now, devices and platforms that don't have Gemini built in are what I'm going to cover in this session today. And we're going to look at two different ways you can bring your own model and run it on device with Google AI Edge. First up, we're going to look at the MediaPipe LLM Inference API. These are pre-optimized LLMs that work really well on multiple platforms. Then we're going to cover our generative API. This enables you to build your own generative models that use on-device compute with great performance.

OK, so in March, we released the MediaPipe LLM Inference API. And this runs language models entirely on the device with all of the scale, privacy, and reliability advantages that Sachin covered earlier. And this is a really easy-to-use API that covers web, iOS, and Android. And it's fast. You're going to be able to see this for yourselves in a little bit in some demo examples that are all real-time recordings. And the way this works is we provide highly optimized models and easy access to public weights. And you can also bring your own weights, maybe a ready-made variant from the Hugging Face ecosystem, or perhaps something that you fine-tuned yourself. The choice is yours. And it's still a little early, so we're calling this an experimental release. But let's have a look at some code and then some demos.

OK, so first up, here are a few lines of code that you can integrate with your application to get the MediaPipe LLM Inference API working locally. This is showing an Android example. And we also have similar APIs to get you started on iOS and also on web, which looks a bit like this. And here, we're showing Gemma 2B running entirely locally in the browser. Now, you might be noticing all of our demos seem to be food-related. I'm not sure what that's about, but there's definitely a bit of a theme there. Now, this is all running fully locally in the browser. And it's fast. And that's because it's accelerated on the computer's GPU through WebGPU. And that makes it fast enough to build pretty compelling fully local web applications. And here it is running Gemma 2B on Android. And again, this is all real time, running on real devices. And you can get the source code for this demo app on our AI Edge docs page, which we'll link at the end, and try this out for yourself today. We also have it outside in our demo booth. And this also runs on iOS. And this is the power of the MediaPipe LLM libraries: it's the same model, same weights, and pretty similar APIs, across multiple platforms.
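As a reference for the bring-your-own-weights path Cormac mentions, the weight-conversion step is done in Python with the MediaPipe genai converter. The on-screen code isn't in the captions, so this is a rough sketch based on the public MediaPipe documentation; the paths are placeholders and the exact parameter names may differ between releases:

```python
# Sketch only: converting externally downloaded LLM weights (e.g., a Gemma
# variant from Hugging Face) into the format the MediaPipe LLM Inference API
# loads on device. Paths are placeholders; check the MediaPipe docs for the
# exact options in the current release.
from mediapipe.tasks.python.genai import converter

config = converter.ConversionConfig(
    input_ckpt="/tmp/gemma-2b-it/",            # placeholder checkpoint directory
    ckpt_format="safetensors",
    model_type="GEMMA_2B",
    backend="gpu",                             # or "cpu"
    output_dir="/tmp/intermediate/",
    combine_file_only=False,
    vocab_model_file="/tmp/gemma-2b-it/",
    output_tflite_file="/tmp/gemma-2b-it-gpu.bin",
)
converter.convert_checkpoint(config)
```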
Now, all of this has been available for the last couple of months. And since then, the team has been busy. And we've got some new features we're happy to share today. So first up, we're really happy to announce larger model support on the web. Our latest release enables larger models like Gemma 7B, which helps you prototype even more powerful web applications. [APPLAUSE]

And today, we're also excited to announce LoRA support. So what's LoRA? You might have heard earlier in Gus's Gemma talk about how to fine-tune Gemma in Keras with LoRA. Or maybe you've already heard that fine-tuning a model is a great way to improve quality in a way that's specific to your application. So LoRA is a fine-tuning method that's really easy to use, because firstly, compared to full fine-tuning, it uses way less compute, which makes it more affordable. And you can also get great results fine-tuning with a relatively small data set. And the resulting LoRA fine-tuned files are also pretty small. They're only a few megabytes, compared to the base models that are often several gigabytes. So today, the LLM Inference API now supports LoRA. This means you can use several small LoRAs on top of the same larger base model, which can make it easy to ship multiple compelling features in a single application, all sharing the same base model.

Now, the LLM Inference API is a great place to start. And you're probably wondering which models it works with and how you get those models. So we support lots of the popular open models that you can see here today. And you're also going to see this list continue to expand over time. Then you bring weights compatible with any of these architectures. And again, you can use your own weights, find something on Hugging Face, or do your own fine-tune. Then you use our converter. And you have a model that's ready to run on device.

Now, we expect most of you to use pre-optimized models like these today. However, some of you may also want to bring other architectures. For example, maybe you want to use a smaller architecture that's not listed here. Or what if you have a model that's proprietary to your company? Well, that's why I'm excited to announce the Torch Generative API that we're introducing today. This helps you bring your own LLM architectures to devices in a way that's both easy and fast. So right from PyTorch, you can re-author your LLM using our optimized components. And we found it's really helpful to stay in the Torch environment, as it's easy to run evals there and work with other models. And then you use the Google AI Edge Torch Converter to bring it to TFLite. This gets you a model you can run with the existing TFLite runtime. And the resulting model is pretty fast. We found it gets within 10% of a handwritten C++ implementation. And we're also going to cover AI Edge Torch in more detail a little later. So today, we're releasing an initial alpha version of this. This is an early release, but we're really excited about this direction. And we'd love to hear your feedback.

So now, let's jump in and have a look at some code. So here's what this looks like to a developer. And first off, I'm going to say, you don't need to read all of this code immediately. I just wanted to give you a feel for what a typical implementation looks like. And the way this works is you use a combination of some native PyTorch layers that you can see up top there in the first couple of sections.
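The slide code isn't captured in the captions. As a hedged sketch of what such a re-authored model can look like, here is a toy decoder assembled the way Cormac describes, with native PyTorch layers up top and optimized building blocks from the AI Edge Torch generative layers at the bottom. The module and config names follow the public ai-edge-torch examples and are assumptions, not the exact code shown on screen:

```python
# Sketch only: a toy decoder-only model mixing native PyTorch layers with
# optimized layers from the AI Edge Torch generative library. Names are based
# on the public ai-edge-torch examples and may differ between releases.
from torch import nn

import ai_edge_torch.generative.layers.builder as builder
import ai_edge_torch.generative.layers.model_config as cfg
from ai_edge_torch.generative.layers.attention import TransformerBlock


class ToyDecoder(nn.Module):

  def __init__(self, config: cfg.ModelConfig):
    super().__init__()
    # Native PyTorch layers ("up top").
    self.tok_embedding = nn.Embedding(config.vocab_size, config.embedding_dim)
    self.lm_head = nn.Linear(config.embedding_dim, config.vocab_size, bias=False)
    # Optimized building blocks from the generative layers library ("at the bottom").
    self.transformer_blocks = nn.ModuleList(
        TransformerBlock(config) for _ in range(config.num_layers))
    self.final_norm = builder.build_norm(
        config.embedding_dim, config.final_norm_config)
    # The forward pass (RoPE, masking, and KV-cache plumbing) is omitted here;
    # see the TinyLlama and Gemma examples in the ai-edge-torch repository.
```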
And then you can also use some of our optimized layers that come with the Torch Generative API library, like the ones you see at the bottom. Then you run all of this through the AI Edge Torch Converter. And you get a file that you can take and run on the device, which you can then use to build some apps. And here's an example using TinyLlama, where we ask for paraphrasing options for a short message. And as you can see, GenAI has lots of potential use cases on device. And it turns out you can even use it for features that have nothing to do with food, which came as a surprise to some of our team. And here it is helping us rephrase a message from a user having fun at Google I/O.

Now, models created using the Torch Generative API are compatible both with our high-level MediaPipe LLM Inference API, which is easy to use in Android applications like the one you just saw, and with our lower-level TFLite runtime for greater control, which is the same runtime you may already be using for other models. So today, with our initial launch, we support PyTorch on CPU. And you can expect GPU, quantization, and NPU support coming along later this year, as well as an equivalent API for Jax.

Now, let's take a minute to summarize what we've covered for on-device LLMs. So first up, built-in Gemini is awesome. And it's available with limited access on Android. And it's coming soon to Chrome. Secondly, we have MediaPipe LLM Tasks. And these are highly optimized versions of popular open models that work cross-platform. And finally, you can go fully custom with the Torch Generative API, which is fast and also works with the TFLite runtime. Now, I'm sure many of you would like to start digging into the code and maybe trying some of this out for yourselves. So you can. Just follow the link on screen. And there you'll also find Colabs and end-to-end examples to help you get started.

OK, so that's it for LLMs. And next up, I'd like to cover how we're going to help you build great AI-powered apps, no matter which framework you're starting from. So we launched TensorFlow Lite in 2017 with a mission to make it easy to innovate and bring machine learning to mobile and edge platforms. And that's been really successful, with hundreds of thousands of apps using TFLite across Android, IoT, and iOS. And our mission hasn't changed. So to stay true to that mission of enabling innovation, we're excited today to announce official support for PyTorch and Jax in addition to TensorFlow. And this allows you to bring the best ideas from any of these frameworks and run them on device with the same great TFLite runtime. [APPLAUSE]

And because TFLite powers ML inference for the entire Google AI Edge stack, that means you get framework optionality throughout that entire stack that Sachin showed us earlier. So you can find off-the-shelf models or train models in the framework of your choice, convert your models to TFLite in a single step, and then run them all on a single runtime bundled with your app across Android, web, and iOS. And for anyone here already using TFLite or MediaPipe today, you can use new models from PyTorch or Jax with your existing packages. Just update to the latest version. No need to change any of your existing models, or your build dependencies, or anything like that. So let's start by spending a minute talking about Jax. Jax is a framework we use extensively internally.
All of the generative models that you've heard about today and yesterday, like Gemini and Gemma, are trained in Jax. And we also use Jax for lots and lots of on-device use cases internally. And recently, we've seen that lots of users and top AI companies also choose to use Jax for flexibility and efficiency. So we've updated support and documentation to make this path easier for the wider community to bring Jax models on device. And this is what it looks like in code: you take your Jax model, add a Jax module, and then export it to a TFLite file with just a few lines.

And now for PyTorch. We're particularly excited about this one because PyTorch support is far and away the most frequent feature request we get from both enterprise teams and community developers. So we're here to say, we have heard you. We know that many of you love PyTorch. And many of you love TFLite. But the path between them hasn't always been easy or well supported. We've seen a few community projects that have kind of filled that gap, maybe going from PyTorch to ONNX, ONNX to TensorFlow, and TensorFlow to TFLite. And that was too many steps, each one brittle and a place where things could go wrong. So we knew that converting PyTorch to TFLite could be much easier. Well, we're happy to say that with our new Python package, it now is. So directly from your PyTorch environment, you import AI Edge Torch, initialize your model, and call our converter. You can test the output right there in Python. And then you export to a TFLite file that's ready to use on device. It really is that easy. And PyTorch support for TFLite is publicly available today in beta. And it's also on GitHub at AI Edge Torch. Or you can follow the link on screen to check it out for yourselves.

And we've tested AI Edge Torch with over 70 popular PyTorch models. And we've been blown away by the ease of conversion and by the performance. And we've built our PyTorch support using many PyTorch native features, including things like torch.export as a consistent way to export models, PT2E for quantization, and Core ATen for operator expression in PyTorch. And you can read our blog post that came out today for more details on performance and our underlying implementation. Now, if you're running PyTorch models on Android already, either via a community-provided conversion to TFLite or via another ML framework, we strongly recommend you come try this out. We can confidently say from our testing that if your models are supported with our beta, you'll likely see significant performance improvements.

Now, we've also worked with lots of partners who've given us invaluable feedback while developing multi-framework support in TFLite, including those listed here. And to everyone who's helped us with testing, feedback, and community advice, including the companies listed here, thank you so much. We really appreciate it. And a great example of this recently has been our work with Shopify. We're confident you'll find the new PyTorch support useful because we've tested it in production apps over the last few months with partners like Mustafa and the team at Shopify. And they found it great, as you can see, for creating mobile-ready PyTorch models. And as you can see on the right, it's already being used by his team to perform on-device background removal for product images. And this will be available in an upcoming release of the Shopify app. And we're really passionate about helping developers build great applications that use on-device AI like this one.
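To make the conversion flow Cormac describes concrete (import AI Edge Torch, initialize your model, call the converter, test the output in Python, export a TFLite file), here is a minimal sketch using the public ai-edge-torch beta package. The torchvision model is just a stand-in for illustration, not one from the talk:

```python
# Minimal sketch of the PyTorch -> TFLite flow with the ai-edge-torch beta.
import ai_edge_torch
import numpy as np
import torch
import torchvision

# Any torch.nn.Module works; a torchvision classifier is used as a stand-in.
model = torchvision.models.mobilenet_v3_small(weights="DEFAULT").eval()
sample_inputs = (torch.randn(1, 3, 224, 224),)

# Convert, then sanity-check the converted model right here in Python.
edge_model = ai_edge_torch.convert(model, sample_inputs)
torch_output = model(*sample_inputs)
edge_output = edge_model(*sample_inputs)
print(np.allclose(torch_output.detach().numpy(), edge_output, atol=1e-4))

# Export a .tflite file that the existing TFLite runtime can load on device.
edge_model.export("mobilenet_v3.tflite")
```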
So we're really happy to see new features like this go out. So in the last few years, we've also seen great innovation in the Android hardware ecosystem, with AI improvements in CPUs and GPUs and even new specialized hardware accelerators, sometimes called NPUs, that offer exciting potential for even faster AI. And that's why we're really excited to be bringing about a world where Jax, PyTorch, and TensorFlow models get to take advantage of all of the specialized acceleration you can find in Android devices today. And we're working with leading technology partners like those shown here to help make that happen. And coverage of neural accelerators will expand over this year, some of which I can talk about today, and others we'll talk more about in the future.

But today, we're thrilled to co-announce Qualcomm's new TFLite delegate. A delegate is an add-on to TFLite that enables accelerated compute. And the Qualcomm delegate supports most models in our PyTorch test set, and most TensorFlow and Jax models as well. And it's also compatible with leading Qualcomm silicon products that have been released in the last five years. And as you can see, it gives really great performance on a wide set of models. So the QNN delegate is openly available today. You can check out our blog post about AI Edge Torch for more details and availability. Additionally, Qualcomm recently announced the Qualcomm AI Hub. And this is a cloud service that lets you upload a model and then test it against a wide pool of Android devices. And this gives you the chance to see the performance of your model using accelerated compute on different Qualcomm-enabled hardware without needing to set up a complex device lab. This is great, as you can explore how to accelerate the AI in your own app. So to try all of this out for yourself, go check out the link on screen. And we have lots of great code samples, documentation, and Colabs available for each of these frameworks. You'll also find source code for Android, iOS, and web apps, so you can see everything running end to end. Now, that's a wrap for this section and for me. And next up, we have Aaron, who's going to share more details about an exciting new tool for working with large models. Thanks. [APPLAUSE]

AARON KARP: Hello. Thank you so much, Cormac. I'm Aaron. And like Sachin and Cormac, I work on Google's AI Edge platform. Now, like many of you who work with machine learning every day, Google researchers and engineers need the best possible understanding of what's happening inside the models that they're developing and deploying to production. Now, for the reasons that Sachin discussed earlier, most of us haven't been working with large models for all that long, especially on device. But suddenly, that's all changing. And our tools need to keep up. That's why we're excited to announce a new tool under our AI Edge umbrella called Model Explorer. And we built it from the ground up to solve some of the most common problems that we all encounter when working with large models running on device. Model Explorer gives you better visibility into model behavior. And better visibility means you can work faster, more accurately, and more collaboratively.

Let's take a look at three common use cases when working with edge devices. First, conversion. Often, after you convert a model from one format to another, say PyTorch to TensorFlow Lite, it's really useful to validate that the architecture looks the way you expect and data is flowing between nodes correctly.
Or maybe you're looking for ways to quantize a model. Quantization is a process that makes models smaller and often faster. And a first step can be to look for computationally expensive nodes. Or maybe you're optimizing the performance and accuracy of your model and you want to better understand the output from your benchmarking and quality debugging tools. Model Explorer is an ideal tool for these situations. Google engineers across the company use Model Explorer every day. And this week, we're so excited that we're making it publicly available. Because the ML universe is expanding and evolving so rapidly, we built it from the ground up to work with pretty much any type of neural network, ranging from small segmentation models to large, complex LLMs. For example, right now you're looking at Gemma 2B, which is a model with almost 2,000 nodes. And we've tested Model Explorer with graphs containing up to 50,000 nodes. And it still runs buttery smooth.

Here's how Google teams described their experience. The Waymo team says, "Model Explorer is a daily essential for Waymo's ML infra team and model building teams." And the Google Silicon team said, "Model Explorer's accelerated workflow allows us to swiftly address bottlenecks, leading to the successful launch of multiple image, speech, and LLM use cases, especially Gemini Nano on Pixel devices."

OK, now for the fun part. You guys want to see a live demo? [CHEER] OK, so let's look at a scenario you might encounter in the real world. I'm going to show you how you can bring your own node-by-node data, and then overlay it on the model graph for easy visualization. Let's say I'm the developer of an app for bird lovers called It's Your Bird Day. And I want to add a new feature that classifies bird photos. By the way, Gemini didn't just do that illustration, it also came up with the app name. So if you ask me, this whole GenAI thing, it's worth it, even if just for the puns. Anyway, it's critical to me that my app works well offline because my users might be out in nature without a mobile data connection. So naturally, on-device AI is the way to go. To do the classification, I'm going to use a popular lightweight computer vision model called MobileNet V3. But I'm worried about performance. I've heard that XNNPACK, which is a highly optimized library for CPU, might speed things up. So as my next step, I'm going to benchmark my model both with and without XNNPACK ops, and then visualize the results in Model Explorer.

So let's head over to Colab. OK, all right, so first, we need to do some imports. Then we need to run our benchmarking utility twice. But that's kind of the boring part. So I've already taken care of all that. Let's skip down to Model Explorer. All right, we're going to import Model Explorer. We're going to pass in the model path. And we're going to pass in our two benchmarking outputs. We click Visualize. And here we have it. So on the left, of course, is the model graph. Now, it's collapsed by default. That makes it really easy to get your bearings when you initially load a model. On the right, we have the output from our benchmark runs. Because I ran it twice, I have two sets of data. And right off the bat, we can see that without XNNPACK on the left, it takes about 20 milliseconds to run inference on this model. With XNNPACK, it takes barely over 2 milliseconds. So clearly, huge benefits. But where Model Explorer really shines is letting you dig in per node and see where those nodes are in your graph.
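For reference, the Colab cell Aaron walks through looks roughly like this with the model-explorer Python package. Treat it as a sketch: the benchmark JSON file names are placeholders, and the config method names reflect my reading of the public package rather than the exact notebook from the demo:

```python
# Sketch of the Colab flow: visualize a TFLite model and overlay per-node
# benchmarking data from two runs (with and without XNNPACK).
import model_explorer

config = (
    model_explorer.config()
    .add_model_from_path("mobilenet_v3.tflite")
    .add_node_data_from_path("benchmark_without_xnnpack.json")  # placeholder file
    .add_node_data_from_path("benchmark_with_xnnpack.json")     # placeholder file
)
model_explorer.visualize_from_config(config)
```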
So let's go down to the per-node data here. I'm going to sort by execution time. And I can see the most problematic nodes right here. I have these 2D convolution nodes right at the top that are showing up in red. If I click them, it zooms right to that spot within the graph. And you might be saying, Aaron, these only take 1 or 2 milliseconds to run. But imagine you're running inference over a stack of 1,000 photos that your user has taken out in the wild; that really adds up. So the ability to optimize this is really important. Model Explorer also comes with many of the convenience features that you would expect. For example, I can bookmark this view and return to it later. I can export this view as a PNG. I can flatten all of the layers in the graph, which can be particularly useful for sequential models like this. I can overlay all sorts of node data, including tensor shape on the edges. And I can also search for specific nodes and see the data about those nodes.

So that's just one high-level look at a specific way to use Model Explorer. But I'm sure you can already think of many more. So let's leave Colab now and talk about how to get it running for your models. So Model Explorer is available as a Python package. And it's designed to be used in either of two ways. One option, as you just saw, is inside Colab. We know that so many of you love using Colab as your starting point when working with ML. So we made it super simple to plug Model Explorer into your workflow. Another option is to simply run it as a standalone tool on your local machine. Just install it from pip, run the startup command, and Model Explorer opens in your browser. This approach lets you work quickly and easily with models on your local file system. And you heard from Cormac about Google's growing commitment to a multi-framework ecosystem. So I'm pleased to say that Model Explorer supports a range of model formats that are generated by a variety of frameworks. Whether you start with TensorFlow, PyTorch, Jax, or any other framework, as long as you can export your model into one of the formats up here on the screen, you can view it in Model Explorer. We see this as just the beginning. And we're excited to add more comprehensive benchmarking and debugging features in the future.

So we are so grateful that you've chosen to spend this time with us. To recap, Sachin kicked us off by talking about the power of on-device ML to unlock low-latency, highly private, offline use cases without server costs. Cormac then discussed how Google is bringing the power of LLMs on device to Android, Chrome, and iOS via simple yet powerful APIs. Then he talked about how Google is continuing its investment in TensorFlow and Jax while embracing PyTorch as well. Finally, I showed you how Model Explorer makes converting, quantizing, and optimizing on-device models easier and more enjoyable. All of this is now available as part of Google AI Edge. Be sure to visit the link on the screen, where you can find more information about all of these launches. And finally, let me close by saying, we are thrilled that you're on this journey with us. And we're excited to build this future with you together. Thank you. [MUSIC PLAYING]
Info
Channel: Google for Developers
Views: 5,280
Keywords: Google, developers, pr_pr: Google I/O;, ct:Event - Technical Session;, ct:Stack - AI;, ct:Stack - Mobile;
Id: uWCX1h9YamI
Length: 38min 19sec (2299 seconds)
Published: Thu May 16 2024