What's new in Google AI

Captions
[MUSIC PLAYING] JOSH GORDON: All right. [CHEERS AND APPLAUSE] Hi, everyone. Thank you so much for coming. So my name is Josh Gordon. I work on the Gemini API, and I have an amazing team that builds a lot of the examples and the SDKs that you'll see today. JOANA CARRASQUEIRA: Hi, everyone, and I'm Joana Carrasqueira, a Senior Manager in Developer Relations for AI here at Google. And I'm really excited to be with you all today. LAURENCE MORONEY: Woo! Hi, everybody, and I'm Laurence. I just do a whole bunch of AI stuff here at Google, so thank you all for being here today. And we hope we have a great session for you. JOSH GORDON: So I think we might end-- LAURENCE MORONEY: Maybe having-- JOSH GORDON: In addition to the three of us, we have a special guest. JOANA CARRASQUEIRA: Oh, do we have a special guest? JOSH GORDON: Yes, we have a special guest. LAURENCE MORONEY: Apparently. JOANA CARRASQUEIRA: Oh, that's very interesting. I thought that we have a very special audience today. [CHEERS AND APPLAUSE] LAURENCE MORONEY: Hey, Sundar, welcome. SUNDAR PICHAI: How are you? JOSH GORDON: How are you? JOANA CARRASQUEIRA: Welcome, Sundar. JOSH GORDON: Thank you so much for joining, hi. SUNDAR PICHAI: All right. JOANA CARRASQUEIRA: Thank you for joining us. SUNDAR PICHAI: I get to crash the party. JOSH GORDON: Yeah. SUNDAR PICHAI: You guys had fun today? AUDIENCE: Yeah. SUNDAR PICHAI: It's an exciting time in the field. Yeah, there's a lot going on, yeah. JOSH GORDON: Thank you so much for joining. SUNDAR PICHAI: Yeah, well. JOSH GORDON: So to kick things off, I have a question for you. SUNDAR PICHAI: All right. JOSH GORDON: So AI has changed a lot recently, and it's becoming so much more accessible. So what kinds of cool things can developers do today with Gemini that, a few years ago, would have taken an engineering team or a research team or tons of work? SUNDAR PICHAI: I think you all are doing it already. I've seen crazy examples online. It always blows me away when people think about stuff before we do, but I think it's going to take a while for us to internalize what multimodality means. The fact that anything, any input, can come out as any output, and you can mix and match and do things. I think it's a powerful new thing. All of us are just putting the plumbing in there, but I think you get to use it and think it through in a big way. Obviously, long context, as we make it more and more easy to use, with the caching API coming soon. As we bring the latency and cost down, I think that's another dimension on which it's pretty powerful. And finally, as things become more agentic, I think it's an extraordinary opportunity to push the boundaries on. But I'm always blown away by what people do, rather than me telling them what to do. LAURENCE MORONEY: So we have a lot of developers in the room, I think. [CHEERS] So we got a few. So speaking of developers, Sundar, how do you see the developer role changing in this new AI world? SUNDAR PICHAI: Well, the pace is pretty fast. But you're also getting new tools to go with it. So, embracing AI in your workflows more natively is going to be important, I think. But also, I would say, you have to challenge the existing assumptions across everything you do. When I spoke about multimodality, that's what I meant. You're just really internalizing it. I think it takes time to internalize that you can actually go from any input to any output in a deep way. So I would just say, internalizing what it means to be AI native.
I think it's going to take some time. When mobile came, we all went through the same thing. Most of us took what was on the web, and we kind of shoved it into mobile. And then people started doing really mobile-native applications. I think we are in the same phase with AI. We're kind of adding some AI capabilities to existing applications. But really, stepping back and rethinking it from the ground up, I think that's the most important thing to do. JOANA CARRASQUEIRA: That's really good. And I would like to follow up on something that you've just said. It feels like the pace of change is really fast in AI. So if we take a moment to pause, what opportunities are you excited about in the longer term? SUNDAR PICHAI: We are obviously talking about the technology horizontally. But I think people have the chance to go vertical by vertical and create applications within each one. You see examples of it, like in health care. What we are doing with AlphaFold, and, as we introduce the Med-Gemini models, what's possible. You see it with learning, with LearnLM. But the chance to take each vertical and go deep and solve problems using AI, I think, that's the opportunity ahead. Kind of horizontal applications are harder. But I think, the chance to do it on a vertical-by-vertical basis and go deep and solve a problem, I think, there are a lot of problems to be solved and a lot of value to be created. JOANA CARRASQUEIRA: Thank you so much. LAURENCE MORONEY: Thank you, and before Sundar goes, do you mind if we take a quick selfie with you and several hundred of our close friends? SUNDAR PICHAI: All right, awesome. [CHEERS] LAURENCE MORONEY: Or it's an ussie. JOSH GORDON: Yeah. LAURENCE MORONEY: Josh, we can't see you, buddy. OK, one, two, three. All right, thank you. JOSH GORDON: Thank you so much. JOANA CARRASQUEIRA: Thank you, Sundar. LAURENCE MORONEY: Thanks a lot. JOANA CARRASQUEIRA: Thank you, everybody. Thank you, Sundar. LAURENCE MORONEY: We also have a little gift for you. This is the Indian copy of my book, Josh, and-- JOANA CARRASQUEIRA: It really is an exciting time to be a developer. And that's why we call today. SUNDAR PICHAI: Take care. JOSH GORDON: Thank you so much. JOANA CARRASQUEIRA: Thank you, Sundar. LAURENCE MORONEY: Thanks. [APPLAUSE] JOSH GORDON: OK, awesome. So thanks again for coming. So to kick it off, I will speak about the Gemini API and Google AI Studio, and then Joana will speak about AI frameworks, and Laurence will speak about Google AI Edge. And so, the Gemini API and Google AI Studio. And I have a lot of cool examples to show you. I'm going to move a little bit quick. You've heard a lot about Gemini 1.5 Pro today. The thing that I would like to say, to kick this off, is I've been working in AI now for the last 20 years. And at no point in my life did I expect to see a model as cool as Gemini. There are two things that make it special. One is it's multimodal, which means it can work with images, text, code, video, and audio right off the bat. The other thing is the long context. So in a single prompt, you can include one hour of video, 9.5 hours of audio, about 1,000 pages of text, or about 3,600 images. And this means that you can reason across a huge amount of data in a single prompt. And it's just absolutely extraordinary. So you've heard a lot about this model, but what I want to show you is how easy it is to get started with just a few lines of code.
And I have a couple of quick examples to show you, most of which we just pushed to GitHub this morning, so you can try it out right off the bat. First, I'll show you a quick example with videos, one with PDFs demonstrating the long context, and then one with audio. Well, we'll get there in one second. And then I have one more example using code. This doesn't use the long context, but it's super cool just for developers, and I think you'll have a lot of fun with it. So this is a clip from a 30-minute video of the American Museum of Natural History. And what it is, is it's someone walking around with a camcorder. And they go through about a third of the museum. And they have these beautiful videos of all these great exhibits that have tons and tons of detail. And I just want to show you, I'm not showing this live, but you can do something extremely similar on GitHub with code that I'll show you in a sec. And so now that we have this video, how can we work with Gemini 1.5 Pro? And we have two tools that you can use. The first is our user interface. This is Google AI Studio. And it's a place where you can prototype your prompts. And it's much more than a playground. You can also tune models and do cool stuff like that. But basically, in Google AI Studio, just in your browser, you can insert this video. And what's cool, this is a 30-minute video. I mentioned the context length is a million tokens. This takes about half the context. So we can do about an hour. And with the models that you heard about today, you can go up to two million tokens when you're off the waitlist. So something simple you can do, just to start, is something like video summarization. And this is a really powerful and important thing, but it's not the cool thing that I want to show you. So you can summarize the video, and Gemini will give you a quick sentence, or sometimes it'll give you a paragraph talking about the different exhibits that it saw. You can do much more powerful things too. And so this is a prompt where you upload a map of the museum. And so now you're working with a video and a map. And you can ask Gemini, name something on the map that we didn't see on the tour. And like all large language models, this won't always work perfectly, but it works well a lot of the time, and this is just such a cool capability. So it can reason across the map and say that we didn't see the Rose Center for Earth and Space. Other things you can do, and this is starting to work pretty well, is you can actually upload a drawing like this. And so this is a drawing a friend of mine made of a geode. And we can upload the drawing into Google AI Studio, and say, where did we see something on the tour that looks like this? And Gemini says it's at 29:40 in the video. And if you flip to that in the video, you can actually see that it found the geode. And so this is just an awesome, awesome, awesome thing. It's super, super cool. So how does one build something like this? And the good news is it takes about six lines of code. So I have two links for you. One, we have a Gemini API Cookbook. And this is basically just a GitHub repo that has a bunch of examples that you can run with a couple of clicks. And then on ai.google.dev, we've got really great developer docs that explain things in more detail for you. So I'm not going to show you like-- well, this is most of the code. I'm not going to read it to you, but basically, the Gemini API is a REST API, but we have SDKs for Python, Node, and a bunch of other great languages.
Here we're installing the Python SDK. Then you import google.generativeai, you configure it with your API key, and you can get an API key in Google AI Studio with a single click. In the cookbook, there are step-by-step instructions that will help you get started. And then it's just a few lines of code. And this is actually almost our most complicated code because we're working with videos. So it's one line of code to upload your video, just because it's kind of ridiculous to try and send that whole thing over an HTTP request. So instead we upload the video first. There's some pre-processing that happens in the back end, so we sleep until the video is ready. Create a model, which is Gemini 1.5 Pro, and now we have a prompt, which is, summarize the video. And what's really cool is we've tried to make it really easy for you to work with multimedia. So now if you wanted to do something with the map, basically, it's the same pattern. Except what you do in your prompt is you're passing a list. So there's text, there's video, and there's a map. And that's basically all the code you need to get rolling. I also want to quickly show you how easy it is to work with really long files, text files, and PDFs. So you can try this example end to end in the cookbook. It works pretty well out of the box. Basically, what we're looking at here, this is a 400-page PDF. And it's a transcript of the Apollo 11 mission, and that's from the first time humans landed on the moon. And basically, we're uploading the PDF to AI Studio. And we write a little prompt, find four lighthearted moments in this PDF. And what's happening here is, Gemini has read the PDF, which, again, is 400 pages. And it's finding some humor in the transcript. And one thing that's really cool is, I noticed that in this example, it actually cited the page in the PDF where it got this example. And then what I did is, I flipped to the PDF. And you can see that Gemini found exactly the sentence, and it cited it correctly where it appeared. So here Michael Collins is making a joke about munching sandwiches. So you can work with huge text files. And this has endless applications. We can do a whole talk on it. Especially for search, and there's tons of great stuff in healthcare. And what I wanted to show you, really, really quickly, is that the code basically looks exactly the same. So again, it's just a few lines. What we're doing is we're uploading our text file, creating a model, writing a prompt, and we just pass a list with our text and our text file. And of course, you can include images and audio too. Here's another example we have in the cookbook that you can try. It's a 44-minute speech from JFK. And this is the 1961 State of the Union. And this works pretty well out of the box. Here we can do audio summarization. And there are some cool new examples in the cookbook too, for things like voice memos and stuff like that. But anyway, you can upload the file to Gemini in Google AI Studio, and here we're just doing a quick summarization to kick it off. And it will summarize this speech. And again, the code looks, basically, exactly the same. So create the model, upload your audio file, and then you can go ahead and prompt with it. One last example to show you really quickly, and this is more for developers. So this one is a little bit more technical. So the idea here is that you have some Python code, and say you're writing code for something like home automation, and maybe you have something like a lightbot, so you can control your lights.
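For reference, here is a minimal sketch of the upload-and-prompt pattern described above, using the google-generativeai Python SDK. The file name and prompt are placeholders; the cookbook notebooks are the authoritative versions.

```python
import time
import google.generativeai as genai

# Configure the SDK with an API key from Google AI Studio.
genai.configure(api_key="YOUR_API_KEY")

# Upload the media file (video, audio, or PDF) rather than sending it
# inline over the HTTP request. "museum_tour.mp4" is a placeholder.
video_file = genai.upload_file(path="museum_tour.mp4")

# Videos are pre-processed in the backend, so sleep until the file is ready.
while video_file.state.name == "PROCESSING":
    time.sleep(10)
    video_file = genai.get_file(video_file.name)

# Create the model and prompt with a list that mixes text and media.
model = genai.GenerativeModel("gemini-1.5-pro-latest")
response = model.generate_content(["Summarize this video.", video_file])
print(response.text)
```

The same pattern covers the PDF and audio examples: upload the file, then pass it in the prompt list alongside your text.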
You can turn your lights on and off. We have three Python functions. We've got one to turn them on, we've got one to turn them off, and we have one to set the color. And in this case, we intentionally wrote this in a little bit of a complicated way. So to set the color, as a parameter, it takes this RGB hex value. And now what's cool is we want to see if we can call these functions using Gemini. And so what we're going to do is we're creating a list of functions. So before we were working with media files, but now we're creating a list of functions. And we're writing a system instruction to configure the model, to tell it that it's a lighting bot, and there are different things you can do with the lights. And now, when we create our model in code, we're including a list of the functions and the system instruction. And now, if we send Gemini a message, if we say, light this place up, the output is the function that it would like to call. So it understands the functions and can map natural language to the function. And what's really cool, too, if we say something more complicated, like, make this place purple, Gemini will actually figure out, not only does it have to call the function to set the light color, but it figures out the hex value that it should pass to make the lights purple. And in the Python SDK, we actually have a parameter you can set called automatic function calling. And so in addition to just seeing what Gemini would like to do, Python will actually execute that function for you. And you can use this for a whole bunch of awesome home automation stuff, for anything you can imagine working with code. So that's a really quick, rapid tour of a couple of new examples we have. You can find most of these in the cookbook. You can get started with just a couple of clicks. The cookbook is for Python, but we have great SDKs, which you should check out, for Node, Go, Dart, Flutter, Android, and Swift. And yeah, thank you so much for listening to my really fast talk. And I hope you have fun with this. Rock and roll. [APPLAUSE] And now, Joana. JOANA CARRASQUEIRA: Thank you so much, Josh. Well, and as you could see, with prompt-driven development, AI is literally becoming more accessible to everyone, regardless of technical background. And it's been amazing to see how this technology has been gaining momentum over the years and how so many of these models power some of the amazing products that we have here at Google. And more models have emerged very quickly, from T5 to LaMDA to PaLM, PaLM 2, Gemini, and more recently, the Gemma family of open models. So really, [APPLAUSE] It looks like we have some Gemma fans. OK, there's a Gemma talk tomorrow that you have to attend. So like I was saying, when we think about large models, it's not just about a single architecture. There are advancements across software, hardware, and the people behind these technologies that we really have to think about when we think about the holistic development of AI. So when we think about these advancements, beyond architecture, there are four main things that we always have to consider. And they are computational power, machine learning techniques, training data, and access to innovation. So we want to empower every developer to be more productive with AI. We want to take it even further. And we want to make sure that you are productive in your day-to-day life by providing you with tools, just to mention a few, for debugging, code generation, auto-assist, and testing for vulnerabilities.
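A rough sketch of that function-calling pattern with the Python SDK follows; the function bodies and system instruction here are illustrative stand-ins for the demo's lightbot code, not the exact code shown on stage.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Illustrative light-control functions (placeholders for real device calls).
def turn_lights_on():
    """Turn the lights on."""
    print("Lights on")

def turn_lights_off():
    """Turn the lights off."""
    print("Lights off")

def set_light_color(rgb_hex: str):
    """Set the light color from an RGB hex string, e.g. '800080'."""
    print(f"Color set to #{rgb_hex}")

# Pass the functions as tools, plus a system instruction describing the bot.
model = genai.GenerativeModel(
    model_name="gemini-1.5-pro-latest",
    tools=[turn_lights_on, turn_lights_off, set_light_color],
    system_instruction="You are a lighting bot that controls the lights in a room.",
)

# With automatic function calling enabled, the SDK runs the chosen function
# for you instead of only reporting which call Gemini wants to make.
chat = model.start_chat(enable_automatic_function_calling=True)
response = chat.send_message("Make this place purple.")
print(response.text)
```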
But really, Gemini, as you could see during Josh's presentation, is your best partner. And as we saw, Gemini 1.5 Pro really offers the world's largest context window and allows you to process vast amounts of information in one single stream. So as developers become more productive and as the models evolve, we are able to unlock amazing new scientific discoveries and innovations. And that's why I'm really excited to bring you AlphaFold 3. AlphaFold 3 is a model that is a combination of innovation and research from Google DeepMind and Isomorphic Labs. And this model is really special because it's a state-of-the-art model that allows you to predict the structures of all of life's molecules, from proteins to DNA to RNA. And we believe that we're going to unlock so many scientific advancements in drug development, drug design, but ultimately, this will bring so, so much positive-- so many positive health outcomes for patients and communities around the world. AI is also a catalyst for innovation. And you can really be more collaborative, as a team, by using AI. You can take it from a simple idea. You can make it better. You can brainstorm, iterate, refine it, and really start with something really simple and make it more complex and better over time. So there are two tools that I'm really excited about. One is the AI Flutter Code Generator, which allows you to riff on UI designs with your developer and design teams from one single text prompt, and the other is Data Agent, which allows non-coders to also have a better understanding of data and of the relationships between data, but also a very deep understanding of how it works. Now, you might be thinking, OK, things are moving so fast. I'm feeling a lot of pressure as a developer to really build better, faster, smarter apps. So how do I transform all these challenges into opportunities? And this is a mindset change, and that's why I really want you to think about the opportunities that you have to build new cool innovations with AI. So I'm going to break down these opportunities into two main themes: one about creating models, and another about consuming models. So let's start by talking about creating models. And the key to succeeding in any complex task, and let's face it, there are not many tasks that are more complex than understanding your data, training your model with that data, and also using that intelligence to build apps with a very good user experience. But like I was saying, the key to succeeding in a complex task is really having a stack of technologies with clear separation of layers by functionality and optionality between those layers. So when it comes to building with AI, we've been working tirelessly, I can assure you of that, to ensure that you have the right stack to build with. So let's start at the top, shall we? If you are a developer in the age of AI, there are probably three main things that you're trying to do with your models. First one, you might be trying to create a model from scratch. And you will need a consistent API if you want something that is solid, easy to learn, but also easy to maintain, whether it's a deep neural network or a wide LSTM. Maybe you'll be training your model using parts of an existing model, and that can save you a lot of time and help you build a better model. Or what's becoming more common, maybe you're trying to fine-tune a generative model with techniques like LoRA. And in just about a moment, Laurence is going to do a demo for you.
However, regardless of what you're trying to use your model for, Keras is your best friend. But your job really only starts with the coding. And once you've defined your neural network architecture, and once you've chosen the layers to fine-tune, something still has to do the heavy lifting of machine learning. And optionality is really important here, and that's why you've been hearing about optionality so much today. But maybe you have access to state-of-the-art hardware infrastructure, or maybe you're sharing a GPU with your team. What is really important is that you optimize for your hardware. So that's where Keras, with optional backends like TensorFlow, JAX, and PyTorch, is really important and will make your life a lot easier. We're all familiar with and we all love TensorFlow. We have a very big TensorFlow community. Developers still use it quite a lot. But I also want to call out JAX, because JAX is a framework for accelerated computing. And when combined with Keras on the top end, you can really take advantage of this acceleration without having to change your code. So whether you're in a compute-rich environment or not, you can experiment with it and see if you can really train faster with JAX. And of course, you will need a place to run your models. We need to put them in people's hands. So with that in mind, the ecosystem of runtimes that we've been working on for years will work with your models on this stack, no matter the backend. Everything from the smallest microcontroller, to mobile, to the browser, to web servers, to accelerated infrastructure has the ability to execute the models that you train on this stack. And this really provides an amazing level of optionality. So investing your coding time in Keras really opens up so many opportunities for you as a developer. Lastly, I just wanted to mention that PaliGemma is here, and it is designed for world-class, fine-tuned performance on a wide range of vision-language tasks. We are so excited about what you're going to build with the Gemma open model family. And so we look forward to seeing what you are going to create with it. But like I promised, Laurence is going to join us on stage. And he's going to demonstrate what I've just talked about. LAURENCE MORONEY: Thanks, Joana. [APPLAUSE] Hello, everybody. I'm going to first explain a little bit about what I'm going to do. Now, how many of us here in this room, out of interest, do startups, or are forming your own startup, all of that kind of thing? Great, it's maybe 40%, as far as I can see. Now, think about when you are starting a startup, and you're creating this whole new thing. And a lot of the time, you really want to just test out your ideas, you want to validate your ideas. You want to get it in front of people so that when you go to investors with a pitch, you've really tested and validated that. And that's one of the areas where, with generative AI, I'm particularly excited. Because so many ideas that are out there, you can just start kind of kicking the tires on these ideas using synthetic data that has been created by something like Gemini. And then you can start building an app and start building an app idea around that. And if you were at the developer keynote earlier on, Sharbani and I were kidding around about one of the ideas we had. And that was, well, hey, we have kids. Our kids love to read books, but sometimes it's very difficult for us to find the right books for children to read. You can go on to the sites where you buy them and maybe read reviews.
You can go on to sites like Goodreads and maybe read reviews. But that's a massive cognitive load that you have. And with that cognitive load, I mean, it just makes it harder. Wouldn't it be nice if you could go to a chatbot and just ask a chatbot about it? So then if we were a startup, like, publishing books or providing an app like this, hey, it would be really cool to allow people to have a chat application where they could ask about the books, and then maybe use RAG with details about their own family that they could pass through this application, keeping it all private and on device, so they could get even more intelligent reasoning around that. And we thought, that'd be an amazing idea for a startup. But how would somebody get started with that? You need data. So I just got signed out of my laptop, so let me sign back in. And if we can switch to the laptop screen, please. So then the idea is, just with something like Gemini-- let me zoom in on that, so you can see it a little better --I could start just doing things like I put in a simple prompt in AI Studio, where I asked it for-- oh, here's me just saying, give me 100 [INAUDIBLE]. Like, please help me create a synthetic data set. Output it as a CSV containing details on family-friendly books. Output the title, the genre, the theme, and a simple synopsis. Create about 100 books. And then what Gemini did for me was, it started creating books like "The Big Friendly Giants," "The Paper Bag Princess," and stuff like that. So now I'm actually starting to put together a catalog of books using synthetic data, and AI Studio just allowed me to do this as a simple prompt and get that as a CSV. So now I have a starting basis. I have a database of books that I can start working with, but it gave me a very simple synopsis. For example, I don't know, "Tiki Trouble," and the synopsis I'm just reading here is, a mischievous imp causes chaos for a young girl named Ella. I can't really tell a lot about that book from that. So maybe I could go back to Gemini and have Gemini create detailed synopses of the books. So that's what I ended up doing using the generative AI that Josh was showing earlier on. And I'm just going to show it very briefly here. But what I could do is-- and I can share this Colab with you later if you like. But what I could do is, using the generative AI model-- sorry, the generative AI API that Josh was showing earlier on, I could then create a new prompt that's like, please write a detailed synopsis of a plot for a book called such-and-such. As I'm reading through that CSV, I add the title; in the genre of, add the genre; which has a theme of, add the theme; and with this basic plot, add the synopsis. And now Gemini is going to write 200-word summaries of the books for me. So now I'm getting lots and lots of data around these synthetic books. So now as a startup that's looking to get into this area, I now have all of this data that I can work from. I didn't need to buy a database. And so I can start proving out this concept and building something, again, that I could bring to investors. Now I have the data, what do I do with it? Well, tell LoRA we love her. So with LoRA, low-rank adaptation, we can now take a model like Gemma. And we can start fine-tuning Gemma with instruction tuning on that data. And I'm going to just show, like-- any Keras TensorFlow developers in the house? OK, quite a few of you. So all of those skills that you've had with using that, it's just the same thing that you're going to be doing here.
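As a minimal sketch of that second step, assuming the AI Studio output was saved as a hypothetical books.csv with title, genre, theme, and synopsis columns, the detailed synopses could be generated along these lines:

```python
import csv
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro-latest")

detailed_books = []
# "books.csv" and its column names are assumed to match the synthetic
# catalog exported from AI Studio (title, genre, theme, synopsis).
with open("books.csv", newline="") as f:
    for row in csv.DictReader(f):
        prompt = (
            f"Please write a detailed synopsis of a plot for a book called "
            f"{row['title']}, in the genre of {row['genre']}, which has a "
            f"theme of {row['theme']} and this basic plot: {row['synopsis']}. "
            f"Write about 200 words."
        )
        response = model.generate_content(prompt)
        detailed_books.append({**row, "detailed_synopsis": response.text})
```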
So when I come down to this part where I'm fine-tuning Gemma, hopefully, my VM is still running, I'm running it. And what we'll see right here, if I zoom in, is a good old-fashioned model.fit. So all I've done is I've taken the data, my synthetic data. I did a little bit of code that turns that into instruction-tune format. If you're not familiar with instruction tuning, the idea is that when you're working with a large language model, like a Gemini, you just give it the data in the way that you would expect it to give you back data. So if you give it a prompt, and you expect it back in a certain way, then you just create that, and you create lots of instances of that. For example, the prompt could be, is this book suitable for a 10-year-old? And the answer is, yes, it is suitable for a 10-year-old. You put those prompts together for those specific books, create all of that data, which I've done synthetically here, and then train a model on that. And we can see here now, it's going to take a little while to train. The first epoch was done in about 43 seconds. The subsequent epochs in Keras are usually a bit faster, but the 20 of these are probably going to take about 10, 15 minutes. So while that's training, and the learning leprechauns are doing their thing, can we switch back to the slides, please? So we've been learning, and we've been hearing, about the word AI all day today. I'm trying to avoid using the word AI because everybody else uses it a lot. But we've been hearing a lot about models that you can create today. But one of the biggest things that developers, when we speak to them, get confused about is, where do I use my models, which ones do I use, where do I put them, do I use hosted Gemini, do I fine-tune my own model, do I use an open-source model like Llama, where do I start? This is all really confusing. So I always like to think about it in terms of the number three. If you think about where your models are going to execute, it's a spectrum. At one end of the spectrum, which I'm going to be talking about in detail in a moment, is a model that you completely own that's on your own device. Be that a server, be that a mobile, be that a desktop. At the other end of the spectrum is a model that you do not own and is hosted by somebody else. And that's like a Gemini being hosted by Google. That's like a GPT being hosted by OpenAI. And you access that via an API. Increasingly important, and you're going to be seeing this as a major growth area over the next 12 months, is in the middle of this, where it's a model that you own, that you create, that you fine-tune, that then gets hosted, potentially, by somebody else on your behalf. It could be hosted by Google on Vertex, and there's a whole crop of startups starting up to actually do that hosting for you. I think, personally, this is going to be the most exciting area for us as developers because the idea is, if we can start training our own models or fine-tuning existing open models to be able to solve our business problems, now we have a place where we can host them. So today, you're seeing a lot about Gemini being hosted in the cloud for you. You're seeing a lot about Google AI Edge that I'm going to be talking about in a moment. But as a developer, also think about investing your skills in that area and think about what it takes to fine-tune models.
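For anyone following along, the LoRA fine-tuning step Laurence describes looks roughly like this with KerasNLP; the preset name, LoRA rank, and hyperparameters here are assumptions rather than the exact values used in the demo.

```python
import keras
import keras_nlp

# Instruction-tune strings built from the synthetic book data; two tiny
# placeholder examples stand in for the full dataset used in the demo.
train_texts = [
    "Instruction: Is 'Tiki Trouble' suitable for a 10-year-old?\nResponse: Yes, it is suitable for a 10-year-old.",
    "Instruction: What is 'Tiki Trouble' about?\nResponse: A mischievous imp causes chaos for a young girl named Ella.",
]

# Load a Gemma preset (name assumed) and enable LoRA on the backbone so
# only the low-rank adapter weights are trained.
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")
gemma_lm.backbone.enable_lora(rank=4)
gemma_lm.preprocessor.sequence_length = 512

gemma_lm.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.AdamW(learning_rate=5e-5, weight_decay=0.01),
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)

# The good old-fashioned model.fit shown on screen.
gemma_lm.fit(train_texts, epochs=20, batch_size=1)
```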
And as we announced in the keynote this morning, Gemma 2, part of the idea of that is it's going to be a 27 to 30 billion parameter model. But it's going to fit on a single TPU. And I think that's one of the important parts of it. So now if you're going to be hosting this for your business, that single TPU, the cost of that single TPU to be able to host a model that you have fine-tuned explicitly for your use case, then becomes a really powerful thing. But I'm going to switch to the mobile side of things for a moment. And the whole idea is that when you want to run a model on a device, it's not easy. Devices are generally lower powered. You don't have all of the hardware that you have in a data center. And all of that kind of stuff. So typically, and for years, many of us have been doing this using a technology called TensorFlow Lite. You build your model, you convert it into the TensorFlow Lite FlatBuffer, and then you run it on TensorFlow Lite. In the last couple of years, at I/O and other events, we've been talking a lot about the MediaPipe Framework and MediaPipe Tasks. The idea behind those two, if I show them on the next slide, is really that they abstract a lot of the complexity of having a model on your device. Now, if you're an Android developer or an iOS developer or a web developer, you probably think in terms of strings and bitmaps and JPEGs. If you're an AI developer, you generally tend to think in terms of tensors. Models are tensors in and tensors out, and it sometimes becomes a difficult task for you, if you are not a TensorFlow developer, and that's where the name TensorFlow came from, to start thinking about your data structures, how your app does things, and to be able to take advantage of a trained model to make your app artificially intelligent. And you end up with complex workflows like these ones. The one on the left is just a single model. The one on the right is for many, many scenarios, like segmentation, object detection, that kind of thing. You've got multiple models. And you end up having to write all of this code for doing things like pre-processing and post-processing on each of these models and managing it all. And it's really, really difficult. And that was the idea of what MediaPipe and the MediaPipe Framework were giving you. The idea was that we abstract a lot of that complexity. You write your code to just deal in strings and bitmaps and whatnot. But we did get a lot of feedback from the community that it's still very confusing. When do I use TensorFlow Lite? When do I use MediaPipe? When do I use Gemini? When do I use Gemma? All of these kinds of things, and we understand that it's confusing. So, as a mobile developer, if you're using Gemini, you're going to be using an API to call the hosted Gemini, or you could be using an API to call a Gemma model that runs on device, which I'll show in a moment. Or you could be creating your own model and running it on TensorFlow Lite. Or you could be creating your own model and running it on MediaPipe. And there's all of this kind of confusion. So what the folks have been working hard on is to create this overall umbrella that we call Google AI Edge. So you'll have a one-stop shop for all of the innovations that are happening on the edge and in mobile, to try and reduce the confusion, particularly around getting started in that stuff. And as part of that, there are a few new things that have actually been released.
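As a minimal sketch of that classic build-convert-run TensorFlow Lite workflow (the tiny Keras model here is just a stand-in for whatever model you have trained):

```python
import numpy as np
import tensorflow as tf

# A trivial stand-in model; in practice this is the model you trained.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])

# Convert the model into the TensorFlow Lite FlatBuffer format.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
with open("model.tflite", "wb") as f:
    f.write(tflite_model)

# Run it with the TensorFlow Lite interpreter, the same runtime used on device.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
input_index = interpreter.get_input_details()[0]["index"]
output_index = interpreter.get_output_details()[0]["index"]
interpreter.set_tensor(input_index, np.zeros((1, 4), dtype=np.float32))
interpreter.invoke()
print(interpreter.get_tensor(output_index))
```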
LLMs on device, which I'm going to show in a moment with my fine-tuned Gemma, where the idea was, again, all of that complexity. Because when you start dealing with an LLM, it's not as simple as you give it a prompt, and it gives you an answer. Your prompt needs to be tokenized. The answers coming out of the model are generally serialized, and then they need to be detokenized. And there's a lot of work that needs to be done to do that. But the MediaPipe folks have actually built wrappers for LLMs like Gemma to make that work a lot easier. I'm going to show that in a moment. We've heard so much from the community about PyTorch support. So now if you build models in PyTorch, they can be converted into the FlatBuffer format that will run on Android or iOS or even the web using our TensorFlow Lite runtime. And then finally, something you'll be seeing a lot of in the AI Edge session is something called the Model Explorer. And one of the things we always hear from developers is, when you get a model, and the model is doing something, and it's doing some kind of reasoning, or it's doing some kind of an inference, how can I trace what it's doing? How do I understand what it is in that model that gives me this answer instead of that answer? And that's the idea behind the Model Explorer. It's freshly baked. It's hot off the presses. It was shown to the public for the first time today. You should be seeing it in some of the demos. And it's going to be available under Google AI Edge. And I hope it solves many of your use cases when it comes to understanding and communicating how your models are actually going to work. So I'm going to switch back to mobile for a moment, if we can go back to the demo machine. And I cheated a little bit. I have a version of the model that I created earlier because we're short on time. And I'm just going to run it on my phone. It's a live demo on my phone. So here it's now loading-- oops, the app actually crashed, so let me do that again. So I'm just going to load up the app. And you'll see the app is just a very basic chat-type app, where what I've done is I've fine-tuned a Gemma model to become an expert system in children's books. And now we can see. And I'm sorry if it's a little small. And now I can just go to it. Now, for example, would "The Time-Traveling Catacorn," which is a book that we made up completely for this demo, but I think I'm going to write it, be a suitable book for my family? And if you look carefully-- oh, it crashed, sorry. Oh, you gotta love live demos. So let me try that again. Good thing Sundar is not here. [LAUGHTER] Would "The Time-Traveling Catacorn" be a good book for my family? OK, everybody, cross your fingers. There we go, OK. So if you look carefully, you'll see the prompt. I have a system prompt where I have the instruction, where it's saying, below is an instruction that describes the task. You're a helpful bot. You want to understand family books. The RAG part of it is the context, where I have data that's on my device that I don't want anybody else to know about. And it could be personal details, like this is what my daughter likes or my son likes. In this case, I just said my family really enjoys books with interesting plots. And then, finally, is the actual prompt, would "The Time-Traveling Catacorn" be good for my family? And it looks like the model actually froze in its output. Trust me, it does work. But the idea here is that this-- Gemma will give me that output. Let me try again. I'll just say, tell me more.
Tell me more, tell me more. I'm sorry. You know what? I'm going to skip back to the slides. I'll demo this later. Sorry. It's the kind of thing that always works in a dry run, but then when you do it on stage, it fails. Can we go back to the slides, please? OK, so to wrap up, there are a few things that I do want to reemphasize here. First of all, AI is human centric. I hear a lot of feedback from people saying that AI is going to take away our jobs as developers. When we see code generation, for example, it's going to take away our jobs. I'm just going to say, completely untrue. This is a greater opportunity for developers than there ever has been. It's a greater opportunity for any kind of creator than there ever has been. The ability for you to be able to expand your horizons, as a developer, is greater now than ever. I really think there has never been a better time. One of the next trends that you're going to see is agentic workflows. I promised my friends I wouldn't say the word agentic, but I just said it. But what you're going to start seeing, in the next 12 months, as a developer, is the ability for you to create apps that use LLMs that self-correct, that self-identify, and that are able to start interacting with humans or with each other as part of workflows. So this is one of the areas that I would strongly recommend skilling up in, to start looking at using things like the Gemini API to be able to build these types of things. I don't know if you saw the keynote this morning where they demonstrated Chip. The idea was, it's a helpful assistant that works within your Google Docs and works within your Google Chat. And I think you'll find this is a great example of an agentic workflow. Another thing that we recommend is to join the Kaggle community. As you create your own fine-tuned Gemma models, hopefully ones that are better than my one, you're actually going to be able to publish them to the Kaggle community and to share them with other people so that they can use them themselves. And there's just lots of great stuff happening in that space. Finally, the last thing I would like to say is, AI is accessible. We've been working really hard to widen access to everybody, to make AI as easy as possible so that everybody can jump on the AI train and be able to build better apps and be able to build better sites and services with it. Also, just with the last minute I have, I want to encourage everybody to remember that responsibility in AI is very, very important. It's a very difficult thing to do. One of the things we've been stressing at Google is to try to be a thought leader in responsible AI by publishing responsible AI principles, and I'd encourage you to read them and follow them, or adapt them, really, for your own use. And then, finally, I'd just like to leave you with this: there's the opportunity like never before to change the world with your work as a developer. Not with AI, like this slide says; it's with your work as a developer. There are opportunities out there to make the world a better place for other people. I encourage you to go and take them, and I just want to say thank you so much. [APPLAUSE] [MUSIC PLAYING]
Info
Channel: Google for Developers
Views: 14,367
Keywords: Google, developers, Google I/O, PA Keynote, AI
Id: fH4xqeu7GT0
Length: 41min 52sec (2512 seconds)
Published: Thu May 16 2024