AI development assistance - Android Developers Backstage

Captions
[MUSIC PLAYING] [TELEPHONE RINGING] TOR NORBYE: Hello, and welcome to Android Developers Backstage. I'm Tor Norbye from the Android Studio team. ROMAIN GUY: I'm Romain Guy from the Android Toolkit team. KATHY KOREVEC: And I'm Kathy Korevec. I'm from the Labs team. TOR NORBYE: Hey, we're expanding beyond the Android team for the guest pool. That's kind of cool. ROMAIN GUY: Yes, what happened here? Like, how did that happen? KATHY KOREVEC: I'm invading. TOR NORBYE: Well, actually, Kathy and her team has been collaborating with us on a little feature known as Gemini in Android Studio. ROMAIN GUY: Oh, is that the name now? TOR NORBYE: That is the official name for Studio Bot. Rest in peace. So this is actually the team that is providing the AI smarts. KATHY KOREVEC: Yeah. TOR NORBYE: Behind the features that we have in Studio. KATHY KOREVEC: Yeah, I'm super happy to be here. So we have been working behind the scenes a lot on fine tuning, basically, the Gemini models to be a lot more efficient with code. And so we take Gemini models, we work really closely with the team building those models, and then we do a lot more fine tuning and training with code, with Android Studio Code, and try to make it a lot more efficient. ROMAIN GUY: Wait, so you're saying that Tor's team have not done any of the hard work? They've done the easy bit? They did the UI? KATHY KOREVEC: I don't know. I don't know about that. The UI-- so you're talking to a product person too. I'm a born-- and I will die a product person, although, I do a lot of tinkering with engineering in the background. But I strongly believe that the intelligence my team provides means absolutely nothing unless we actually integrate it into the products. So I'm very, very thankful for Tor and his team to be able to do that. TOR NORBYE: I think also, product people in developer-facing products have to be engineers at heart too. KATHY KOREVEC: Absolutely. TOR NORBYE: Because it's like, how do you design a product that's-- something like code completion. It doesn't really make sense unless you really understand what coding is and what people need. KATHY KOREVEC: Yeah, actually, I believe in this very hardcore. You have to-- actually, if you're going to be designing something, integrating something into, let's say, an IDE, you have to live and breathe those pain points, otherwise, it's going to just feel really clunky to developers. ROMAIN GUY: Yeah, and I think it's always something that we have the discussion regularly within the Android dev X team developer experience, it's a bit bizarre because we are both the authors and the target audience of the products. So sometimes, I think it's a good thing. Sometimes maybe we are a bit too close to the end product. KATHY KOREVEC: Yeah, totally. My analogy for this is chefs cooking for chefs. ROMAIN GUY: Yes. KATHY KOREVEC: And I constantly tease my team that I'm going to get them t-shirts that say, I am not the user, but in a way, you are. You just want to always think about the user as somebody else. You're not necessarily designing for yourself. You're designing for other developers. And the reason why I say chefs cooking for chefs is because developers know all of our tricks. If you want to hack your way through designing and producing a developer product, it probably will feel like you hacked your way through it to another developer. 
They're going to know, oh, that's a marketing gimmick, or they're not actually truthful about the latency, or I don't actually really know what the performance is on this thing. They'll be able to tell and weed out all of those things. On the flip side, so it can be very challenging to design developer tools. But on the flip side, developers are notoriously early adopters. They're also very, very good at giving feedback. So as long as you're OK with that, and you're OK working with them, you're going to make your product a lot better very fast. TOR NORBYE: It does feel like developers are the ones who are the most at the forefront of large language models and adopting. KATHY KOREVEC: Yeah. TOR NORBYE: You know, I think we see that even with traffic on Bard. It was just a lot of coding questions. And so developers are really kind of an important audience, I think, to win over. ROMAIN GUY: Well, there's also, I think, something interesting in developer tools, which I think plays right into the whole AI in the IDE thing, is you don't necessarily need to have a feature that does all the work on behalf of the developer. You're here to increase their productivity. So the way I see it is like, any second that I gain thanks to my tools, I'm happy, right? KATHY KOREVEC: Yeah. ROMAIN GUY: It's a second again. So even if it does only 5% of the work, 10%, 20%, whatever. KATHY KOREVEC: Yeah. ROMAIN GUY: It's very different if I'm trying to order food with an app and it doesn't show up. [LAUGHS] KATHY KOREVEC: Yeah, definitely. Then you're not eating dinner. ROMAIN GUY: Or it's dropped off like in a different town. Like, oh, you did half of the work. KATHY KOREVEC: Last night, true story, we ordered Thai food. And we got-- I think a pizza came instead. So I don't know if AI was involved with that, but we ended up making macaroni and cheese for dinner, so. TOR NORBYE: I don't know. I'd never say no to a pizza. KATHY KOREVEC: I felt that. I was like, somebody needs their pizza. So we actually gave it back. ROMAIN GUY: Unless there's pineapple on it. KATHY KOREVEC: Oh, OK, we can't go here, yeah. TOR NORBYE: I also love pineapple. All right, and send your comments too. KATHY KOREVEC: Yeah, yeah. Since you brought up productivity, I have an interesting perspective on this. I do think AI is helping with productivity, but I don't necessarily think that's all it can do, especially for developers. And when my approach to developer tools and building for developers is not necessarily how can we make developers more productive, or how can we increase character acceptance rate on code that is written by AI. Obviously, I'm interested in that kind of thing, but I am way more interested in how can we make developers more creative? And I think about tools like the IDE, or other developer tools like even your console as creative spaces for developers. They're building, they're creating at the end of the day. So I think that looking at it that way makes us empathize a little bit more in what developers are trying to do. It's not that they're just trying to take somebody else's design and code it. They're actually trying to express themselves as well. ROMAIN GUY: I'm glad you're saying that, because what I mean by productivity is exactly what you just described. That's the way we see, for instance, the libraries and framework. 
We write the IDEs so that if you spend less time doing, you know, the foundational stuff, you can spend more time on your business, on whatever ideas you have, on all the extras that you would not otherwise have any time to work on. KATHY KOREVEC: Totally. ROMAIN GUY: There's also a huge like-- sorry-- but like in terms of APIs, very often, like, you know, we need to provide APIs that developers want and need. But sometimes, the idea is that we're going to give you something that you haven't asked for. And maybe something interesting will come out of it. KATHY KOREVEC: You mentioned APIs specifically. So one of the things that we've been exploring a lot on the AIDA team-- and I think it goes without saying-- is that an API from an LLM and APIs in the AI space are difficult to use right now. And if you look at it, like, developer tools are not just your console. I actually consider developer documentation part of your interfaces as well. It's also the API. And especially for-- I'm a front-end developer. I have never coded an Android application in my life, but I have used-- TOR NORBYE: Yet, yet. KATHY KOREVEC: Yeah, I know. I have used the tool. But for front-end development, in my experience, the APIs that I have access to are actually really hard. And a lot of that is because it's text in, text out of an LLM. And we haven't gotten to the place where we're actually considering, OK, how should this JSON be styled? And what should it look like as it's coming out for a developer to interface with it and build their tools with it? And so we're looking at that on the AIDA team. Sorry, I don't know if I mentioned that. Specifically, I work on this team called AI Developer Assistance. And it's part of the Google Labs team. So on the AIDA team, we're looking at how can we actually design this API such that it's easier to read-- the data output, the endpoints, et cetera-- for better integration. And that's something that takes a lot of design thinking. ROMAIN GUY: Yes, yes, it's hard, but yeah, I see what you mean. I have a long background in graphics. And very often, some of the APIs are text, basically-- send massive strings that contain code. And it's not fun to deal with that. KATHY KOREVEC: Yeah, yeah. TOR NORBYE: It could be worse. It could be Protos. ROMAIN GUY: Let's not start the discussion about Protos. TOR NORBYE: Yeah, I want to go back to the developer productivity thing again. I felt the same way. I think a lot of the focus a couple of years ago was on code completion, which is all about shaving off microseconds here, microseconds there, and that was good. But because Android developers are so concerned about sharing code, our initial launch for Studio Bot last year instead doubled down on unblocking them when they're stuck. So it wasn't about saving your typing. It was about like, hey, how do I, you know, add-- how do I get location access? I don't know how to do that. And so instead of going to the web and reading through a bunch of blog posts with all kinds of backstories and randomness in there to sort of, you know, have you click on more ads, it was really about getting you the answer. And sometimes, it was like, hey, why did my app crash? And so we would integrate it directly into the crash console. And like, OK, explain this Logcat. So it was much more about not saving you a few seconds here and there. It was about saving you that 30 minutes. KATHY KOREVEC: Yeah, absolutely.
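To make the "explain this Logcat" idea concrete outside Studio, here is a minimal sketch that sends a stack trace to Gemini through the public Kotlin client for the Gemini API. It is not how Android Studio implements the feature: the model name, the BuildConfig field holding the API key, and the prompt wording are all assumptions for illustration.

```kotlin
import com.google.ai.client.generativeai.GenerativeModel

// Hypothetical helper: ask Gemini to explain a Logcat stack trace.
// Model name, API-key field, and prompt text are placeholders, not Studio internals.
suspend fun explainCrash(stackTrace: String): String? {
    val model = GenerativeModel(
        modelName = "gemini-1.5-flash",       // assumed model name; use whichever is available to you
        apiKey = BuildConfig.GEMINI_API_KEY   // assumed field: a key from AI Studio, kept out of source control
    )
    val prompt = "You are helping an Android developer debug a crash.\n" +
        "Explain the likely root cause of this Logcat stack trace and suggest a fix:\n\n" +
        stackTrace
    // generateContent is a suspend call; .text is the model's reply (or null).
    return model.generateContent(prompt).text
}
```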
TOR NORBYE: And that's what I'm really finding today. Whenever I'm going on a technology I'm not super familiar with, that's really where Gemini shines. You know, if I'm ever touching Python, which I don't really know, it's so helpful to me to have the-- ROMAIN GUY: I heard you complain about Python a lot lately. TOR NORBYE: Oh, I could go on for a while, but I shouldn't. I shouldn't. KATHY KOREVEC: You're going to be complaining about Python for a while, because that's the majority of AI right now. TOR NORBYE: Yeah, that's so frustrating. Give me a statically typed language please. Anyway, let's talk a bit about the features we have done together. KATHY KOREVEC: Sure. TOR NORBYE: So we launched chat last year, and we launched completion six months ago, which by the way, has gotten so good. You know, initially we were like, we held it back because we weren't super happy with sort of, you know, the results. And like, I think a few months ago, it just got incredibly good. KATHY KOREVEC: We've been working hard on it. TOR NORBYE: Yeah. And I don't remember-- one of the things that I think was a really big insight was that we actually switched to a smaller model, and it got better, which is very counterintuitive. You usually think that smaller model means worse results. But because it was so fast, it was just always there when we needed it. And that just made it even better. And so a small-- I don't know if I-- stop me if I already said this in the last podcast, but I had this amazing experience. I was flying back on a work trip, and I happened to have free Wi-Fi on the plane. And I was over the North Pole, and the Wi-Fi was terrible. I'm like, I'm not going to check mail anymore, but I hadn't actually logged out of the Wi-Fi. And I was coding in IntelliJ using our Studio Bot plugin in it. And code completion was working. And I was like, how is this possible? How is the bad-- how is the latency of this thing and the amount of data so small that this worked well? ROMAIN GUY: The real takeaway is that you should have been watching a movie. TOR NORBYE: Yes. KATHY KOREVEC: That's correct. Yeah, you should just put your laptop away when you're on the-- ROMAIN GUY: Yeah. TOR NORBYE: The time goes faster that way. So anyway, so chat and then completion. And the latest thing we just launched together was transformation. So can you talk a bit about that? KATHY KOREVEC: Bringing it back to the smaller model, I get this question a lot of, so this question of smaller model is not necessarily one of how do we make people faster, or how do we make them more productive? Absolutely, that's in there. But we actually had a much larger model for code completion. I think originally, we had a 12B model when we launched with you guys last year into Canary for a little while. And then we switched to a much smaller model for a couple of different reasons. We learned that-- so one of the things with chef cooking for chefs, you have to experience what it is like to code, what it's like to code with AI, with ghost text, and accepting that in order to really appreciate small model versus big model for code completion. And the reason why is because as you're coding, you're looking at what you're doing, you're looking at your keystrokes showing up on the screen, and you want everything to happen very quickly. So you want the latency to be low. 
We happened to ship a very good recipe that improved quality for smaller models, and so we capitalized on that to be able to bring the latency down and ship a smaller model for this specific use case. So we've also shipped code chat with you, which does-- which that's for something like when you're doing research. Like, I always forget how to configure my SSH. So I'm always like, OK, how do I do this? And then I go to the documentation sometimes for that specific platform. It'll say like, it'll be very easy for me to figure it out, sometimes not. So documentation is a very good use case in doing research when you're actually coding, when you're like, oh, I forgot how to do this one specific thing. You may be an advanced coder, but it's totally fine to just jump over into chat and say, how do I configure my SSH file for this? And then it'll take a little while to write it-- to answer that question, maybe produce some code for you that you can then take into your IDE. And the reason why there are two different size models, because you're doing two very different tasks as a developer. And so you can wait a little bit longer for chat to come back with a response, versus while you're actually coding, you want that to be really snappy and quick. ROMAIN GUY: I also assume that the needs are different, right? When you do the code completion, you only need to produce code. In the chat, you need to produce way more than that. And there's formatting. KATHY KOREVEC: And context and everything, yeah. One of the things that I think is missing from both-- and this is something that my team is working on. I assume this is something that folks in Open Source are working on, and all across the industry, is personalization. I hear it all the time. Even for code completion, personalization is something that developers are missing. And myself, I miss it a lot. Like I mentioned, I'm a front-end developer, so I write a lot of CSS and HTML. So I really like Tailwind as a CSS framework. So when I'm-- even with code completion or with code chat, when I'm getting answers, the model is constantly giving me answers for Vanilla CSS. And that means that I have to go through and actually reformat everything, even after I've accepted those recommendations. Is it really saving me time? I don't know. ROMAIN GUY: It's interesting, because one of the things I found with the AI autocompletion-- and I don't know about the Studio version, but sometimes, what it suggests is long enough that I feel like it's slowing me down, because now I have to read the ghost suggestion and be like, is that what I want? And when it is, it's great. When it's not, I'm like, oh, I just spent time reading this. That's not what I wanted. KATHY KOREVEC: Yeah. ROMAIN GUY: So yeah, it would be nice to be able to control some of this. KATHY KOREVEC: Yeah, and I think we'll get better and better at this over time, obviously. It's great that we have our technology in the hands of Android developers right now, because it means we can learn and improve things much more rapidly. ROMAIN GUY: And they will give you feedback. KATHY KOREVEC: Yes, absolutely. And we're getting it. TOR NORBYE: One of the first things we did is, that I think the AIDA model, the first one we got, had not really been tuned for Android. And so when you asked it like, hey, how do I do X, Y, Z, you get Python code back. KATHY KOREVEC: Yep. TOR NORBYE: Because I think that was a big part of your initial data set. ROMAIN GUY: I think there was a lot of Python in the original. 
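As a rough illustration of that latency-first trade-off for ghost text (not how the Studio plugin actually works), a completion client could enforce a hard time budget and show nothing if the suggestion is late, while chat requests are allowed to take longer. The CompletionClient interface and the 150 ms budget below are invented for the sketch.

```kotlin
import kotlinx.coroutines.withTimeoutOrNull

// Hypothetical completion backend; only the shape of the call matters here.
interface CompletionClient {
    suspend fun complete(prefix: String, suffix: String): String
}

// Return a ghost-text suggestion only if it arrives within the budget;
// otherwise return null and let the developer keep typing uninterrupted.
suspend fun ghostText(
    client: CompletionClient,
    prefix: String,
    suffix: String,
    budgetMillis: Long = 150 // assumed budget; tune for the model and network
): String? = withTimeoutOrNull(budgetMillis) {
    client.complete(prefix, suffix)
}
```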
TOR NORBYE: So the personalization we did, it wasn't exactly personalizing, but we were personalizing it for Studio. So we modified the prompt to say, "You are an Android developer. Your preferred language is Kotlin." So we sort of gave it all these extra clues, which are part of the hidden prompt that users don't see. So when they're saying like, hey, what's your favorite programming language? It'll say, oh, it's Kotlin, because it saw that in the prompt. And so that was the bit of personalization we did. And then we ran into the next thing, which is, a lot of the recommendations were based on older things. So if you asked how to do some layout thing, it wouldn't use Romain's toolkit, Compose, it would use-- well, I guess it's still Romain's. The old-- ROMAIN GUY: I did not create either of those. TOR NORBYE: Yes. You know, and so basically, the next step was like, OK, if users have agreed to share context, let's tell the chat query which libraries you're using, at least the ones that are well-known so we're not leaking anything. So we can say, you're using Material 3, not Material 1, so that the right code snippets come back. And so, per your Tailwind example, it'll be the same thing: you know, if it's integrated in an IDE that can look at your project, it can say, oh, I'm a Tailwind user. And so it's not going to start answering in Node, or-- I'm going to totally reveal how-- KATHY KOREVEC: Or Material, yeah. So the other thing we just shipped is code transformation, which is a model that is tuned for Python. So we will go through this whole rigmarole again to tune it for-- to personalize it. I really like that example of personalization that you brought up, Tor, because it's very-- they're very different. There are different degrees of personalization that you can do. And I would personally like to get down to individual-based. And so you can give it some more information about what your preferences are as a developer, and then it will respond. TOR NORBYE: I will say, it is shockingly good at a bunch of languages, because again, we don't use Python, but it's really good at Kotlin. We always get a job-- ROMAIN GUY: --some of the examples you showed me. TOR NORBYE: Oh yeah, like, you know, so I demoed some stuff at I/O, taking code that is recursive and rewriting it to iterative. It's using nice, you know, Kotlin standard library methods to do it. Then I was shocked. I'm like, let me try this on XML, which is very common in Android. And it worked on that too. And I'm like, let me try this on Markdown. And it worked on that too. So I'm thinking that the model must have seen a lot of transformations in order to do this, because yes, it's not always perfect, and it seems to work best for smaller examples. If you give it like-- KATHY KOREVEC: That is very true. TOR NORBYE: Yeah, so I know that Jamal in his keynote demo, like he divided the transformations into two halves, because I think he had experienced that it was too risky trying to do one big one. ROMAIN GUY: The problem is that the one example he picked is the one that IntelliJ has had a quick fix intent for since, like, I don't know, 15 years. So it was the one example-- TOR NORBYE: Cleaning up code, yes, yeah. KATHY KOREVEC: I know. Yeah, there's always these examples where you're like, oh, I could have done that a lot better. And it's like, yes, you probably could have. You're a much more experienced developer than our AI is. [LAUGHING] ROMAIN GUY: Well, this one, it's automated by the IDE already.
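Here is a small sketch of the hidden-preamble idea Tor describes. The exact text Studio prepends is not public; the function below only illustrates injecting a role, a preferred language, and well-known library names ahead of the user's question before it is sent to the model.

```kotlin
// Rough sketch of a hidden prompt builder; the wording is invented for illustration.
fun buildHiddenPrompt(userQuestion: String, sharedLibraries: List<String>): String = buildString {
    appendLine("You are an expert Android developer. Your preferred language is Kotlin.")
    if (sharedLibraries.isNotEmpty()) {
        // Only well-known dependencies are listed, so nothing proprietary leaks.
        appendLine("The project uses: ${sharedLibraries.joinToString()}.")
    }
    appendLine()
    append(userQuestion)
}

// Example:
// buildHiddenPrompt("How do I show a snackbar?", listOf("Jetpack Compose", "Material 3"))
```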
But yeah, so question about-- so the transformers, how does it work? Is it still just LLMs? Or is there more understanding of the code, and for instance, you go through an AC, or you know, do you do anything fancier then? KATHY KOREVEC: This is probably a good question for my colleague who's not here. I don't believe we're doing much more besides some prompt engineering with transformations that we have shown the LLM. But so basically, the way that it works right now, so we have it actually shipped in two places. One is Android Studio. And you can do specific things like, you can say, make this code more iterative or make this more idiomatic. And then you can also do things like fix errors. And so you can do some open-ended things. And then we also have it shipped on a lab property called labs.google/code, where you can actually give it a small snippet of code. You can give it a longer. You can give it more lines. But as Tor said, it starts to break down a little bit the more you give it. And we're working on that. It's a known limitation. But you can give it a snippet of code. And then you can say-- you can give it your own prompt. We do have some things like provide documentation, or clean this code up, or fix this error, and things like that, buttons that you can push and play around with it. But you can just say, hey, transform this JavaScript into TypeScript, for example, and it will actually perform that for you. TOR NORBYE: Yeah, so what we did in Studio with this, we're calling the AIDA endpoints, you know, so we're basically just remembering which editor region you were modifying. And then we just replace it. And it just turns out like-- we implemented transform just on top of AIDA's chat API last year. And then we had to work really hard because you get a code snippet back in the chat, and you have to figure out like, OK, how do we logically merge this into whatever file you're looking at? And now, with direct support, it seems, you know, I think a lot of that stuff is handled on your end where you're the one doing the fine tuning, making sure that for a lot of the common prompts, it does something reasonable. ROMAIN GUY: So do we give any more context than what you selected? For instance, like, is it aware of, you know, APIs I may have in my project that I'm calling from that code? KATHY KOREVEC: Not yet. ROMAIN GUY: Not yet, OK. KATHY KOREVEC: But that's actually really interesting. That's where I think products like this are going, and absolutely, where we want to take code transformation is. So there's two features that we're really exploring right now. One is, we're calling auto transform. And so this is kind of the transform is working with you in your file. So it works similar to code completion. So while you're coding, it can actually go through what is out of view for you and say, like, hey, there's an error here, or you just introduced an error with the code that you just wrote. I'm going to go and clean stuff up. And so that implies presence in your file too. So there's some product work that we have to do to actually display, hey, something is happening. The AI, the code transform AI is actually manipulating code out of you for you. Here's what's going on. How do we actually put that into view? So we're playing around with some of that stuff, yeah. ROMAIN GUY: It's funny, because I'm picturing a future where, like, every time I look at a piece of code, it has changed. KATHY KOREVEC: Yeah. So that's what we're trying to avoid. Yeah, that's what we're trying to avoid. 
We want to make sure that the developer, the person in the file, is the one that's in charge. And so if you don't like what's happening with the code transformation, you know exactly where it is working. You know what it's doing. And you can go and interrupt it. TOR NORBYE: I like what we did in Studio for this. We are always showing you the built-in diff viewer. ROMAIN GUY: Really? You like what you've done in Studio? TOR NORBYE: No, but I think-- I think that with APIs currently-- and I mean, I think it's really tied to something inherent in the technology, the hallucinations LLMs do-- I think that you still don't want to blindly-- like, I would be afraid if we replaced the Kotlin compiler with an LLM that's going to just guess the bytecode. Like, I feel like you still need to inspect the results. And so I think that generally for now, something like transformations, you know, review them like you would do a code review. ROMAIN GUY: Yeah, I mean, what I found is that the simpler, or the more common, the task that I ask the AI to do, the better it does. TOR NORBYE: Because it's seen a lot more training data. ROMAIN GUY: But then, you know, lately I've been on a bender doing super low level optimizations in our UI toolkit, you know, so like, you know, bitwise hacking crap. And every time I ask the AI to come up with similar solutions, at first glance, it's the same thing, except it's always slightly wrong. And if I hadn't done it myself just before, I'd be like, yeah, that looks right. So yeah. KATHY KOREVEC: Yeah. ROMAIN GUY: Yeah, you still want some control over it. KATHY KOREVEC: Yeah, you absolutely want control over it. I mean, this-- TOR NORBYE: And write tests. KATHY KOREVEC: Or yeah, well, we're exploring the AI writing tests too. TOR NORBYE: Well, it's a bit like that. Someone said that, you know, now, people are generally like, you write down a bunch of bullet points, and you have the AI write your letter, like, the full body, and you send it to someone. And then they're like, well, I'm not going to read this. I'm going to have AI summarize it. So we're basically just sending bulleted lists to each other, but we're going through this detour of AI summarizing and-- ROMAIN GUY: Writing the tests is an area where I think it can be immensely helpful, because what the computer, whether it's an AI or something else, can be really good at is figuring out all the edge cases in my code that are hard for us to reason about. Very silly example: whenever you deal with floating points, you have to remember, oh, I have to test for not a number, positive infinity, negative infinity, minus zero, plus zero, you know, all those things together. And of course, like, there's an explosion of state. TOR NORBYE: Yeah, that's a feature we're launching in Canary now. It's using your chat behind the scenes, but basically test scenarios, right? We decided that we didn't think the AI was good enough to write the tests. We don't quite trust the tests yet. But having it look at the code, figure out what the scenarios are, and then create all the stub methods for you-- that is a feature that's going into the latest Canary. And it's kind of helpful for exactly that. KATHY KOREVEC: I mean, this is great. This is like almost-- this is getting into agentic workflows a little bit, where you can say like, hey, I want to-- you know, and agents, we can debate what is an agent and what is not an agent. ROMAIN GUY: What is an agent?
KATHY KOREVEC: To the-- ROMAIN GUY: I don't know what that is. KATHY KOREVEC: End of time. So the way we approach it on the AIDA team is that an agent is something that can take a small prompt-- or a small prompt, turn it into a small task, and then write the code, and execute and verify that it has actually accomplished that, whatever you've asked it to do. So it's small things, like what Tor is just talking about with like, let's actually just write the scenario, and then not actually write the test. I'll write the scenario, and then hand it over to you to actually do the test writing. And one of the things, we're very interested in writing tests as well, having the LLM produce those. One of the things we commonly run into is that the test is always 100% right. And so that's a case where you're like, oh, I actually need you to be wrong every once in a while. So we're going through and making sure that it's writing tests that can actually produce real results, and not only unit tests, but also things like selenium tests, and stuff like that. So we're very interested in all of those kinds of things. And in that sort of agentic workflow, where it's maybe stringing several agents together to accomplish something more robust, or just saying, do one specific thing, execute that code, and verify back to me that it has been. ROMAIN GUY: It's interesting what you mentioned about wanting it to be wrong, because for UI testing, for on Android, for instance, we use this thing called the monkey, which basically generates random touch events and key events. And it's nice, because it finds issues that you would otherwise not find, but it's really, I mean, it's just random. So I kind of would like an agent that behaves like a human, but is wrong, right, where you tell it like, oh, please go through the order workflow in my app, but you know, it might change its mind. It might not finish the task because it's wrong. TOR NORBYE: We may or may not be-- ROMAIN GUY: I think that's super valuable. TOR NORBYE: We may or may not be investigating that. ROMAIN GUY: OK. So yeah, so I like this idea lot. And it's also close to the idea of fuzzing the code, just do things that you shouldn't do, but someone will. KATHY KOREVEC: Yeah, someone will. It's almost like, you know, we're playing around with like, OK, let's take a bunch of different random tests that humans have generated, and use that actually to train our test writing agent, and just kind of try to throw as many of those scenarios at it as we can, and then see what it comes back with. So we do a lot of this kind of stuff, so as we may or may not be exploring this world. TOR NORBYE: I want to get back into it. You mentioned earlier with transformations asking, hey, are you passing it more than just the code to be transformed? And that kind of gets into the question of context and retrieval augmentation, or RAG. ROMAIN GUY: Two million tokens. TOR NORBYE: No, well, that too. ROMAIN GUY: [INAUDIBLE] TOR NORBYE: Right. Yes, no, so one of the things that we've seen recently with AIDA's code completion is that you get a lot better results if you don't only feed at the current file, but you also feed it some other information. And so that's basically providing context, you know, of-- and I think our chat answers have gotten a lot better recently. And that has a lot to do, again, with in chat, you could be asking factual information, like, what time is it? What's the capital of Norway? And those things may or may not be in the training set for the LLM. 
And so you want to also ground it in truth. And so that's something that, you know, like search-based RAG is something that your team has been working on. Can you say anything more about that? KATHY KOREVEC: Yeah, well, I mean, you kind of just described it, which is going and getting additional context about what is happening behind the scenes with the code. So yeah, I want to know what time it is right now, or what the capital of Norway is. That may or may not be included in the training data. And so we want to go back and get more information about that. So my sort of white whale in this world is being able to write an agent or something that can do breaking changes version bumping for you. And the reason for that is breaking change version bumping on either an executable, or an API, or something. And I want to make it possible for people to-- for, especially in the open source world, for people to say like, hey, I want to write-- I want to release a new version of my framework, but it's going to have breaking changes. And a lot of times for maintainers, they won't go down that route because then people won't upgrade because of those breaking changes. I would love a tool to be able to say, I'm going to introduce breaking changes, and I'm going to introduce this tool which will fix your breaking changes for you. So you can go from version seven to version 14 immediately. And in order to do something like that, my hypothesis is that we have to take in a lot more information about versions seven through 14. Some of those may have been included in the training data, but a lot of them haven't been included in the training data because the model was trained a year ago. ROMAIN GUY: And the model may not even know what is the version, right? KATHY KOREVEC: Yes, exactly. ROMAIN GUY: They've seen the code, but it doesn't necessarily-- TOR NORBYE: Well, I will say, Android developers are very familiar with the problem you just talked about. KATHY KOREVEC: Yeah. TOR NORBYE: We have a lot of libraries that break. ROMAIN GUY: Yes, and I like what you said a lot, because the problem we run into, which is perfectly natural, is like, you have breaking changes, therefore, it's costly to move forward. So you wait, and then more breaking changes get introduced. And now, you're five years later, and the problem is even harder than it was five years ago. And so-- TOR NORBYE: It's usually worst right-- because if you're trying to go to the latest version, which was released yesterday, the LLM has not seen this before. KATHY KOREVEC: It hasn't, yeah. TOR NORBYE: So like, if I ask now, how do I migrate from JUnit four to JUnit five? Because you know, it's been a while, but it was just lots and lots of random changes for no good reason as far as I can tell. You know, I like to be inflammatory. But at least the LLM has seen that. So it might suggest what to do. But if it's like, hey, Compose 1.7 comes out with a lot of breakages, and it was released yesterday-- ROMAIN GUY: To be clear-- TOR NORBYE: Not that you would. This is a made up example. But the point is, like, this is when your search based retrieval augmentation is super important, because it can't just rely on its weights. It has to go and also go like, oh, what is this new thing? Let me go and see what that's all about to help. KATHY KOREVEC: Let me figure it out. So it can do things like go and review change log data and information from-- that developers are putting out. It can go review GitHub discussion data and things like that. 
So that's the kind of augmentation that we like to do to help us figure out, OK, what additional context do developers need from when the model was trained and introduced to the questions that they're asking today? And we release new models constantly. And the Android team is very good about picking those models up and integrating them. And so we're very good about staying on top. But you're always going to be lagging behind no matter what. ROMAIN GUY: So those models, like, let's say there's a new version of a popular framework, is there a way that you can patch an already trained model? Or do you have to retrain the whole model? KATHY KOREVEC: You can patch almost like hotfixes and things like that. You can do that. I don't know a lot of the technical details about what goes into that kind of thing. And we do a lot of that on our team and working across teams that train models at Google. But it's-- I think it's probably something that you don't want to do too often. And so you do want to release these new models. ROMAIN GUY: I'm wondering if we could get to a world where basically, you know, the model is more like Google Search itself, where it's constantly being refreshed. KATHY KOREVEC: I think it probably will. TOR NORBYE: I did ask-- I did ask Ben who's, you know, working on your team about making factual corrections to the model. And apparently, that's very, very difficult. There's research for that, but it often is more going to be like, we're going to do another training run, not a complete like, Gemini level training run, but like, fine tuning in order to like-- KATHY KOREVEC: Yeah, we can absolutely do more fine tuning. And then you're not training a new base model, which is more costly. And so that's where our team comes in a lot, is actually doing fine tuning on code specifically for these base models. And so we're just-- we're adding more context for use cases like Android to give it that personalization, or that customization. ROMAIN GUY: So you mentioned that at the beginning you said that your team, so the text Gemini, the model, and then you augment it or fine tune it. What does that entail exactly? Like, what can you share about that? Because I know nothing about, you know, yeah-- KATHY KOREVEC: I know very little about this too. I'm just the lonely product manager on this side. But that means taking the base model and all of what it's been trained on so far. And so we actually contribute a lot of training data also to the base model. But then for customers like Android Studio, or some of our other folks that we work with, they have a lot more data that's specific to their use cases. Or we have code that's specific to their use cases. And so we will actually take the base model and then almost like, retrain it, but in a smaller with that code data. TOR NORBYE: Yeah. So back to retrieval augmentation, you know, there are things that we can't rely on AIDA to do. In particular, users want to ask about their own code. So when they ask about stuff on the internet, well, that's handled by you. But we've seen people ask Studio like, hey, where in my code do I, X, Y, Z? And that has to happen in Android Studio. So, you know, earlier you were like, hey, you're just doing the UI. Like, well, no. ROMAIN GUY: I know you're not just-- TOR NORBYE: It turns out we have to work-- ROMAIN GUY: I know it's more complicated. TOR NORBYE: On local retrieval augmentation. And that's one of those partnerships, right, where the large context window helps. 
But even with two million tokens, a lot of apps are bigger than two million tokens. KATHY KOREVEC: Yeah. TOR NORBYE: Right? So there's that sort of understanding a query, and being clever about what we supply with the prompt to AIDA is kind of what we're working on now. KATHY KOREVEC: Yeah. And you're also-- the app is bigger than two million tokens, and your dependencies to get that up running, way bigger than two million tokens. So this is kind of-- this is the other thing that I really want to work on. And I'm sure other people in the industry are working on this part of it. So there's personalization. And then there's also-- how do I know about the supply chain? And if I'm going to upgrade my application from next.js 11 to next.js 14, there are likely a lot of dependencies that also need to be upgraded, or at least known about. And so then I can-- OK, so transformation can do that upgrade just on the next.js library. And I apologize, I'm bringing in frameworks that are not Android related. ROMAIN GUY: We've heard those names. KATHY KOREVEC: Yeah. TOR NORBYE: Yeah, it's OK. KATHY KOREVEC: I used to work at Vercel, so I have to-- yeah, I have to stay true to my roots, so. But it may or may not know about all of those dependencies. So back to my question about-- or our conversation about productivity, now, I'm left going and upgrading all of those dependencies as well, or checking to make sure that they work in that next.js 14 environment. So if the LLM can go and make those upgrades for me, or at least know about those dependencies and give me clues like, hey, you're going to have to check on this version of Babbel or whatever, then I can go and know where to look. Or it can say, I've bumped all of these versions for you too, because the app was breaking, and it can do other things. It can say, I went and performed this test and it failed with the Babbel upgrade, so you're going to have to go through and make that update as well. ROMAIN GUY: It's interesting the way you two are talking about this problem. It almost feels like what I want is an LLM that has been trained, or fine tuned just on my project and my dependencies so that it has all the context that it needs. TOR NORBYE: Well, like we see internally at Google, you know, the AI can be even more powerful because it's seen everything. It knows which exact logging library you're using. ROMAIN GUY: By everything, you mean all the Google code, right? TOR NORBYE: There's nothing that is a external dependency. So like, if you are a user of Android Studio, and you work at a large company, and you have some internal logging library, let's say, our AIDA tuned code completion isn't going to know anything about it. So you're going to get some help based on our local retrieval log. We're going to send some stuff, but maybe not all. Whereas, like in Google, because the internal model used for code completion has been trained on all the CL, all the code, it has a very complete view of what is expected. And that's kind of tricky. And I think that's maybe, you know, next generation thing is for us to, in addition to retrieval augmentation, you know, offer some sort of customization, where it can really kind of know. I think if you're using a lot of the standard Android libraries, I think our product is really good already. If you're really using Firebase and Android X, and we also include a bunch of important open source libraries like OKIO, and so on, right, Coil. ROMAIN GUY: So you were mentioning those popular frameworks, like next.js. 
I imagine it's the same thing for any platform, where if you're using some of the most popular libraries, you're going to have an easier time. TOR NORBYE: Yeah, what are the types of public repositories you're training on? KATHY KOREVEC: I probably can't enumerate the number and what all of the popular frameworks are that we're training on. But if you look on GitHub, we have a bunch of next.js data in there, I'm sure. So it's all permissively licensed GitHub data. ROMAIN GUY: Saves me a lot of time. I don't have to read all those GitHub repositories myself. KATHY KOREVEC: The personalization thing goes far, because if we can get to a place where we're able to provide-- like, I can say, hey, act like Kathy coding when you give me answers in code completion, or code chat, or even code transformation, act like me and do things that are specific to my project, then it's going to do things like, the payload I actually get back from the LLM might be a lot smaller than a generic payload. Because it knows I want to write this in Tailwind, or I want to write this in Material, and not give you just Vanilla CSS back. And then we can actually optimize around the performance as well. Or your preference might be that I want a gigantic CSS file. And that's totally fine. But for me, I want something that is very, very performant, so a lot smaller. ROMAIN GUY: The setting I want is the one that disables this behavior that I've seen in a lot of LLMs, where it reminds me of when I was writing, you know, doing essays for school, and I hadn't really done my homework, where you're trying to produce a lot of words to send out. KATHY KOREVEC: [LAUGHING] Yeah, the problem with LLMs, you can tell them like, hey, write, you know, another 500 words, but it'll just regurgitate the same thing that it just wrote. I've tried this with my boss. I'm like, OK, I'm going to write my-- I don't know, my performance review. I'm going to use Gemini to do it. And she totally knows. [LAUGHING] TOR NORBYE: Yeah, then you start messing with the prompt, like, and sound like a fifth grader. KATHY KOREVEC: Yeah, yeah. You know, you start playing with the, you know-- TOR NORBYE: In fact, that's something we're looking at now. One of the features we showed at I/O was like, generating a commit message. You know, we send the diffs to the model, we get a commit message back. First time I tried it, I was shocked how good it was. Like, how is it doing that? I thought it would just describe what it saw the diff do. But no, it was going one level beyond, which I was just shocked by. But anyway, right now, it's kind of a one-shot thing. Here's the commit message. And we want to give you the ability to say like, I want you to be more verbose or less verbose, so basically be able to clue it in on what you want, because right now, you just keep pressing the button until you see one you want. And I was like, oh, the one I saw three tries ago was the best one. I should have held on to that one. KATHY KOREVEC: Yeah, and then you want to go back, yeah, yeah. ROMAIN GUY: I've been using Gemini a lot lately to read assembly code for me, because it's a pain to go to the documentation for every instruction. And I remember there was this particular piece of code where it started nicely describing every instruction-- like, great, great, great. And then it's like, from line 13 to 21, it does math. Like, thanks. [LAUGHING] That's-- yes, I could have figured that out myself. KATHY KOREVEC: That's helpful, yeah.
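As a hedged sketch of the "diff in, commit message out" idea, including the verbosity hint Tor mentions wanting, here is what a prompt builder could look like. The Verbosity knob and the prompt wording are invented for illustration and are not the actual Studio feature's prompt.

```kotlin
// Invented knob for how long the generated commit message should be.
enum class Verbosity { TERSE, NORMAL, DETAILED }

// Build a prompt that asks the model to summarize the intent of a change,
// not just restate the mechanical edits in the diff.
fun commitMessagePrompt(diff: String, verbosity: Verbosity = Verbosity.NORMAL): String =
    """
    Write a ${verbosity.name.lowercase()} Git commit message for the change below.
    Summarize the intent of the change, not just the mechanical edits.
    """.trimIndent() + "\n\nDiff:\n" + diff
```

The resulting string would then be sent to whichever chat or text model is available, the same way as the crash-explanation sketch earlier.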
TOR NORBYE: I was imagining that you had it basically like read it out loud for you, you know, just entertainment, just like the text. ROMAIN GUY: Yeah, no, I like low level stuff, but not that much. TOR NORBYE: Yeah. KATHY KOREVEC: Maybe if you're trying to go to sleep. ROMAIN GUY: Yeah, the dramatic reading of my assembly code. TOR NORBYE: Is your team-- I think your team is involved with Gemma as well, right? You wan to talk a bit about what that is? KATHY KOREVEC: Yeah, so Gemma is an open weights model from the Google DeepMind team. ROMAIN GUY: By open weights, you mean it's open source weights? TOR NORBYE: You can download it. KATHY KOREVEC: You can download it, but it's not open source. And we trained the code variant of that. So you can think of it as Llama and Llama code. And so my team, very similarly to how we work with Google DeepMind is, we have a strong partnership with them, and we will contribute directly to the base model. And then we'll fine tune that model for code. And so that's exactly what we did for Gemma code. And now, we have two different versions of that out in the wild, and people are using it and loving it. And it's fun to see what they're doing with it. ROMAIN GUY: So if I wanted to build my own agent locally, completely offline, I can download those weights and-- KATHY KOREVEC: Yep. TOR NORBYE: I mean, you could even train it for personalization. But I think the main use case, I think for Android developers here is, for, you know, if you have a company policy that says we are not allowed to let our code leave our premises-- we have a few users like that. That's why we kind of doubled down in Studio Bot 1.0 with like, you know, where you could choose not to share your code, and the chat is just like opening a browser tab kind of thing. And we have the AI Exclude mechanism now to let you filter exactly what can be seen. So Gemma lets you run your code completion, all these things locally. We don't actually integrate it with Studio yet. We probably-- that's one of the things we're looking at. But that's, you know, for the privacy angle, I think that's really where this is a potential solution. It's not as good as the online models, which are bigger and faster, but it's-- ROMAIN GUY: Actually, I like that a lot, because so I've been working on-- I was talking about assembly code because I've been working on this tool that lets me type Kotlin. And it will show me the assembly code that will eventually get compiled on the device. And it's good for performance investigations. But to the point I was making earlier, understanding the assembly can be a pain. So I was like, hey, maybe I could look up the documentation, or maybe I could hook it up to an LLM. TOR NORBYE: You can use the API she mentioned. ROMAIN GUY: But the problem is that with the APIs, then suddenly, you know, you have to deal with API keys and stuff like that. I don't want to deal with any of that. So if there's an open weights model, I can just integrate in the app, that sounds awesome. I should look into that. KATHY KOREVEC: Yes, yeah. ROMAIN GUY: I'll ask you questions. KATHY KOREVEC: OK. TOR NORBYE: I think that dealing with the-- ROMAIN GUY: What do I do? TOR NORBYE: Dealing with the API is probably a lot easier, I would think, than actually figuring out how you're going to run Gemma as part of your app. KATHY KOREVEC: Yeah, maybe. I would recommend that. Before I joined Google, I built an application that was running an image processing LLM on a computer that was in my basement in my garage. 
And it was like-- it was very difficult. I don't necessarily recommend doing that, but-- ROMAIN GUY: Let's put it this way, it sounds more interesting than calling an API. KATHY KOREVEC: If you're nerdy like I am, then I would recommend doing it. ROMAIN GUY: I'm talking about a tool that lets me look at the assembly code generated. KATHY KOREVEC: Yeah, yeah, there you go. There you go. So I recommend it. Yes. TOR NORBYE: I will just say, for your first integration, try in the latest versions of Studio, we have a Gemini API starter template. And so it basically-- ROMAIN GUY: He's always telling you stuff. TOR NORBYE: Oh, yeah. But it just-- basically, you just go to AI Studio, which is not Android Studio. AI Studio has a button, like, give me an API key. So you just get a string. And then in the Studio template, it just shows you like, hey, it's a sample project where put your key here. Here's-- ROMAIN GUY: I know how to API keys, but then anybody using the tool needs also their API key, because I'm not going to pay for that. TOR NORBYE: No, but like, it's three lines of instructions like, go to this link, get your string, put it here. ROMAIN GUY: Yeah. TOR NORBYE: Yeah, no, no. If you want to integrate Gemma even better. KATHY KOREVEC: Yeah, you can use Gemma code for free, and you can run it yourself, and you can-- what's cool about it is, you can use it like, for privacy reasons. You can use it to train on whatever-- ROMAIN GUY: Well, it's privacy reason. But I there's also a user experience angle where there's a number of tools that I've tried that had an integration with an LLM, but then I ran into that where suddenly, like a dialog shows up and they're like, please insert your API key. Like, I just didn't want to deal with it. KATHY KOREVEC: It's just a pain. ROMAIN GUY: I know how to do it. It's like, I don't want to do it. TOR NORBYE: What I will say is that, it is extremely CPU intensive in memory. KATHY KOREVEC: It is, yeah. TOR NORBYE: So you'll probably be surprised by how much RAM you have to give it and how long it's going to take. ROMAIN GUY: I have an idea. TOR NORBYE: And that's when you realize like, oh, I guess you know, Google servers, there must be a lot of hardware behind this, because, wow, that completion didn't-- ROMAIN GUY: Don't get me started on the memory usage because I spent my whole career getting yelled at by various people for the memory that, you know, the graphics stack is using where we're like, yes, we use four bytes per pixel. KATHY KOREVEC: Yeah. ROMAIN GUY: And then LLMs show up, and they're like, oh, you want RAM? Here you go. TOR NORBYE: Yeah. KATHY KOREVEC: Well, the cool thing about Gemma code is that we-- and I think I might be mispronouncing it-- Code Gemma, at Google, surprise, we've renamed it 50 times. The really cool thing about it that we've learned, we have various different sizes of it. One of the most popular sizes is the 2B version of it, and-- ROMAIN GUY: So the small one? KATHY KOREVEC: The small one, yeah. And so that signals to me that, you know, it's maybe not something that a lot of people are using for professional reasons, like, really hardcore reasons to build applications. But it's actually a lot of hobbyists. And so a lot of people who are like, oh, I just want to explore this assembly code. And I want to do it on my own machine. ROMAIN GUY: Well, I imagine just learn how this whole-- KATHY KOREVEC: Yeah, exactly. ROMAIN GUY: --thing works. KATHY KOREVEC: Yeah, it's a lot of tinkerers. 
And so that to me, is kind of-- especially at this moment right now with AI, there are a lot of people-- we talked about developers being some of the early adopters. There are a lot of just like, OK, what is this thing? I want to use it for my hobby project. I just want to learn about how to train something myself. And I can now go do it with this very small open weights model. I love that use case. TOR NORBYE: What did we not ask about? KATHY KOREVEC: The cool things that we're working on right now that I can allude to are in you know, like, I just had this conversation this morning with a couple of folks on the team about whether we should go down the no code, low code route. And for me, I think this is-- I've been working in developer tools for a really long time. I was at Heroku actually way back when they were building Heroku Garden, which was an online IDE, and very much going into the no code, low code space. And so this has come up in my career again and again and again, whether or not low code, no code tools should exist. ROMAIN GUY: We've had discussions about that before. KATHY KOREVEC: So I think that there's a place for them, for sure, and there are use cases. My interest is very much like, I believe that coders will be-- people who manipulate code and work with code, they'll be around forever, and you know, regardless of how big and how ubiquitous AI gets in the space. I think that they will turn to no code, low code for specific use cases, and then they'll want to inspect the code afterwards. And they'll turn to coding tools, and code AI tools for specific use cases. TOR NORBYE: It might be a bit like the cross-platform toolkits too, right, that are used for a certain set of apps. But when you want to get to the top of the place, the rankings, well, then you're going to go-- KATHY KOREVEC: Yeah. ROMAIN GUY: I mean, in some sense, it's also something that we do all the time. You mentioned Studio templates, or as soon as we have the ability to not write code we don't have to write, we'll take that chance, right? KATHY KOREVEC: Yeah, totally. TOR NORBYE: I definitely prefer to use Google Docs over like, a wiki. You know, that's just me. KATHY KOREVEC: But I think that, you know, one of the things we're exploring is, I think traditionally, low code, no code tools have been like, give it a-- your input should be a sketch, like, a napkin sketch, or a wireframe, or something like that. And then I want to-- there's the make it real button from TL draw, which I love that tool. And it takes your drawing and turns it into code. And what we're exploring is maybe that's not the entry point. Maybe the entry point is this conversation where we're like, OK, here's an app that I have an idea about, and it does X, Y, Z, and I want to be able to produce that. And maybe the LLM is listening to this conversation saying like, oh, here is a wireframe for that, and here's the code for that, et cetera, et cetera. So maybe it's natural language to code to design. So I think there could be many different inputs. ROMAIN GUY: There's an interesting discussion to be had here, because every time I think about, hey, I'm just going to describe what I want in English, one of the issues with English, or any spoken language is that they're not as precise, of course, as, you know, the mathematical language, or programming languages. And we know that a really hard thing to do is to write specs in English. And then you're like, OK, for my app to work, I have to spell out everything in English. 
And then I'm like, well, it wouldn't be nice to have a language that's more precise for that crap programming language? TOR NORBYE: Well, that's-- like we demoed at I/O sort of, you know, the early collaboration we're having on multimodal stuff, where we put a screenshot or a mock of an app and it spits out the compose code. One of the things that would be nice that we don't have yet is the ability to then in English correct it and go like, looks good, but you know what? I want that button over on the left, right? And so-- ROMAIN GUY: But it's not even that to me. It's like, you have the mock, and you need to be able to say, OK, but here's what I want to happen when the list is empty, when I have no network, when the user is not logged in, right? Like, it's all those extra-- KATHY KOREVEC: These things you don't think about. ROMAIN GUY: And so that's where as an accelerator, it's fantastic. But to do an app completely that way, like-- TOR NORBYE: And as someone who's paid to code, I'm glad that there still is a task there and it's not all fully automated. KATHY KOREVEC: I think they'll be-- TOR NORBYE: I'll just say that. ROMAIN GUY: I like the vision. KATHY KOREVEC: Yeah, yeah. I think there will be a task there forever. But you know, I don't know if you guys-- I mentioned that I used to run product at this company called Vercel, the company that made next.js. They just launched this thing in the last couple of months. It's called v0, v0.dev. And it does this. It takes natural language prompts and it turns it into design, and then you can look at the code. And then you can say like, oh, I want that button to be blue, or you can highlight a specific part of the UI and say like, I want this button to be over here or something. And then it'll also fill in-- it'll know oh, here's a select box and it'll fill in all of the data for that for you. I think it's really-- this space is-- and that tool is very new. And there's obviously a lot of room for it to grow. And I think it's really interesting to take that perspective of, oh, you actually don't know exactly what you want. I'm going to not only make I, the AI, am going to not only make coding recommendations, but I'm also going to make design recommendations. And then I'm going to make data recommendations. And then I'm going to make performance recommendations. And these are the things that don't quite exist yet. So you get outputs that are not ready for production, but they're good starting places. ROMAIN GUY: But what you are describing like, you know, filling a dropdown, moving a button to the right, that's not the interesting part of building a product or the-- so yes, if something could do it for me-- KATHY KOREVEC: Yeah. ROMAIN GUY: Sounds great. KATHY KOREVEC: Yeah, yeah. TOR NORBYE: Interesting part is the assembly code, right, Romain? ROMAIN GUY: Well, to me, yes. KATHY KOREVEC: Yeah. TOR NORBYE: Yeah. No, there's a lot of stuff coming that's pretty exciting. KATHY KOREVEC: Yeah. I'm trying to rack my brain about what else I can talk about that is coming that is really cool, but. ROMAIN GUY: Well, I'm sure we will hear about it. It's hard to-- it's hard not to. KATHY KOREVEC: Yeah, yeah, for sure. TOR NORBYE: Cool. All right, well, thanks for your time. ROMAIN GUY: Yeah, thanks for coming. KATHY KOREVEC: Yeah, thanks for inviting me. This was fun.
Info
Channel: Android Developers
Views: 4,197
Keywords: Android
Id: tprU6FTZrHc
Length: 52min 19sec (3139 seconds)
Published: Thu Jun 20 2024