Effective ProGuard keep rules for smaller applications (Google I/O '18)

Captions
[MUSIC PLAYING] STEPHAN HERHUT: Hello, everyone. Thanks for coming. I'm Stephan. I'm a software engineer at Google, where I work on compilers and runtime systems. And most recently, I've been working on R8, which is Google's new shrinker for Android applications. Now, if you attended yesterday's session on new compilers and Android Studio, you already heard a fair bit about what R8 is and how you can use it. And today, I don't want to talk too much about what R8 actually is. I want to focus more on keep rules. So what are keep rules? Keep rules are the configuration language that ProGuard uses to specify the things you need to keep in your application while shrinking. And when we built R8, we decided we wanted to use the same language because we wanted to be a drop-in replacement for ProGuard, the idea being that you could just reuse your existing rules and try R8 really easily. Now, as you might imagine, building something that is compatible to that degree required us to quite deeply understand what ProGuard keep rules actually mean. And today, I want to use this opportunity to share some of that knowledge with you. And I also want to take you on a little journey to understand what it really takes, given an application, to come up with effective keep rules to shrink it to its minimum. But before we go there, let me ask you all some questions. So who of you has used ProGuard before in some way or shape? Wow. That's a lot of people. So who of you is using ProGuard today in a released production app on the Play Store? That's already a little less, but you're a great crowd. Thank you for doing that, because our estimates show that only about a quarter of applications on the Play Store actually use keep rules. So there's a lot of room for improvement. And I want to talk a little bit about why it actually matters to shrink application size. So, of course, for me, it matters, because I helped build a shrinker. Right? I'm interested that somebody uses it.
But it should also matter for you. There has been a lot of talk about the next billion users, entry-level Android devices, and how we can make the user experience better. And what makes an entry-level Android device? So one thing is, it's really resource-limited. So if you think of these devices, they typically have less than 512 megabytes of RAM, or they might only have like four gigabytes of storage. And quite often, people that use these devices are in areas where they have very limited connectivity. So for these users, size actually makes a daily difference. Right? When they decide which of all the apps they actually want to use they can afford to install, it might be a small subset. And that's bad for them because it's a bad user experience, but it's also really bad for you as application developers, because it might be your app that doesn't make the cut. It might be your app that they actually want to use, but they can't use, because they don't have the space. Now, you might say, OK, that's not really my audience. Next billion users, entry level, that's not where I see my application. But even if you target really high-end devices, the fundamental truth is that smaller is always faster. And that's quite obvious, because, A, smaller apps download faster. There's less to transfer. They also install a lot faster, because there's less code to compile, and that is what takes the time when installing. And lastly, and that's what users see every day, they also start up much faster because there's less code to load. So whenever I talk to people about this and say, OK, you have to care, you have to make them smaller, an answer I get really, really often is, yeah, but hardware will fix this. Right? Devices are becoming faster. There's more storage. Connectivity is better. That might be true, but it's not a solution. I have a graph for you. So this shows the average size of installed apps on people's devices.
And as you can see, this has been growing steadily. In the early days of Android, we started at about a megabyte per APK. And by now, we are at a whopping 32 megabytes, on average, for installed applications. So clearly, hardware is not going to fix this. We have to do something about it. We have to make apps smaller. So let's look at this. Where does this growth come from? Why are apps so big? And to get an idea, I thought I'd bring you a little example. So I built this app, which I call Simple Weather, and the stress really is on simple. Because all this app does is take some statically pre-defined weather data and render it as a dynamic graph. Dynamic graph means if you turn the device, it will render in a different size, or if you install it on a different device, it will adapt to the resolution. So the graph is truly dynamic. The data isn't. And this is a really, really simple app. And there are two things that I found surprising when I built this app. The positive thing was, it took me only about 30 minutes. I just went on the internet, I looked for a graph library, I typed in about 100 lines of code, and there it was, my little Simple Weather app. But there was also a negative surprise. And that was the size. I thought I built something simple, and that simple means small, but that's not true. This app was two megabytes of an APK. Two megabytes just for rendering a graph. And even worse, when I installed this, it turned into a four-megabyte blob of code because it was uncompressed. And again, you might say, OK, four megabytes. That's not really much. Right? New devices. Lots of space. Let me put that into perspective for you. If you look back into the '60s, we flew to the moon with 60k of code. Right? That's 1/60 of the application size I had for my Simple Weather here. And no matter how you turn this, it's strictly more complicated to fly to the moon than to render a graph. So what has happened here?
Why did we go from 60k flying to the moon to four megabytes rendering a graph? And I think the reason is that we fundamentally changed how we do software. So back then, when they did the software for Apollo, there was a dedicated team that wrote this code line by line, everything was on purpose, everything was handcrafted, so it was really meticulously crafted to fit into this 60k, and do exactly one thing, fly to the moon. Fast-forward to today. How did I build my app? I used components. I just went to the web, downloaded some components, stitched it all together. Ta-da! I have my app. There's a great advantage to this. Right? It took me only 30 minutes. It was really easy. I was super productive. But there's also a big, big drawback. My application was really big. So let's look into the details. Which components did I actually use? For that, I brought you my build.gradle file. And this is essentially the default file that Android Studio will generate for you. And all I've done, I've added two components. I've highlighted them here, so that it's easier for you to see. The first one is Guava. Now, Guava, that's Google's common components for software engineering in Java, and they add a lot of convenient classes. And the thing I wanted was immutable collections. Because I thought my static weather data, it's static, so it should be an immutable collection. Right? That's good software engineering practice. The other thing I use is AndroidPlot. And that's just a library I found on the internet. There are probably lots more plotting libraries, but I spotted this one first. And AndroidPlot is really great. It has lots of support for bar graphs and all kinds of charts. But I really needed a line graph. That's all I cared about. So let's look at what this means for size. How did these components impact the size of my application? And to get an idea, I went to the APK Analyzer. Now if you haven't seen this tool before, it's really great. It's part of Android Studio.
You will find documentation on developer.android.com. But what this does is it gives you deep insight into what contributes to the size of the APK. It can do resources and all kinds of things, but I'm only interested here in the actual code. So I've highlighted that at the bottom there for you. There is this com.google package which contributes 1.4 megabytes. Well, that's Guava. So I'm paying 1.4 megabytes for an immutable collection. This might be a very extreme example, but there's something that is more realistic there as well, which is AndroidPlot. That's the second thing you see. And that's 180K. So my plotting library takes 180K of APK size. Now there will be something in there that I don't need, because my actual application, that's the eu part down there, is only 35K. That's the code I wrote. You might think 35K for a hundred lines of code? That's a bit much. Well, it is, because there's all the auto-generated code, most of which I actually don't need. So this brings us to the question: how do I get from this, where you can clearly see that Guava and AndroidPlot take the majority of my APK size, [INAUDIBLE] application is really small, to something that's more tailored, that's more like the thing that they did for Apollo? Of course, we won't get to something as crafted as that without doing all the investment, but there must be some kind of middle ground. And that really is where tools like ProGuard and R8 come in. Because one of the things they do is take your application and remove all the unused components. So the goal is to turn the component-based build that you've made into something as tailored as possible, and to remove all that code. Now when I asked you, you all said you already used this. So you will know it's really easy to enable, because all you have to do is head back to your build.gradle file, and then flip this one flag. Right? If Android Studio generated this for you, then it will already say minifyEnabled false.
You just flip that to true, and ta-da, you have a small app. I see some people are shaking their heads already here, because, of course, that's not really the truth. So I did this for my application. I flipped the flag, and there it was, more than 200 build time errors. Hooray. So I looked at these errors, and I thought, OK, what are they trying to tell me? There are classes missing. Classes I haven't even heard of. What do I do? How do I fix this? Well, the first solution, you search on the internet. So I did. And I found this great piece of advice, which says, just put -dontwarn * into your ProGuard configuration, and everything will be fine. Now, technically, this is correct. You put this in there, and it will compile. But there's a problem. What this tells R8 is, no matter what happens, don't tell me about it. And that means it will mask these benign errors, but it will also mask all the errors you actually care about, where something went wrong. So how can we improve on this? And to do that, we have to actually deep dive a bit into how R8 works. Now I will talk about R8 here, because that's the tool I helped build. But most of this also applies to ProGuard, because it solves the same problem. So what does R8 do? It does two things. Firstly, it does minification. And minification is the process where you take very long class names and replace them by very short class names instead. Some also call this obfuscation, but it really doesn't obfuscate your code. It just makes it a little harder to read, but it's nowhere near safe against reverse engineering. So let's call this minification. That's one thing ProGuard does, but I don't want to focus too much on that. The other thing it does is shrinking. Now shrinking is a great name for this, from a developer perspective, because it takes your app and it shrinks it into a smaller app. If you actually want to understand what happens under the hood, it actually works the other way around.
Because what we're doing is tree growing, which takes the entry point of the application and grows that until we've seen everything that will be executed at runtime. Let's look at the example here. So I created these graphics. And in essence, a box is a class. That's all you have to care about for now. There's also code in these boxes, but don't read it just yet. Another thing on these graphs is that everything on your right is library classes. That's part of the Android system. And everything on the left is your application code. And the first thing to realize here is that library classes are always live. And there's a very practical reason for this, because we can't shrink them away anyway. They're on the phone. They're part of the system. But there's also a technical reason, because as a static analysis tool, we don't know what these library classes actually do. And that is because the runtime might call into them at any point. There are many different libraries, like different Android versions, and they also might change in the future. So from an analysis standpoint, we just have to assume that a library class is always live. So let's assume we will start our app by calling the run method in the app class. So the first thing we have to do to actually do this is instantiate this app class, which means we create an instance of the class app, and we also call the constructor. And that leads to both of them being live. That means we cannot strip away the app class, and we cannot strip away the constructor. So far, so good. Next, we have to actually look at the code of the constructor to see what it will do at runtime. And if you look at the code, you will see it actually creates a new instance of class A. So again, this makes class A become live. We can no longer remove it. Note that class A doesn't actually have a constructor. So there is no code to look at. It only has a default constructor that does nothing interesting here.
The other thing that the constructor of the app class does is write the created instance to this field, Other Field. It's interesting to note here that this doesn't actually make Other Field live. Because writing to a field is not observable. You have to actually read the field to know that the field exists. So now we've created the instance of this app class. We've executed the constructor. We want to call the run method. And again, calling this run method will mark this method as live. And like with every other live method, we now have to look at it and see what the code actually does at runtime. So if you look at run, you will see it first reads the field Other Field to retrieve the instance, and this is the moment where this field actually becomes live. And next, we will call A method on the retrieved instance, and that is where A method in class A becomes live. And now we would look at the code of A method, see what that does, and mark all those effects. So this is how the basic analysis flow works. Whenever you think about keep rules, you have to keep in mind, this is what your analysis engine will do. Now, how does this relate to Android? How do we actually know the entry point of an Android application? Well, that comes from your manifest file. So this is a very simplified manifest file. But one thing it does, it tells you that the activity has the class TheGraph as its implementation. Now, neither R8 nor ProGuard understands manifest files. And they shouldn't, because there is a little tool that actually helps us understand them, and that's called AAPT. During the build process, AAPT preprocesses all your resources, and it creates corresponding keep rules for you. So let's look at this keep rule. And this is the first keep rule, so we'll talk about it in a bit of detail. So what it basically does, it says keep. That's the simplest form. It says, OK, everything I mention now, you have to keep. Don't touch it. Don't shrink it. Don't rename it.
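The slide with the generated rule is not captured in the captions. As a rough sketch, a rule of the kind AAPT emits for an activity declared in the manifest has the shape below. The package name com.example.simpleweather is a placeholder and not the one used in the talk; only the class name TheGraph comes from the speaker.

```proguard
# Sketch of an AAPT-generated rule for an activity named in the manifest.
# com.example.simpleweather is a hypothetical package name.
# -keep: do not remove, rename, or optimize what is listed.
# <init>(...): every constructor, whatever its parameter list, which tells
# the shrinker that the class is instantiated at runtime.
-keep class com.example.simpleweather.TheGraph {
    <init>(...);
}
```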
And you want to keep a class, and then we have a fully qualified class name, which is my main class that came from this manifest. There's something more here. Just the class name would only keep the class. It wouldn't actually tell the system that this class is also instantiated, which is a big difference. To tell it about that, we also have to keep the constructors, and that's what this <init> line does. So this <init>, with the elided parameters, tells R8, we also need the constructors, and we will instantiate this at runtime. So now, R8 knows this class is live. And as I said before, it will look at the code. Let's go there. This is my TheGraph class, and I've removed all the function bodies and everything that's not important. But what you will see here is it actually doesn't have a constructor. So that means our analysis ends right here. We've seen the class. There's no constructor. Nothing to do. Now, every one of you who's ever written an Android app knows the actual meat happens in the onCreate method. That's where the actual configuration happens. So how does R8 know? The keep rule never told it to actually look at onCreate. Well, the trick here is that TheGraph extends this AppCompatActivity class. And if you keep on following this, you will see that eventually that extends Activity. Now Activity is a library class. And as I said before, all library classes are always live. Hence, its onCreate method is also live. Because this onCreate in the library class is live, all its overrides in live subclasses also become live. And that's another aspect of the analysis you have to keep in mind if you want to understand how it actually works. So this is what marks onCreate live. Let's take a look. This is my onCreate method. And it looks like a standard onCreate method. It first does some setup, delegates to the superclass. I want to highlight only two things here. The first is this findViewById.
This calls the library method, and what it does is, at runtime, dynamically return an object from somewhere in your view based on your layout. Again, R8 cannot really understand this. Because this ID is, again, defined in an XML file somewhere. This is my layout. And you can see it says this XYPlot class is in my constraint layout, and it has this ID plot. How do we figure this out in R8? Again, AAPT comes to the rescue. And this keep rule might look quite similar, but what it tells R8 is that your layout uses this XYPlot class, and it will instantiate it at runtime. So it's always the same principle. Another thing I want to highlight here is that during this addPlotSeries, where we set up the plots, we use another R identifier, which is this R.xml. Now, R.xml is different. These are free-form XML [INAUDIBLE] just puts in your xml directory in your resources. There are no requirements on their contents, so AAPT cannot actually understand them. So when you use these R.xml files somewhere, you are responsible for all the keep rules they may require. Just keep this in mind. We'll come back to that later. So this is the basic analysis flow. This is how this basically works. Now you might ask, OK, 200 error messages. How does that relate? What went wrong there? I mean, this analysis looks reasonable. Why does it fail? And the reason is that the analysis that we do is different from what the VM does. And one difference is annotations. Now the Android VM doesn't really care about annotations at all. They have no meaning at runtime unless you use reflection. So if an annotation class is missing, the Android VM will still just execute that code because it doesn't even look at them. In comparison, R8 has to understand annotation classes because they might be part of a keep rule. So R8 has to find these classes and has to understand the hierarchy. And if R8 can't, it will warn you about it. So a very common source of these warnings is missing annotation classes.
The other thing is code R8 just can't understand. Here's an example, which is ClassValue. So ClassValue is a concept from Java 7, but it's not available on the Android platform. So this code will actually fail at runtime. When the Android VM tries to execute it, it will tell you this class is missing. Why does this still work? Because the creators of Guava used this nifty trick here to hide the missing class. What they do is they load the class via reflection, and if that fails, they fall back to some alternative implementation. Now, the Android VM will understand this at runtime. It will just throw an exception, the exception gets caught, and the alternative is executed. But R8 cannot understand this code. It's just too complicated to handle statically. And to fix these errors, you just really have to look at all these examples and find where they came from. And ultimately, that distills to these five keep rules or warning rules that you have to add. Now the first three, they just disable warnings about certain annotations from the Checker Framework and Error Prone, and those are just static analysis tools that are not used at runtime at all. The bottom two are two classes that are not readily available on the Android platform. And they are typically used via some reflective wrappers to make this work at runtime. So adding these five rules, we get our application to compile. That wasn't bad, right? So we looked a bit at it. We added five rules. And ta-da, we have a smaller application. Unfortunately, the runtime behavior of my application has changed just ever so slightly, because now it crashes. And that's the other problem you typically see. Again, what happened here? Right? I explained to you how the analysis works. That looked all fine and reasonable. The typical problem is reflection. Because by its mere nature, reflection is about using a dynamic runtime value to load a method or a class. And static analysis just can't understand this.
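The five warning rules from the slide are not reproduced in the captions. The sketch below shows targeted suppression in the same spirit, as opposed to a blanket -dontwarn pattern. Only java.lang.ClassValue is named by the speaker; the annotation package patterns and sun.misc.Unsafe are assumptions about what such a configuration for Guava would typically contain.

```proguard
# A blanket -dontwarn pattern would compile too, but it would also hide the
# warnings that point at real problems. Prefer one rule per known-benign case.

# Compile-time-only annotations from static analysis tools
# (package patterns are assumptions, not the slide's exact text).
-dontwarn org.checkerframework.**
-dontwarn com.google.errorprone.annotations.**

# Classes that do not exist on Android and are only reached through
# reflection guarded by a fallback.
-dontwarn java.lang.ClassValue
# Assumed second case of the same kind:
-dontwarn sun.misc.Unsafe
```

Keeping each rule narrow means a new, unexpected missing class still surfaces as a warning.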
Dynamic values are the enemy of static analysis. So how do we fix this? Well, we have to somehow figure out how to tell R8 about these cases of reflection and make R8 understand them. And that's really what keep rules do. So what did I do? I went to the internet. Unfortunately, the developers of AndroidPlot put up this rule on the internet, which says, -keep class (OK, you want to keep some classes) com.androidplot.**. And what this means is it tells R8 to essentially not touch AndroidPlot at all. Again, this will fix the problem. It will run again, but it will no longer shrink. So this cannot be the point of using R8. So how can we improve on this? And there's really no clever way of doing this, other than going on some forensic investigation. We really have to find out where all this reflection is happening and what we have to add to the configuration to make R8 understand it. So where do you look? Where do we find evidence? And the first place to look is the adb log. So I've preprocessed this a bit here so that it's easier to read. But in essence, if you go to Android Studio and use the log viewer, you can filter by process ID, and this is, in essence, what you will see. So there will be this log statement saying that styleable definition not found for A. As such, this is not really helpful, because I don't understand what this is trying to tell me. But it is really great, because it gives me a place to look in the source code. So this is a big piece of advice. If you write these kinds of libraries and you do reflection, put in logging statements. It's not so important that people actually understand the message. It's much more important that people will find where this statement was generated. Because I have this logging statement there, I can now actually look at the code and see what it's doing. So here's Plot.java. And as you can see down there, it says Log.d, styleable definition not found. So what is going wrong here?
As you can see, this does reflection. There is this styleableClass.getField, and styleableClass is a class object. And getField will get a field from that class, given the name, styleableName. So this is an example of reflection on a class. How is styleableName defined? Because that's clearly what is going wrong. It's trying to find a field called A. That seems a strange field name. This is how styleableName is defined. It's defined by means of getClass().getName(). So again, we have to understand what this does. What does getClass do? That's, again, a reflective invocation. It will return what's called in Java the current class. Now if you're in this Plot file, and in the Plot class, the current class can be the Plot class itself, but it can also be any of its subclasses. Because at runtime, this method might run in different contexts, depending on how the virtual dispatch worked. And now we get the name of this class, and this seems to return A. Why did they name their class A? Well, they didn't. The problem is minification. Right? R8 went ahead, saw this class, and thought, well, Plot is a long name. Let's call it A. So to fix this, we have to prevent R8 from renaming this class or any of its subclasses. And this is the corresponding keep rule. So again, what does this say? It says -keep class com.androidplot.Plot. And that means R8 should keep the class, not rename it, not optimize it. But as I said, we also have to keep all the subclasses, so we have to say -keep class * extends com.androidplot.Plot. This will keep all subclasses of Plot and the Plot class itself. But is this actually what we want? If you think again about what getClass does, it, at runtime, returns the class of the current object. Now, you can only be the current class if you have actually been instantiated. Right? If a class never gets created, there is no way of getting it via getClass. So our keep rules do not actually have to keep extra classes.
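Written out as configuration, the rules discussed in this passage would look roughly as follows. The member clause on the broad rule is an assumption, since the captions only give the class pattern.

```proguard
# Broad rule found online: keeps all of AndroidPlot, so none of it shrinks.
# (The "{ *; }" member clause is assumed; only the class pattern was spoken.)
# -keep class com.androidplot.** { *; }

# Narrower first attempt: keep Plot and every subclass under its original
# name, so getClass().getName() no longer returns a minified name.
-keep class com.androidplot.Plot
-keep class * extends com.androidplot.Plot
```

As written, these two rules also force unused Plot subclasses to stay in the APK.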
All we want is to prevent them from being renamed. And that's where modifiers come in. So here's a modifier for you. What this now does, it still says keep. But it says allowshrinking. So it tells R8, if you see this Plot class, you're allowed to remove it if nobody uses it. But if you keep it, don't rename it, and don't optimize it. And this will fix our problem, because now the Plot class at runtime will actually still be called Plot, as will all the subclasses. So that was not too bad, right? You look a bit at the code. You come up with two keep rules, and your application will run. Or not. So it's still not working. We have to do more forensic work. What do we do? We go back to the adb log. And the message has changed. We now see a different exception. Again, it's not really clear what this is trying to tell me, but I have an exception I can look for. So this says, error while parsing key, linePaint.strokeWidth. OK. Why does this happen? How do we find out? We look at the code. And here's the corresponding method. And I've highlighted the reflective use in there for you. So again, you can see, we take a class, and we get all its methods. And next, we compare the name of each method against some given name we are looking for. So there are two things that can go wrong here. We are just getting a set of all methods, so we might have removed too many methods. And again, we're getting the names of these methods, so we might have renamed them. So these are the two error conditions we now have to check. What are we removing, and what are we renaming that we shouldn't? Now I have to admit, it's kind of hard to figure out what this code really does unless you are the library developer. So the person who wrote this code initially knows perfectly clearly what this is doing. And at that point, it would have been really easy to write these keep rules. So what does this do? Do you remember these?
So when we do this addPlotSeries, we configure what these series are supposed to look like. And AndroidPlot has this really great feature where you can tell it to configure your graph based on some XML file. And this is what this XML file looks like. And as you can see, in there, you will find this linePaint.strokeWidth. And what this library will do, it will take this XML file, it goes through all the attributes in there, and then it will call corresponding getters and setters on an object it's trying to configure. So it will take a graph object, and then it will call the getLinePaint getter, and then set the strokeWidth property. This is a very standard pattern of configuring something at runtime, and it's a really great feature. But if R8 sees this, it cannot understand this, because R8 cannot make the connection between this XML file and the actual classes. Also, as this is free-form XML, we don't have AAPT to help us. Instead, we have to do this ourselves. And this is the corresponding keep rule. What do we need to do? We don't want to keep [INAUDIBLE] extra classes, because again, what we're trying to do here is take an object at runtime that already exists, and then configure it by calling getters and setters. So that's why we use -keepclassmembers. That doesn't keep any extra classes, but it tells R8, if you're already keeping a class in the com.androidplot package, also keep these members, and don't rename them. And what members do we want to keep? First of all, we want to keep getters. And what do getters look like? They return some result (that's the three stars), they start with get, and they typically have no arguments. So that first line in there will keep your getters. Similarly, we can keep setters. So they don't return anything, but they start with set, and they take a single argument of some type. So this rule will now keep the getters and setters, so that at runtime the configuration can just happen. So one more keep rule.
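Putting the rules from this part of the talk together, the configuration arrived at so far could be sketched as below. The patterns follow the speaker's description; the exact text of the slides is not in the captions.

```proguard
# Plot and its subclasses may be removed when unused, but any that survive
# keep their original names, so getClass().getName() returns what the
# library expects.
-keep,allowshrinking class com.androidplot.Plot
-keep,allowshrinking class * extends com.androidplot.Plot

# For classes in com.androidplot that are kept anyway, also keep (and do not
# rename) the bean-style accessors that the XML configurator looks up by name.
#   *** get*()       any return type, name starts with "get", no arguments
#   void set*(***)   no result, name starts with "set", one argument of any type
-keepclassmembers class com.androidplot.** {
    *** get*();
    void set*(***);
}
```

In ProGuard's rule language, -keepnames is shorthand for -keep,allowshrinking, so the first two rules can also be written with -keepnames.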
Do you think it will work now? Let's take a look. And yes, we made it. So we added a couple of keep rules. And now, our application is actually running again, but was it worth it? Because this was a bit of an investment, right? We had to look at the code. We had to figure out what it actually does. Was this journey worthwhile? Let's go back to the APK Analyzer. And here, you can see the results. So if you look at com.google, which is Guava, that went from 1.3 megabytes to just 8K. I know that's very extreme, because I'm using essentially just a few classes in this huge collection. But also for AndroidPlot, which is much more realistic, you can see that it went from 180K to about 100K, and that's more than one entire [INAUDIBLE]. Also, if you look at my app, you can see that it went from 35K to 2K. And 2K is a lot closer to the 100 lines of code I actually wrote, because we removed all the unneeded auto-generated parts that the build system has created for us. Again, I've created this graphic for you to make this a bit more visual. And as you can see, Guava nearly disappeared, AndroidPlot halved, and my app also turned into this little sliver there. So it was a bit of a journey, but it really, really paid off. So what are the takeaway lessons here? I hope I was able to convince you that it actually makes sense to look into size. If you build an app, no matter what your target audience is, please invest in size. Please invest in creating keep rules. Please use ProGuard or R8 to actually shrink your app. But also, I hope I've shown you some ideas of how to make this easier. And the first thing to really take away here is, you should consider size early on. Because while you are writing your code, it is really, really easy to also write your keep rules, because you still understand what that code is actually doing. Right? All the code examples we looked at, if you had just written them, it would be easy to understand why this goes wrong.
Also, you should add structure to your code to ease describing reflective use. You remember that getters and setters example I had, where I said, OK, for all classes in AndroidPlot, keep the getters and setters? These kinds of things are much simpler if you actually have some kind of interface that allows you to tighten these up. So if you had an interface, say, RuntimeConfigured, you could just say in a keep rule, for every object that extends RuntimeConfigured, keep these getters and setters. And then it's independent of the actual application, and it can just be part of that library. And lastly, and this sounds obvious, but it's really important. You should continuously test your optimized build. Again, the earlier you find regressions in your build, the easier it is to fix them, because you will still remember what you actually changed, and that will make it easy to come up with keep rules. If you are a library developer, you should really, really carefully provide keep rules, because there's this multiplier. If you make precise keep rules, all the users of your library will benefit from it, and a lot of apps will become smaller. Don't make this an afterthought. Invest in keep rules while you build your library, while you design your library. Publish them somewhere where people can find them, because that's typically what we all do. We will search the internet for rules to use, so put them on your home page, put them in the readme file, make them visible. And lastly, consider using consumer ProGuard files when you are shipping via the AAR system, because this makes it completely transparent to your library users. When they enable ProGuard, they will automatically get your keep rules. Lastly, please give us feedback. So we've built R8, we have tested it. We believe it's a drop-in replacement. But only you can actually find that out. So if you're using ProGuard today, give R8 a try, tell us how it worked for you.
If you're not yet using ProGuard, try out R8, and see how far you can get with shrinking, and how good our diagnostics are. File bugs. We have a really responsive team, and we really care about your feedback. And lastly, after this talk, you can also see me at the sandbox, or find me outside if you want to have a chat. And with that, I want to thank you very much for your attention. [APPLAUSE] [MUSIC PLAYING]
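The interface-based suggestion from the takeaways can be sketched as a single library-level rule. The interface name com.example.plotlib.RuntimeConfigured is hypothetical; the speaker only describes the idea of a marker interface for objects configured at runtime.

```proguard
# com.example.plotlib.RuntimeConfigured is a hypothetical marker interface
# implemented by every class that the library configures through reflection.
# Only those classes keep their accessors; everything else in the library
# can still be shrunk and renamed.
-keepclassmembers class * implements com.example.plotlib.RuntimeConfigured {
    *** get*();
    void set*(***);
}
```

A library shipped as an AAR can bundle a file with such a rule through the consumerProguardFiles setting in its Gradle configuration, so that apps enabling shrinking pick it up without copying anything by hand.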
Info
Channel: Android Developers
Views: 24,325
Rating: 4.8981481 out of 5
Keywords: type: Conference Talk (Full production);, pr_pr: Google I/O, purpose: Educate
Id: x9T5EYE-QWQ
Length: 36min 32sec (2192 seconds)
Published: Thu May 10 2018