TOMAS MLYNARIC: Welcome. My name is Tomáš Mlynarič,
and I'm a developer relations engineer. Let's talk about how to enhance
the performance of your Jetpack Compose app. I think you're here because
maybe at some point, your app wasn't smooth enough. Hopefully, you verified the same
behavior with the release mode on a real device with R8 enabled
and custom baseline profiles generated. If not, then please start
there, as you might not have a performance problem at all. Let's see what effect these
flags may have on your app. By default, R8 is
off in the project and you get baseline
profiles coming from Jetpack Compose or
other libraries that you use. Just by enabling R8,
you get better results. And by enabling R8
with all optimizations, you get even better results. That is, you get an 11%
improvement compared to the default setting. But R8 can't beat custom-generated baseline profiles: with those, even with R8 off, you get better performance. But, as you can guess now, the
best combination is with R8 on and all optimizations enabled. In this case, it's 27%
faster than the default setting with R8 off and only
libraries baseline profiles. If you have the flags enabled
and you still see a performance problem, you might
have the temptation to go in all directions
at once and start fixing stability
here, tweaking layout, changing some animations. But fight the
temptation because you don't know if changing all
the things actually helps. If you can, try to
measure performance first to get some initial
understanding. Debug if you don't
know what's wrong. And improve the
problematic sections. Let's start with measuring to
get the initial understanding. To measure performance
of your app, you can use Jetpack
Macrobenchmark. Macrobenchmark allows you
to measure performance without knowing any internals
of the measured app. It uses UI Automator, which
is a UI testing library that mimics user behavior
and interacts with the visible
elements on-screen. It's the currently recommended way
to get the same performance your users would see because
it can install your app with baseline profiles. And although you
might think having benchmarks is only helpful
when you have the CI infrastructure set up and
you're catching regressions, it's beneficial to
have them locally for accurate performance
measurements as well. Let's recap what you can
measure with Macrobenchmark. There are two main metrics, usually used one or the other: the startup timing metric, to measure how long your app takes to start and fully load, or the frame timing metric, to understand when jank might be occurring. But these are not
the only metrics. You can combine multiple
metrics together. For example,
TraceSectionMetric, which allows you to measure almost any
code that happens in your app. Or if you have issues with
out-of-memory crashes, you might be interested
in memory usage metric. Or if you're concerned
with battery usage, you can use power metric. We've already covered some
benchmarking basics in MAD Skills videos on performance. And also, we covered
some information on how to understand the
benchmark results in the "More Performance Tips for Jetpack Compose" video. If you'd like to check the basics, I'll wait. When setting up the
project, you might have a benchmark like this. Now let's expand some knowledge
on some of the parameters to understand when to use what. Startup mode allows the
app to start differently during benchmark. But which startup
modes should you use? StartupMode.COLD is useful
for startup benchmarks because it represents the most
work your app needs to do. This mode always
restarts the process of your app in each iteration. Be aware that the process
is killed between setupBlock and measureBlock. StartupMode.WARM is useful
for frame timing benchmarks. This type of benchmark usually requires navigating to a part of your app
without measuring performance. You can leverage the
setupBlock for those operations and rely on the fact
that the process is not killed after setupBlock. This startup mode clears
all running activities without killing the process. StartupMode.HOT might
be useful to measure some per-activity
caching mechanisms. The process, and even the
previously-running activity, won't be restarted. You can also pass null as
the startup mode parameter. In this case, Macrobenchmark
doesn't do anything with your process,
and you need to handle it yourself-- for example,
using the killProcess helper function. There is also an optional
CompilationMode parameter. Same question. Let's explain when to use
which compilation mode. The default option,
Partial compilation, uses information from baseline
profiles to pre-compile parts of your app into machine code. This mode is useful
for understanding the initial performance
of your app. But when you're in the middle
of finding and improving performance, it's beneficial
to compare with mode Full. This reduces the number of factors contributing to performance variance and lets
you focus on your improvements without any jitting activity. Be aware that startup benchmarks
might be slower with this mode, as loading the whole
pre-compiled app from disk might take longer. And there's also mode None, which doesn't use any baseline profiles, not even the ones from libraries. This is currently the only way to compare baseline profiles in a benchmark.
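To make this concrete, here is a rough sketch of a startup benchmark combining the startupMode and compilationMode parameters; the package name is a placeholder, and the rest assumes the standard Macrobenchmark module setup:

```kotlin
// Sketch of a startup benchmark; "com.example.app" is a placeholder.
@RunWith(AndroidJUnit4::class)
class StartupBenchmark {
    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    @Test
    fun startup() = benchmarkRule.measureRepeated(
        packageName = "com.example.app",
        metrics = listOf(StartupTimingMetric()),
        // Pre-compiles the parts of the app covered by baseline profiles.
        compilationMode = CompilationMode.Partial(),
        // Restarts the process each iteration, including between
        // setupBlock and measureBlock.
        startupMode = StartupMode.COLD,
        iterations = 10,
        setupBlock = { pressHome() }
    ) {
        startActivityAndWait()
    }
}
```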
Let's also recap how to measure startup with StartupTimingMetric. In the previous video on inspecting performance, we had a benchmark that would look similar to this. It starts the default activity
and waits for the first frame to be rendered and gives
initial display duration, and then waits
until some object is visible on-screen with
the resource ID "feed" to wait for the
fully-drawn state. We mentioned that your
app is usually not yet usable after the
first rendered frame, and you should call
Activity.reportFullyDrawn from your app as well. But why is it beneficial,
and what does it do, exactly? By calling it from your app,
you get the user-centric metric of how long it
takes for your users until they can
interact with the app. It also marks all
the code executed before the call as
startup code for profile-guided
optimizations, which minimizes the class
loading duration and improves your startup time. And if you don't call it, it
will be assumed five seconds after Activity#onCreate. But you're not getting
the best out of your app with one method call. To report it from Compose, first
add androidx.activity:activity compose dependency to your app,
and then add the ReportDrawn composable to a composable that
represents when the loading is done. Alternatively, you can
use ReportDrawnWhen, which waits until some
lambda predicate is true to decide when to report. Or you can use also
ReportDrawnAfter to wait for the suspending
function to finish, like waiting for
animation to be done. This code can stay in your production code, as it can improve performance over time on your users' devices.
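As a sketch of how these three composables could be used, where FeedUiState, listState, and awaitAnimationEnd are hypothetical names:

```kotlin
// Sketch: three ways to report the fully-drawn state from Compose.
// FeedUiState, listState, and awaitAnimationEnd are hypothetical.
@Composable
fun FeedScreen(state: FeedUiState, listState: LazyListState) {
    if (state is FeedUiState.Success) {
        // Reports Activity.reportFullyDrawn when this enters composition.
        ReportDrawn()
    }

    // Or: report once a predicate becomes true.
    ReportDrawnWhen { listState.layoutInfo.totalItemsCount > 0 }

    // Or: report after a suspending function finishes.
    ReportDrawnAfter { awaitAnimationEnd() }
}
```

In a real screen, you would pick just one of these variants.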
There are several strategies for how you can report the fully-drawn state. For example, you wait until some data is loaded asynchronously, and then, when it's ready, you show your content with a single
ReportDrawn call. However, your home screen might
have multiple asynchronous sections to wait for. You can simply add
ReportDrawn multiple times in your code, each representing
the asynchronous section being loaded. ReportDrawn automatically
waits until all of them are finished, and then
reports it only once. Or, in some cases, there may be even more sections to wait for. For example, you can wait until
all images from the network are loaded and shown on screen. Once you run the
benchmark, you get two metrics,
timeToInitialDisplay and timeToFullyDrawn. If you don't call
ReportDrawn in your app, you only get the
timeToInitialDisplay metric. Let's also recap Frame
Timing benchmarks. For frame benchmarks,
you might just move the code from
measureBlock to setupBlock to navigate to screen
without measuring the first frame, which
is usually much longer and would skew
your frame results. Then you need to find
an element on-screen, setGestureMargin so that
you don't trigger the System Navigation and
minimize your app, and fling the content to get a
general idea of how it performs. A fling, though, might not scroll the exact same distance every time. So for more stable results, you can use the drag gesture, which should always swipe the same distance and, therefore, show the same number of items on-screen.
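A frame timing benchmark along these lines might look like this sketch; the "feed" resource id matches the earlier example, and the package name is a placeholder:

```kotlin
// Sketch of a frame timing benchmark; "com.example.app" is a placeholder.
@Test
fun scrollFeed() = benchmarkRule.measureRepeated(
    packageName = "com.example.app",
    metrics = listOf(FrameTimingMetric()),
    compilationMode = CompilationMode.Full(), // avoid JIT noise while iterating
    startupMode = StartupMode.WARM,
    iterations = 10,
    setupBlock = {
        // Navigate here, so the (much longer) first frame isn't measured.
        startActivityAndWait()
    }
) {
    val feed = device.findObject(By.res(packageName, "feed"))
    // Avoid triggering system navigation gestures at the screen edge.
    feed.setGestureMargin(device.displayWidth / 5)
    // Drag scrolls a deterministic distance, unlike fling.
    feed.drag(Point(feed.visibleCenter.x, feed.visibleCenter.y / 3))
    device.waitForIdle()
}
```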
When you run the benchmark, you get two metrics: frameDuration, which tells you how long it takes to produce the frames, and frameOverrun, which tells you how
much time there was left until the device's limit. The negative number for
frameOverrun is good. It means there was still
time until the limit. The positive number means there
might be some visible jank on-screen. Once you measure the
initial performance, let's debug what might be wrong
when you see some jank. To do that, you can start
with system tracing. You can get system traces
from Android Studio Profiler or even recorded on a device. But Macrobenchmark also
records system traces. For each iteration it runs, you can see them in the Android Studio output pane. And if you click on one, it will open the Android Studio Profiler and load
the system trace. Android Studio profiler
shows you only your process, so you can quickly explore what
was happening during execution. Alternatively, you
can also use Perfetto. It's a web-based
tool that doesn't require Android Studio to run. It gives you the
whole picture of what was happening on the device
and can reveal relationships between different processes. And it also allows you to
investigate deeper problems, for example, by
running SQL queries. So to use the traces
from Macrobenchmark, you can locate them
in your project in the benchmark module, build,
outputs folder, connected android test
additional output, the build variant name, the
device name, and there you can find all the traces. Be aware, though, this folder
is overwritten every time you rerun the benchmarks. So you need to copy the
results to a different folder if you want to retain them. From there, you can just
drag and drop the trace file onto Perfetto UI. And once it loads, you
can see everything that is happening on the device. Then you need to find
your process name in the list of processes. And once you expand it, you
can see the expected timeline, actual timeline, the
main thread, and then a list of the rest of
threads running in your app. The expected timeline tells
you when the system expects the frames to be produced. The actual timeline
tells you the real timing of producing the frames. The green frames were
produced on time. There was no jank. The red frames are
the interesting ones. They take longer to
produce than they should. So they are the cause of jank. There also can be
light green frames. These frames had enough
time to be produced, but they were produced too late. Let's now see the trace
sections in the main thread. Each section here is the sum
of the sections below it. Compose already pre-populated
some information by default. So, for example, you
can see trace section for each time a composable
was recomposing, or you can see when the lazylist
was prefetching the next item. But to debug why
this frame janks, it's not enough information. So it's hard to understand
what might be wrong. To improve that, you can add the tracing-ktx library to your app module, which allows adding custom trace sections. And then, in a composable you're interested in, to understand how long it takes to compose, you can wrap it with a trace call with some name. The custom trace section adds almost no overhead, so you don't have to worry about leaving it in your production code.
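For example, a minimal sketch with the androidx.tracing trace function, using the composable name from this talk's example; SnackCard stands in for the actual content:

```kotlin
// Sketch: a custom trace section around a composable's content.
// SnackCard is a hypothetical stand-in for the real content.
@Composable
fun HighlightedSnackItem(snack: Snack) {
    trace("HighlightedSnackItem") {
        // The section appears in system traces under the given name.
        SnackCard(snack)
    }
}
```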
Then, when you retake the system trace-- for example, by running the benchmark again-- you will see the trace sections based on the name you defined. And this gives you information
on when that composable was recomposing. So in this case, you can
see that there are probably three items on
screen, and one is being prefetched by the lazylist. Great. So this helps you understand
how long something composes, but it's a manual process. To get even more information
on when something recomposes, you can set up
composition tracing. Composition tracing essentially
wraps every composable with a trace call. To set it up, you need to add
runtime-tracing dependency to your app module first. And at this point,
if you run system tracing from Android
Studio Profiler, you would already see
all the information. But to run composition
tracing with Macrobenchmark, you need to add two
more dependencies to the benchmark module,
namely tracing-perfetto and tracing-perfetto-binary. Don't add these dependencies to your app module, as they are quite big and would negatively affect the size of your app. The benchmark module is not shipped with your app, so you can safely have them there.
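In Gradle terms, the dependency split could look roughly like this; the versions are illustrative, so check the latest stable releases:

```kotlin
// app/build.gradle.kts -- needed for composition tracing itself
dependencies {
    implementation("androidx.compose.runtime:runtime-tracing:1.0.0-beta01")
}

// benchmark/build.gradle.kts -- only in the benchmark module
dependencies {
    implementation("androidx.tracing:tracing-perfetto:1.0.0")
    implementation("androidx.tracing:tracing-perfetto-binary:1.0.0")
}
```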
Once you have that, you need to enable composition tracing using the androidx.benchmark.perfettoSdkTracing.enable instrumentation argument, which you can do from the Run Configurations dialog. You can find the instrumentation
argument option in the dialog. Pass the argument there
and rerun the benchmarks. Once you rerun the benchmark
and open the system trace again, you will see each composable
that was recomposed during the execution. Let's zoom in to see
the prefetch phase. You can see here some trace
sections when the lazylist was composing an item. Then you can see what
item was composed. You can even see the trace
section HighlightedSnackItem I previously added to the code. And then you can see what
the HighlightedSnackItem is composed of-- some
snack image, text, and other composables. Composition tracing gives you
information on the composition phase. But Jetpack Compose
has three phases-- composition, layout, and draw. You can get some information
about the layout phase, as well. What you need to do is add a custom layout modifier, then measure the measurable without changing the constraints, place it at Offset.Zero, and wrap this code in a trace call with a name. This gives you information on how long it takes to measure all the children of this composable. If you think placement is also important, you can wrap the placeable.place call, as well.
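Such a tracing layout modifier could be sketched like this; the section names are arbitrary choices, not a Compose convention:

```kotlin
// Sketch: trace the measure and place steps of any composable.
fun Modifier.tracedLayout(name: String): Modifier =
    layout { measurable, constraints ->
        val placeable = trace("measure:$name") {
            // Measure with unchanged constraints.
            measurable.measure(constraints)
        }
        layout(placeable.width, placeable.height) {
            trace("place:$name") {
                placeable.place(0, 0) // place at Offset.Zero
            }
        }
    }
```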
When you retake the system trace, you can now search for the measureAndLayout trace section and see how long it took the composable to measure its children. This can be helpful if you
have a custom layout composable or just if you want to get
more information on how long it takes to measure and
layout any composable. Also, if you nest
these layout modifiers, you can also see how long
that nested composable took to measure. Now, it's great to
see the system tracing with so much information,
but can we measure it? Yes! You can use the
TraceSectionMetric for that. TraceSectionMetric can
measure any trace section using the section name. It has two modes: Mode.First, which measures the first occurrence of a trace section, and Mode.Sum, which combines the timing of all sections with the same name. And you can also use wildcards in the section name. To use TraceSectionMetric in Macrobenchmark, just add it to the list of metrics. So here, I'd be measuring the custom trace section for a HighlightedSnackItem. Here, I'd be measuring the timing of the layout phase. And here, I'd be measuring section names that were produced by the composition tracing with the help of the percent wildcard character.
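The three measurements could be added like this sketch; "measure:Feed" is a hypothetical section name from a custom layout modifier, and the exact TraceSectionMetric constructor shape may differ slightly between library versions:

```kotlin
metrics = listOf(
    // Custom trace section wrapped around the composable
    TraceSectionMetric("HighlightedSnackItem", TraceSectionMetric.Mode.Sum),
    // Hypothetical section from a custom tracing layout modifier
    TraceSectionMetric("measure:Feed", TraceSectionMetric.Mode.Sum),
    // Sections produced by composition tracing, via the % wildcard
    TraceSectionMetric("%SnackItem%", TraceSectionMetric.Mode.Sum)
)
```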
When you run the benchmarks and get the outputs, because I used Mode.Sum, I get the total duration of the section names. But I also get the
count, which helps me understand how many times
this composable is recomposed. Similarly, I get information
on the custom layout modifier with its count and same
for the section names found with the wildcard. But wait a minute. This composable
recomposed 361 times. Something's not right here. To check when something
recomposes more than it should, you can also use the
Layout Inspector. You can use it in the debug
mode to check directly your hierarchy when
something recomposes. While inspecting a
screen in Jetsnack, I noticed a large
amount of recompositions when scrolling the list
of highlighted snacks. The recomposition
count tells you how many times a composable
item is recomposed. You can also see how many
times a composable was skipped. A composable is skipped when its inputs didn't change, so the Compose runtime doesn't have to recompose it every time. As a rule of thumb, if
something recomposes very often during a scroll, it might
cause performance issues. So now we know what
is causing the issue, but we don't know why. To help understand why, you can
use another tool, Composition Debugger. Composition Debugger works
with the regular debugger you already use in your apps. So you just set a breakpoint
somewhere in your Compose code and debug your app. And when the debugger
stops at this breakpoint, it gives you the composition
state information. There are several
flags you might see in the recomposition state. There's Static, which means
this parameter won't change during the whole app's process. There's Unchanged, which means
this parameter is the same as in the previous composition. You can see Evaluating. That means the
Compose runtime is evaluating whether this
argument has changed here. If you add a breakpoint to
the nested composable that has the same
parameter, you should be able to see whether it
actually changed or not. You also might see Unstable. This argument is
of Unstable type and thus causes recomposition. And then there can
be Changed, which says that it has
a different value than in the previous composition
and definitely is causing recomposition. So in our example,
this is the one that's triggered with
every scroll change. Let's now get to techniques
on how to improve performance in your app. First of all, you should
update your version of Compose and keep it up-to-date. We're doing performance
improvements under the hood, so just by updating
Compose, you'll see better performance in your
app without any extra work. Second, generate
a baseline profile to get more code running
in the precompiled state. With Macrobenchmark, you
can measure how much code is running in interpreted mode. You just need to add a TraceSectionMetric with the just-in-time compilation section name and the wildcard character, and use Mode.Sum. This will measure all the jitting that occurred during the benchmark.
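As a sketch -- the "JIT Compiling %" section name is the form used in the Macrobenchmark documentation for ART's JIT trace sections:

```kotlin
metrics = listOf(
    StartupTimingMetric(),
    // Sums all JIT compilation sections during the benchmark.
    TraceSectionMetric("JIT Compiling %", TraceSectionMetric.Mode.Sum)
)
```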
Let's now see how much jitting occurs until the UI is fully drawn. If you don't use baseline profiles, the app has 500 milliseconds of jitting, which means 500
milliseconds of code running in interpreted mode. If you have a stale
baseline profile-- which, in this case, was
generated almost a year ago-- you get much faster execution. But if you can, try to have your
baseline profiles up-to-date for the best results. In the best case, you
have 8 times less jitting, which means 8 times less code
running in interpreted mode. Number three, in
our small example, something was
recomposing too much. If you recompose
too often, based on animations or some
frequent state changes, you can improve performance
by deferring Compose phases. Here we have a simplified
version of the HighlightedSnack section. We have a LazyRow
with snack items, and the requirement is to
offset the items background gradient with scroll. So we read the
scroll state here, pass it to the items, where
we use the startX to offset the background gradient. But because we read the
scrolling state in composition, this whole composable
needs to recompose with every change, which
is why we saw it previously recomposing that many times. But we don't need this
information in the composition phase at all. We can wrap the scroll
reading in a lambda and pass that lambda
to the items instead. Just by doing this, we skipped
recomposing the whole LazyRow composable. But we're still
recomposing the items here. And so to skip recomposing
the items as well, we can use a different
modifier that wouldn't read the state
in the composition phase. Namely, we can use Modifier.drawBehind with drawRect and the same horizontalGradient. This way, the scroll state is read only in the draw phase, and we skip recomposing with every scroll change.
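Put together, the deferred version could be sketched like this; Snack, the colors, and scrollProvider are illustrative names:

```kotlin
// Sketch: defer a scroll-state read from composition to the draw phase.
@Composable
fun HighlightedSnacks(snacks: List<Snack>, state: LazyListState) {
    LazyRow(state = state) {
        items(snacks) { snack ->
            HighlightedSnackItem(
                snack = snack,
                // Pass a lambda instead of reading the state here,
                // so this scope doesn't recompose on every scroll.
                scrollProvider = { state.firstVisibleItemScrollOffset }
            )
        }
    }
}

@Composable
fun HighlightedSnackItem(snack: Snack, scrollProvider: () -> Int) {
    Box(
        Modifier.drawBehind {
            // The state is read here, in the draw phase only.
            val gradient = Brush.horizontalGradient(
                colors = listOf(Color.Magenta, Color.Cyan),
                startX = -scrollProvider().toFloat()
            )
            drawRect(gradient)
        }
    ) { /* item content */ }
}
```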
So whenever you have some frequently changing state, like animations or scroll position, consider using the deferred alternatives to skip recomposing too often. For example, for a
frequently-changing background, you can use any of the draw
modifiers like drawBehind. If you're offsetting a
composable on-screen, you can use the offset
lambda modifier. For alpha values,
consider graphicsLayer with alpha parameter. Same for rotation or
scaling modifiers. In case you would
change alpha, rotate, or scale at the same
time, only one graphicsLayer block is needed. If you're missing a modifier that would defer the phase, you can always use a custom
layout modifier, for example, when animating size or padding. Remember, it's perfectly fine
to have all the modifiers in your code base. If it's recomposing
often though, you might consider the
deferred alternatives. Number four, use
BoxWithConstraints only when you need it. The good use case for
BoxWithConstraints is when you want to compose
a different UI based on available size. But we've seen
usages just to get size of a composable,
which comes with unnecessary overhead. If you need to get
size of a composable, you can use Modifier.layout,
Modifier.onSizeChanged, onPlaced, or
onGloballyPositioned. Be aware, though, if you
use one of those modifiers and you set the size to
a state which you then read from a
different composable, you're effectively lagging
your UI by one frame. So it's better to use a
custom layout in this case. Number five, remember
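For the simple read-your-own-size case, a sketch with onSizeChanged:

```kotlin
// Sketch: reading a composable's size with Modifier.onSizeChanged.
@Composable
fun MeasuredBox() {
    var size by remember { mutableStateOf(IntSize.Zero) }
    Box(
        Modifier.onSizeChanged { size = it } // reported during layout
    ) {
        // Note: if a *different* composable read `size`, the UI would
        // lag by one frame -- prefer a custom layout in that case.
        Text("Measured: ${size.width} x ${size.height} px")
    }
}
```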
Number five: remember only heavy operations. Remembering has minimal effect on light operations; for fast, light operations, remembering can actually take longer than recomputing. For heavy operations, though,
it's definitely better to remember them, as they only run once and, therefore, don't take time on every recomposition.
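A sketch of that rule of thumb, where Snack and the sorting stand in for any expensive pure computation:

```kotlin
// Sketch: remember pays off only for heavy computations.
@Composable
fun SnackList(snacks: List<Snack>) {
    // Heavy: worth caching across recompositions.
    val sorted = remember(snacks) { snacks.sortedBy { it.price } }

    // Light: wrapping this in remember would add overhead instead.
    val title = "Snacks (${snacks.size})"

    Column {
        Text(title)
        sorted.forEach { Text(it.name) }
    }
}
```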
But ideally, you shouldn't do anything heavy in composition anyway. If you have something heavy, try to execute that code in your data layer or a ViewModel. And don't compute heavy
things in the main thread. Speaking of heavy
computations, number six, load heavy images
asynchronously. This system trace represents
using painterResource to load a huge image,
which took 75 milliseconds to load on the main thread. Instead, this
system trace is when using a library for
asynchronous image loading, like Glide or Coil,
which actually takes about 2 milliseconds to load. So instead of using
painterResource directly, you can use
rememberAsyncImagePainter, which loads the
image asynchronously. The same thing applies when you load images from the network. If you use a painterResource to load a placeholder, you can use the same async method instead.
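With Coil, for example, the swap could look like this sketch; the url parameter is illustrative:

```kotlin
// Sketch: asynchronous image loading with Coil instead of a
// main-thread painterResource decode.
@Composable
fun SnackImage(url: String) {
    Image(
        // Decodes off the main thread and recomposes when ready.
        painter = rememberAsyncImagePainter(model = url),
        contentDescription = null
    )
}
```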
And lastly, number seven: you might try splitting heavy frames to prevent jank. Sometimes there might be some really heavy UI that consistently drops some frames, and you can't really make it lighter. For example, you're
transitioning to some UI-heavy screen. You can split
initializing that screen into multiple parts, which
can help with that frame drop and maybe get better
user experience. You can achieve this by
producing a state in a side effect with a different
initial value and target value, which would
cause recomposition of this composable. And so, in the
first composition, you show some placeholder first. In this scenario,
maybe the video is loading from the network
so your users wouldn't even see the difference anyway. And then, in the next composition, it composes the video player. This way, the initialization is chunked into multiple compositions, which can lead to a better experience.
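One way to sketch this chunked initialization; VideoPlayer and VideoPlaceholder are hypothetical composables:

```kotlin
// Sketch: splitting heavy initialization across two compositions.
@Composable
fun VideoSection() {
    // false in the first composition; flips to true right after,
    // which triggers a second composition.
    val showPlayer by produceState(initialValue = false) {
        value = true
    }
    if (showPlayer) {
        VideoPlayer()      // heavy UI, composed one frame later
    } else {
        VideoPlaceholder() // cheap stand-in shown first
    }
}
```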
And that's it. We went through the process of enhancing performance. But it's not finished. You should now measure again to see how much the improvements
helped and, if needed, start the process again. But this time,
I'll let you do it.