TOMAS MLYNARIC: Welcome. My name is Tomáš Mlynarič,
and I'm a developer relations engineer. Let's talk about how to enhance
the performance of your Jetpack Compose app. I think you're here because
maybe at some point, your app wasn't smooth enough. Hopefully, you verified the same
behavior with the release mode on a real device with R8 enabled
and custom baseline profiles generated. If not, then please start
there, as you might not have a performance problem at all. Let's see what effect these
flags may have on your app. By default, R8 is
off in the project and you get baseline
profiles coming from Jetpack Compose or
other libraries that you use. Just by enabling R8,
you get better results. And by enabling R8
with all optimizations, you get even better results. That is, you get an 11%
improvement compared to the default setting. But R8 can't beat custom-generated baseline profiles: with those, even with R8 off, you get better performance. But, as you can guess now, the
best combination is with R8 on and all optimizations enabled. In this case, it's 27%
faster than the default setting with R8 off and only
libraries baseline profiles. If you have the flags enabled
and you still see a performance problem, you might
have the temptation to go in all directions
at once and start fixing stability
here, tweaking layout, changing some animations. But fight the
temptation because you don't know if changing all
the things actually helps. If you can, try to
measure performance first to get some initial
understanding. Debug if you don't
know what's wrong. And improve the
problematic sections. Let's start with measuring to
get the initial understanding. To measure performance
of your app, you can use Jetpack
Macrobenchmark. Macrobenchmark allows you
to measure performance without knowing any internals
of the measured app. It uses UI Automator, which
is a UI testing library that mimics user behavior
and interacts with the visible
elements on-screen. It's the currently recommended way
to get the same performance your users would see because
it can install your app with baseline profiles. And although you
might think having benchmarks is only helpful
when you have the CI infrastructure set up and
you're catching regressions, it's beneficial to
have them locally for accurate performance
measurements as well. Let's recap what you can
measure with Macrobenchmark. There are two main metrics, usually used one or the other: the startup timing metric, to measure how long your app takes to start and fully load, or the frame timing metric, to understand when jank might be occurring. But these are not
the only metrics. You can combine multiple
metrics together. For example,
TraceSectionMetric, which allows you to measure almost any
code that happens in your app. Or if you have issues with
out-of-memory crashes, you might be interested
in memory usage metric. Or if you're concerned
with battery usage, you can use power metric. We've already covered some
benchmarking basics in MAD Skills videos on performance. And also, we covered
some information on how to understand the
benchmark results in the "More Performance Tips for Jetpack Compose" video. If you'd like to check the basics, I'll wait. When setting up the
project, you might have a benchmark like this. Now let's expand some knowledge
on some of the parameters to understand when to use what. Startup mode allows the
app to start differently during benchmark. But which startup
modes should you use? StartupMode.COLD is useful
for startup benchmarks because it represents the most
work your app needs to do. This mode always
restarts the process of your app in each iteration. Be aware that the process
is killed between setupBlock and measureBlock. StartupMode.WARM is useful
for frame timing benchmarks. This type of benchmark usually requires navigating to a part of your app
without measuring performance. You can leverage the
setupBlock for those operations and rely on the fact
that the process is not killed after setupBlock. This startup mode clears
all running activities without killing the process. StartupMode.HOT might
be useful to measure some per-activity
caching mechanisms. The process, and even the
previously-running activity, won't be restarted. You can also pass null as
the startup mode parameter. In this case, Macrobenchmark
doesn't do anything with your process,
and you need to handle it yourself-- for example,
using the killProcess helper function. There is also an optional
CompilationMode parameter. Same question. Let's explain when to use
which compilation mode. The default option,
Partial compilation, uses information from baseline
profiles to pre-compile parts of your app into machine code. This mode is useful
for understanding the initial performance
of your app. But when you're in the middle
of finding and improving performance, it's beneficial
to compare with mode Full. This reduces the number of factors contributing to performance variance and lets
you focus on your improvements without any jitting activity. Be aware that startup benchmarks
might be slower with this mode, as loading the whole
pre-compiled app from disk might take longer. And there's also mode None, which doesn't use any baseline profiles, not even the ones from libraries. This is currently the only way to compare baseline profiles in a benchmark.
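To make this concrete, here is a rough sketch of a startup benchmark combining the startupMode and compilationMode parameters; the package name is a placeholder, and the rest assumes the standard Macrobenchmark module setup:

```kotlin
// Sketch of a startup benchmark; "com.example.app" is a placeholder.
@RunWith(AndroidJUnit4::class)
class StartupBenchmark {
    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    @Test
    fun startup() = benchmarkRule.measureRepeated(
        packageName = "com.example.app",
        metrics = listOf(StartupTimingMetric()),
        // Pre-compiles the parts of the app covered by baseline profiles.
        compilationMode = CompilationMode.Partial(),
        // Restarts the process each iteration, including between
        // setupBlock and measureBlock.
        startupMode = StartupMode.COLD,
        iterations = 10,
        setupBlock = { pressHome() }
    ) {
        startActivityAndWait()
    }
}
```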
Let's also recap how to measure startup with StartupTimingMetric. In the previous video on inspecting performance, we had a benchmark that would look similar to this. It starts the default activity
and waits for the first frame to be rendered and gives
initial display duration, and then waits
until some object is visible on-screen with
the resource ID "feed" to wait for the
fully-drawn state. We mentioned that your
app is usually not yet usable after the
first rendered frame, and you should call
Activity.reportFullyDrawn from your app as well. But why is it beneficial,
and what does it do, exactly? By calling it from your app,
you get the user-centric metric of how long it
takes for your users until they can
interact with the app. It also marks all
the code executed before the call as
startup code for profile-guided
optimizations, which minimizes the class
loading duration and improves your startup time. And if you don't call it, it
will be assumed five seconds after Activity#onCreate. But you're not getting
the best out of your app with one method call. To report it from Compose, first
add androidx.activity:activity compose dependency to your app,
and then add the ReportDrawn composable to a composable that
represents when the loading is done. Alternatively, you can
use ReportDrawnWhen, which waits until some
lambda predicate is true to decide when to report. Or you can use also
ReportDrawnAfter to wait for the suspending
function to finish, like waiting for
animation to be done. This code can stay in your production code, as it can improve performance over time on your users' devices.
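As a sketch of how these three composables could be used, where FeedUiState, listState, and awaitAnimationEnd are hypothetical names:

```kotlin
// Sketch: three ways to report the fully-drawn state from Compose.
// FeedUiState, listState, and awaitAnimationEnd are hypothetical.
@Composable
fun FeedScreen(state: FeedUiState, listState: LazyListState) {
    if (state is FeedUiState.Success) {
        // Reports Activity.reportFullyDrawn when this enters composition.
        ReportDrawn()
    }

    // Or: report once a predicate becomes true.
    ReportDrawnWhen { listState.layoutInfo.totalItemsCount > 0 }

    // Or: report after a suspending function finishes.
    ReportDrawnAfter { awaitAnimationEnd() }
}
```

In a real screen, you would pick just one of these variants.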
There are several strategies for how you can report the fully-drawn state. For example, you wait until some data is loaded asynchronously, and then, when it's ready, you show your content with a single
ReportDrawn call. However, your home screen might
have multiple asynchronous sections to wait for. You can simply add
ReportDrawn multiple times in your code, each representing
the asynchronous section being loaded. ReportDrawn automatically
waits until all of them are finished, and then
reports it only once. Or, in some cases, there may be even more sections to wait for. For example, you can wait until
all images from the network are loaded and shown on screen. Once you run the
benchmark, you get two metrics,
timeToInitialDisplay and timeToFullyDrawn. If you don't call
ReportDrawn in your app, you only get the
timeToInitialDisplay metric. Let's also recap Frame
Timing benchmarks. For frame benchmarks,
you might just move the code from
measureBlock to setupBlock to navigate to screen
without measuring the first frame, which
is usually much longer and would skew
your frame results. Then you need to find
an element on-screen, setGestureMargin so that
you don't trigger the System Navigation and
minimize your app, and fling the content to get a
general idea of how it performs. A fling, though, might not scroll the exact same distance every time. So for more stable results, you can use the drag gesture, which should always swipe the same distance and, therefore, show the same number of items on-screen.
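A frame timing benchmark along these lines might look like this sketch; the "feed" resource id matches the earlier example, and the package name is a placeholder:

```kotlin
// Sketch of a frame timing benchmark; "com.example.app" is a placeholder.
@Test
fun scrollFeed() = benchmarkRule.measureRepeated(
    packageName = "com.example.app",
    metrics = listOf(FrameTimingMetric()),
    compilationMode = CompilationMode.Full(), // avoid JIT noise while iterating
    startupMode = StartupMode.WARM,
    iterations = 10,
    setupBlock = {
        // Navigate here, so the (much longer) first frame isn't measured.
        startActivityAndWait()
    }
) {
    val feed = device.findObject(By.res(packageName, "feed"))
    // Avoid triggering system navigation gestures at the screen edge.
    feed.setGestureMargin(device.displayWidth / 5)
    // Drag scrolls a deterministic distance, unlike fling.
    feed.drag(Point(feed.visibleCenter.x, feed.visibleCenter.y / 3))
    device.waitForIdle()
}
```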
When you run the benchmark, you get two metrics: frameDuration, which tells you how long it takes to produce the frames, and frameOverrun, which tells you how
much time there was left until the device's limit. The negative number for
frameOverrun is good. It means there was still
time until the limit. The positive number means there
might be some visible jank on-screen. Once you measure the
initial performance, let's debug what might be wrong
when you see some jank. To do that, you can start
with system tracing. You can get system traces
from Android Studio Profiler or even recorded on a device. But Macrobenchmark also
records system traces. For each iteration it runs, you can see them in the Android Studio output pane. And if you click on one, it will open the Android Studio Profiler and load
the system trace. Android Studio profiler
shows you only your process, so you can quickly explore what
was happening during execution. Alternatively, you
can also use Perfetto. It's a web-based
tool that doesn't require Android Studio to run. It gives you the
whole picture of what was happening on the device
and can reveal relationships between different processes. And it also allows you to
investigate deeper problems, for example, by
running SQL queries. So to use the traces
from Macrobenchmark, you can locate them
in your project in the benchmark module, build,
outputs folder, connected android test
additional output, the build variant name, the
device name, and there you can find all the traces. Be aware, though, this folder
is overwritten every time you rerun the benchmarks. So you need to copy the
results to a different folder if you want to retain them. From there, you can just
drag and drop the trace file onto Perfetto UI. And once it loads, you
can see everything that is happening on the device. Then you need to find
your process name in the list of processes. And once you expand it, you
can see the expected timeline, actual timeline, the
main thread, and then a list of the rest of
threads running in your app. The expected timeline tells
you when the system expects the frames to be produced. The actual timeline
tells you the real timing of producing the frames. The green frames were
produced on time. There was no jank. The red frames are
the interesting ones. They take longer to
produce than they should. So they are the cause of jank. There also can be
light green frames. These frames had enough
time to be produced, but they were produced too late. Let's now see the trace
sections in the main thread. Each section here is the sum
of the sections below it. Compose already pre-populated
some information by default. So, for example, you
can see trace section for each time a composable
was recomposing, or you can see when the lazylist
was prefetching the next item. But to debug why
this frame janks, it's not enough information. So it's hard to understand
what might be wrong. To improve that, you can add the tracing-ktx library to your app module, which allows adding custom trace sections. And then, in a composable you're interested in, to understand how long it takes to compose, you can wrap it with a trace call with some name. The custom trace section adds almost no overhead, so you don't have to worry about leaving it in your production code.
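For example, a minimal sketch with the androidx.tracing trace function, using the composable name from this talk's example; SnackCard stands in for the actual content:

```kotlin
// Sketch: a custom trace section around a composable's content.
// SnackCard is a hypothetical stand-in for the real content.
@Composable
fun HighlightedSnackItem(snack: Snack) {
    trace("HighlightedSnackItem") {
        // The section appears in system traces under the given name.
        SnackCard(snack)
    }
}
```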
Then, when you retake the system trace-- for example, by running the benchmark again-- you will see the trace sections based on the name you defined. And this gives you information
on when that composable was recomposing. So in this case, you can
see that there are probably three items on
screen, and one is being prefetched by the lazylist. Great. So this helps you understand
how long something composes, but it's a manual process. To get even more information
on when something recomposes, you can set up
composition tracing. Composition tracing essentially
wraps every composable with a trace call. To set it up, you need to add
runtime-tracing dependency to your app module first. And at this point,
if you run system tracing from Android
Studio Profiler, you would already see
all the information. But to run composition
tracing with Macrobenchmark, you need to add two
more dependencies to the benchmark module,
namely tracing-perfetto and tracing-perfetto-binary. Don't add these dependencies to your app module, as they are quite big and would negatively affect the size of your app. The benchmark module is not shipped with your app, so you can safely have them there.
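In Gradle terms, the dependency split could look roughly like this; the versions are illustrative, so check the latest stable releases:

```kotlin
// app/build.gradle.kts -- needed for composition tracing itself
dependencies {
    implementation("androidx.compose.runtime:runtime-tracing:1.0.0-beta01")
}

// benchmark/build.gradle.kts -- only in the benchmark module
dependencies {
    implementation("androidx.tracing:tracing-perfetto:1.0.0")
    implementation("androidx.tracing:tracing-perfetto-binary:1.0.0")
}
```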
Once you have that, you need to enable composition tracing using the androidx.benchmark.perfettoSdkTracing.enable instrumentation argument, which you can do from the Run Configurations dialog. You can find the instrumentation
argument option in the dialog. Pass the argument there
and rerun the benchmarks. Once you rerun the benchmark
and open the system trace again, you will see each composable
that was recomposed during the execution. Let's zoom in to see
the prefetch phase. You can see here some trace
sections when the lazylist was composing an item. Then you can see what
item was composed. You can even see the trace
section HighlightedSnackItem I previously added to the code. And then you can see what
the HighlightedSnackItem is composed of-- some
snack image, text, and other composables. Composition tracing gives you
information on the composition phase. But Jetpack Compose
has three phases-- composition, layout, and draw. You can get some information
about the layout phase, as well. What you need to do is add a custom layout modifier, then measure the measurable without changing the constraints, place it at Offset.Zero, and wrap this code in a trace call with a name. This gives you information on how long it takes to measure all the children of this composable. If you think placement is also important, you can wrap the placeable.place call, as well.
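Such a tracing layout modifier could be sketched like this; the section names are arbitrary choices, not a Compose convention:

```kotlin
// Sketch: trace the measure and place steps of any composable.
fun Modifier.tracedLayout(name: String): Modifier =
    layout { measurable, constraints ->
        val placeable = trace("measure:$name") {
            // Measure with unchanged constraints.
            measurable.measure(constraints)
        }
        layout(placeable.width, placeable.height) {
            trace("place:$name") {
                placeable.place(0, 0) // place at Offset.Zero
            }
        }
    }
```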
When you retake the system trace, you can now search for the measureAndLayout trace section and see how long it took the composable to measure its children. This can be helpful if you
have a custom layout composable or just if you want to get
more information on how long it takes to measure and
layout any composable. Also, if you nest
these layout modifiers, you can also see how long
that nested composable took to measure. Now, it's great to
see the system tracing with so much information,
but can we measure it? Yes! You can use the
TraceSectionMetric for that. TraceSectionMetric can
measure any trace section using the section name. It has two modes: Mode.First, which measures the first occurrence of a trace section, and Mode.Sum, which combines the timing of all sections with the same name. And you can also use wildcards in the section name. To use TraceSectionMetric in Macrobenchmark, just add it to the list of metrics. So here, I'd be measuring the custom trace section for a HighlightedSnackItem. Here, I'd be measuring the timing of the layout phase. And here, I'd be measuring section names that were produced by the composition tracing with the help of the percent wildcard character.
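The three measurements could be added like this sketch; "measure:Feed" is a hypothetical section name from a custom layout modifier, and the exact TraceSectionMetric constructor shape may differ slightly between library versions:

```kotlin
metrics = listOf(
    // Custom trace section wrapped around the composable
    TraceSectionMetric("HighlightedSnackItem", TraceSectionMetric.Mode.Sum),
    // Hypothetical section from a custom tracing layout modifier
    TraceSectionMetric("measure:Feed", TraceSectionMetric.Mode.Sum),
    // Sections produced by composition tracing, via the % wildcard
    TraceSectionMetric("%SnackItem%", TraceSectionMetric.Mode.Sum)
)
```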
When you run the benchmarks and get the outputs, because I used Mode.Sum, I get the total duration of the section names. But I also get the
count, which helps me understand how many times
this composable is recomposed. Similarly, I get information
on the custom layout modifier with its count and same
for the section names found with the wildcard. But wait a minute. This composable
recomposed 361 times. Something's not right here. To check when something
recomposes more than it should, you can also use the
Layout Inspector. You can use it in the debug
mode to check directly your hierarchy when
something recomposes. While inspecting a
screen in Jetsnack, I noticed a large
amount of recompositions when scrolling the list
of highlighted snacks. The recomposition
count tells you how many times a composable
item is recomposed. You can also see how many
times a composable was skipped. A composable is skipped when its inputs didn't change, so the Compose runtime doesn't have to recompose it every time. As a rule of thumb, if
something recomposes very often during a scroll, it might
cause performance issues. So now we know what
is causing the issue, but we don't know why. To help understand why, you can
use another tool, Composition Debugger. Composition Debugger works
with the regular debugger you already use in your apps. So you just set a breakpoint
somewhere in your Compose code and debug your app. And when the debugger
stops at this breakpoint, it gives you the composition
state information. There are several
flags you might see in the recomposition state. There's Static, which means
this parameter won't change during the whole app's process. There's Unchanged, which means
this parameter is the same as in the previous composition. You can see Evaluating. That means the
Compose runtime is evaluating whether this
argument has changed here. If you add a breakpoint to
the nested composable that has the same
parameter, you should be able to see whether it
actually changed or not. You also might see Unstable. This argument is
of Unstable type and thus causes recomposition. And then there can
be Changed, which says that it has
a different value than in the previous composition
and definitely is causing recomposition. So in our example,
this is the one that's triggered with
every scroll change. Let's now get to techniques
on how to improve performance in your app. First of all, you should
update your version of Compose and keep it up-to-date. We're doing performance
improvements under the hood, so just by updating
Compose, you'll see better performance in your
app without any extra work. Second, generate
a baseline profile to get more code running
in the precompiled state. With Macrobenchmark, you
can measure how much code is running in interpreted mode. You just need to add a TraceSectionMetric with the just-in-time compilation section name and the wildcard character, and use Mode.Sum. This will measure all the jitting that occurred during the benchmark.
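As a sketch -- the "JIT Compiling %" section name is the form used in the Macrobenchmark documentation for ART's JIT trace sections:

```kotlin
metrics = listOf(
    StartupTimingMetric(),
    // Sums all JIT compilation sections during the benchmark.
    TraceSectionMetric("JIT Compiling %", TraceSectionMetric.Mode.Sum)
)
```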
Let's now see how much jitting occurs until the UI is fully drawn. If you don't use baseline profiles, the app has 500 milliseconds of jitting, which means 500
milliseconds of code running in interpreted mode. If you have a stale
baseline profile-- which, in this case, was
generated almost a year ago-- you get much faster execution. But if you can, try to have your
baseline profiles up-to-date for the best results. In the best case, you
have 8 times less jitting, which means 8 times less code
running in interpreted mode. Number three, in
our small example, something was
recomposing too much. If you recompose
too often, based on animations or some
frequent state changes, you can improve performance
by deferring Compose phases. Here we have a simplified
version of the HighlightedSnack section. We have a LazyRow
with snack items, and the requirement is to
offset the items background gradient with scroll. So we read the
scroll state here, pass it to the items, where
we use the startX to offset the background gradient. But because we read the
scrolling state in composition, this whole composable
needs to recompose with every change, which
is why we saw it previously recomposing that many times. But we don't need this
information in the composition phase at all. We can wrap the scroll
reading in a lambda and pass that lambda
to the items instead. Just by doing this, we skipped
recomposing the whole LazyRow composable. But we're still
recomposing the items here. And so to skip recomposing
the items as well, we can use a different
modifier that wouldn't read the state
in the composition phase. Namely, we can use Modifier.drawBehind with drawRect and the same horizontalGradient. This way, the scroll state is read only in the draw phase, and we skip recomposing with every scroll change.
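Put together, the deferred version could be sketched like this; Snack, the colors, and scrollProvider are illustrative names:

```kotlin
// Sketch: defer a scroll-state read from composition to the draw phase.
@Composable
fun HighlightedSnacks(snacks: List<Snack>, state: LazyListState) {
    LazyRow(state = state) {
        items(snacks) { snack ->
            HighlightedSnackItem(
                snack = snack,
                // Pass a lambda instead of reading the state here,
                // so this scope doesn't recompose on every scroll.
                scrollProvider = { state.firstVisibleItemScrollOffset }
            )
        }
    }
}

@Composable
fun HighlightedSnackItem(snack: Snack, scrollProvider: () -> Int) {
    Box(
        Modifier.drawBehind {
            // The state is read here, in the draw phase only.
            val gradient = Brush.horizontalGradient(
                colors = listOf(Color.Magenta, Color.Cyan),
                startX = -scrollProvider().toFloat()
            )
            drawRect(gradient)
        }
    ) { /* item content */ }
}
```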
So whenever you have some frequently changing state, like animations or scroll position, consider using the deferred alternatives to skip recomposing too often. For example, for a
frequently-changing background, you can use any of the draw
modifiers like drawBehind. If you're offsetting a
composable on-screen, you can use the offset
lambda modifier. For alpha values,
consider graphicsLayer with alpha parameter. Same for rotation or
scaling modifiers. In case you would
change alpha, rotate, or scale at the same
time, only one graphicsLayer block is needed. If you're missing a modifier that would defer the phase, you can always use a custom
layout modifier, for example, when animating size or padding. Remember, it's perfectly fine
to have all the modifiers in your code base. If it's recomposing
often though, you might consider the
deferred alternatives. Number four, use
BoxWithConstraints only when you need it. The good use case for
BoxWithConstraints is when you want to compose
a different UI based on available size. But we've seen
usages just to get size of a composable,
which comes with unnecessary overhead. If you need to get
size of a composable, you can use Modifier.layout,
Modifier.onSizeChanged, onPlaced, or
onGloballyPositioned. Be aware, though, if you
use one of those modifiers and you set the size to
a state which you then read from a
different composable, you're effectively lagging
your UI by one frame. So it's better to use a
custom layout in this case. Number five, remember
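For the simple read-your-own-size case, a sketch with onSizeChanged:

```kotlin
// Sketch: reading a composable's size with Modifier.onSizeChanged.
@Composable
fun MeasuredBox() {
    var size by remember { mutableStateOf(IntSize.Zero) }
    Box(
        Modifier.onSizeChanged { size = it } // reported during layout
    ) {
        // Note: if a *different* composable read `size`, the UI would
        // lag by one frame -- prefer a custom layout in that case.
        Text("Measured: ${size.width} x ${size.height} px")
    }
}
```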
Number five: remember only heavy operations. Remembering has minimal effect on light operations; for fast, light operations, remembering can actually take longer than recomputing. For heavy operations, though,
it's definitely better to remember them, as they only run once and, therefore, don't take time on every recomposition.
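A sketch of that rule of thumb, where Snack and the sorting stand in for any expensive pure computation:

```kotlin
// Sketch: remember pays off only for heavy computations.
@Composable
fun SnackList(snacks: List<Snack>) {
    // Heavy: worth caching across recompositions.
    val sorted = remember(snacks) { snacks.sortedBy { it.price } }

    // Light: wrapping this in remember would add overhead instead.
    val title = "Snacks (${snacks.size})"

    Column {
        Text(title)
        sorted.forEach { Text(it.name) }
    }
}
```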
But ideally, you shouldn't do anything heavy in composition anyway. If you have something heavy, try to execute that code in your data layer or a ViewModel. And don't compute heavy
things in the main thread. Speaking of heavy
computations, number six, load heavy images
asynchronously. This system trace represents
using painterResource to load a huge image,
which took 75 milliseconds to load on the main thread. Instead, this
system trace is when using a library for
asynchronous image loading, like Glide or Coil,
which actually takes about 2 milliseconds to load. So instead of using
painterResource directly, you can use
rememberAsyncImagePainter, which loads the
image asynchronously. The same thing applies when you load images from the network. If you use a painterResource to load a placeholder, you can use the same async method instead.
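With Coil, for example, the swap could look like this sketch; the url parameter is illustrative:

```kotlin
// Sketch: asynchronous image loading with Coil instead of a
// main-thread painterResource decode.
@Composable
fun SnackImage(url: String) {
    Image(
        // Decodes off the main thread and recomposes when ready.
        painter = rememberAsyncImagePainter(model = url),
        contentDescription = null
    )
}
```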
And lastly, number seven: you might try splitting heavy frames to prevent jank. Sometimes there might be some really heavy UI that consistently drops some frames, and you can't really make it lighter. For example, you're
transitioning to some UI-heavy screen. You can split
initializing that screen into multiple parts, which
can help with that frame drop and maybe get better
user experience. You can achieve this by
producing a state in a side effect with a different
initial value and target value, which would
cause recomposition of this composable. And so, in the
first composition, you show some placeholder first. In this scenario,
maybe the video is loading from the network
so your users wouldn't even see the difference anyway. And then, in the next composition, it composes the video player. This way, the initialization is chunked into multiple compositions, which can lead to a better experience.
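One way to sketch this chunked initialization; VideoPlayer and VideoPlaceholder are hypothetical composables:

```kotlin
// Sketch: splitting heavy initialization across two compositions.
@Composable
fun VideoSection() {
    // false in the first composition; flips to true right after,
    // which triggers a second composition.
    val showPlayer by produceState(initialValue = false) {
        value = true
    }
    if (showPlayer) {
        VideoPlayer()      // heavy UI, composed one frame later
    } else {
        VideoPlaceholder() // cheap stand-in shown first
    }
}
```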
And that's it. We went through the process of enhancing performance. But it's not finished. You should now measure again to see how much the improvements
helped and, if needed, start the process again. But this time,
I'll let you do it.