[MUSIC PLAYING] CHET HAASE: Hi. Welcome to our
talk, "Drawn Out-- How Android Renders the UI." It was almost called
something else. We called it this, and then
somebody in some administration position decided it
was actually going to be called "How
to Optimize Your App for Top Rendering
Performance" or something. That wasn't what
the talk was about, so fortunately, we
changed it back. ROMAIN GUY: And it still isn't. CHET HAASE: Instead, we're going
to tell you the details of how, actually, stuff works. I'm Chet Haase. I'm from the Android
toolkit team. ROMAIN GUY: And I'm Romain Guy. I'm on the Android
framework team, and I do real time graphics. CHET HAASE: And that's kind of
what we're talking about today. So we have given versions
of this talk before, and we thought we were done. And then we realized
that enough stuff has changed inside the
system that maybe it was time to go through this
again and see where we're at. So this is our
attempt to do that. Let's go. So first of all, there
is this word, rendering. What do we mean by that? Normally, it means melting
fat in order to clarify it. That's not quite what we're
going to talk about today. Instead, we're talking about
the process of actually turning all that stuff, like
buttons and check boxes and everything on the
screen, into pixels that the user can look at. And there's a lot
of things going on. There's a lot of
details that we are glossing over today
because we only have 40 minutes to do this. But we'll dump a lot of
details on you along the way. So first of all, I'm
going to take you through-- there's going
to be a bunch of colored dots on the top,
and these will be sort of a visual cue for a
lot of the rest of the talk. So I'm going to
walk through sort of the life of what happens
in the flow of information down to the pixels on the
screen really quickly. We have this thing
called the choreographer, and that kicks in usually
60 times a second, and it says, hey, Vsync,
this is the interval at which the frame is being synced. Buffers flip onto the screen. It's a good time for us to
process a lot of information, and then handle rendering
that information as a result of that. So we get a Vsync
operation that's sent up to Java SDK land,
and we're on the UI thread. And all of a sudden, we need to
process input events, which can trigger changes in properties. We also run any animations. So we change property values. Again, that may trigger things
like layouts and invalidation. We do, then, the whole traversal
pass, which is measuring views to figure out how
large they are, laying them out, which is
actually positioning them where they need to be,
and then drawing them. Once all of that
information is done, we sync the result
of that information over to this thing
called the render thread. And the render thread takes
that and says, OK, well, I'm going to then execute these. I'm going to
basically turn these into native versions
of all that information we produced at the Java
layer, and then I'm going to get a
buffer from the GPU, so that I have a place to
write this information. And then I'm going to actually
issue all these GPU commands as OpenGL stuff over there. And then I'm going to say, OK,
it's time to swap the buffer, and then I turn it
over to the GPU. And then the graphics system does something called compositing, and we're going to talk about most of these steps today.
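As a concrete illustration of that Vsync callback, here is a minimal sketch using the public Choreographer API, which is the same mechanism that paces the UI thread's per-frame work. The class name and logging are illustrative, not something from the talk.

```java
import android.util.Log;
import android.view.Choreographer;

// Minimal sketch: Choreographer delivers one callback per Vsync, which is when
// the UI thread handles input, animations, and the traversal (measure/layout/draw).
public class FrameLogger implements Choreographer.FrameCallback {
    public void start() {
        Choreographer.getInstance().postFrameCallback(this);
    }

    @Override
    public void doFrame(long frameTimeNanos) {
        // frameTimeNanos is the Vsync timestamp for this frame.
        Log.d("FrameLogger", "Vsync at " + frameTimeNanos);
        // Re-register to keep receiving one callback per frame.
        Choreographer.getInstance().postFrameCallback(this);
    }
}
```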
ROMAIN GUY: So compositing is, I think, something we've never explained before. So we're going to go into a little bit of detail about this part of the Android rendering architecture. CHET HAASE: So the
little colored dots on the top of the
screen will be noting where we are in the process, as
we work through a few examples, so that we can
understand this better. Speaking of examples,
here's a simple one. So let's suppose
that we have a user, and the user clicks on an item. I wrote this awesome
RecyclerView application that looks exactly like this. I know it is because
that's a screen shot from my awesome application. It's a RecyclerView with
a bunch of items in it, and when the user clicks on
one, this amazing thing happens. It turns into a random
color on the background. It's incredible. I could give you the
source, but I don't know. It's pretty complicated. I'm not sure you'd
understand it. So here's the amazing layout
for my amazing demo application. There's a ConstraintLayout. There is a RecyclerView
inside of it, and then I populated
it at runtime with a bunch of
random items in there. The view hierarchy
for this thing looks basically like this. In fact, it doesn't look
basically like this. It looks exactly like this. So you walk down
from the DecorView and you've got a LinearLayout
and a FrameLayout. I'm not exactly sure why we
have the deep nesting there, but whatever. History. We have a bunch of stuff
for action bar in there. None of that really matters. What we're concerned
about here is what's actually going on
in the content hierarchy, because that's what you can
affect with your application. So we have content FrameLayout,
we have the ConstraintLayout on the outside, wrapping
the RecyclerView, and then all of the items. Specifically,
these are the items that are on screen because
those are the only ones that are actually being measured
and laid out and drawn. So what happens? Let's walk through
this example and walk through that entire flow that we
went through at the beginning. So user clicks, there's
a Vsync operation. That gets sent up and we process
input during the input phase, and we notice that
this is a click. I'm glossing over
some details here. Actually, we're going to notice
first that there was a down, and then there was
an up, and then it gets processed as a click. Just take it for
granted we're eventually going to process a click here. That ends up in this
item clicked method that I have in my
amazingly complex example, and in there, we're going
to set the background color on this item to a random color. That's why I called
the method random. That gets sent over to setBackgroundColor in View.java, which does a bunch of stuff to set the color on the background drawable, and then it eventually calls this method called invalidate.
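A hypothetical sketch of what that click handler might look like -- the class and method names here are illustrative, not the actual demo source:

```java
import android.graphics.Color;
import android.view.View;
import java.util.Random;

// Hypothetical sketch of the demo's click handler, not the actual source.
class ItemClickHandler {
    private final Random random = new Random();

    void onItemClicked(View itemView) {
        int color = Color.rgb(random.nextInt(256), random.nextInt(256), random.nextInt(256));
        // View.setBackgroundColor() updates the background drawable and then
        // calls invalidate(), which is what kicks off the rest of the frame.
        itemView.setBackgroundColor(color);
    }
}
```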
Invalidation is the process-- it doesn't actually redraw the views. It's the process of telling the View Hierarchy that something needs to be redrawn. So you get a click. That happens on the
item down at the bottom. So that item two-- you see it's
surrounded by green. We have a little
invalidate method that gets called on
that, and that basically walks up the tree. It calls a series of methods
all the way up the tree, because the view knows that
it needs to be redrawn, but it actually needs to
propagate that information all the way up the hierarchy,
so that then, we can redraw all the things
on the way down, later. So we call invalidateChild
all the way up the hierarchy. That eventually ends
up in a massive class that we have called
ViewRootImpl.java, and we have this invalidate
child method there. And that basically
does nothing except says, OK, I need to schedule
a traversal, which means, OK, I've taken
information that somebody got invalidated somewhere. That means that I need to
run my traversal code later at the end of this process. Traversal is the
process of doing all the phases that are
necessary for actually rendering that frame. Specifically, it does measure-- how big the views are, layout-- setting the views' position and
size, and drawing the views. All of that is called traversal. So we've scheduled a traversal. That's going to happen
at some later time, and that later time is now. So in the same frame, we
end up in the traversal code in this
performTraversals method. It's going to do a performDraw, which ends up calling a draw method on the DecorView,
and basically, that's going to propagate
all the way down. So the draw method actually
ends up in an optimization that we implemented
back in Honeycomb called getDisplayList. So a DisplayList is a structure
that stores the rendering information, right? So if you look at the way
the button code is written, or view code in general, we call graphics commands on the Canvas-- drawing the background, drawing a drawable, drawLine, whatever. But these end up as operations in a DisplayList. This is a compact way of representing those operations, as well as the parameters to the operations.
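For context, here is a hedged sketch of the kind of view code being described-- a trivial, hypothetical custom view whose Canvas calls get recorded into its DisplayList rather than being executed immediately:

```java
import android.content.Context;
import android.graphics.Canvas;
import android.graphics.Color;
import android.graphics.Paint;
import android.view.View;

// Hypothetical list-item view: on hardware-accelerated Android, these Canvas
// calls are recorded into the view's DisplayList rather than rasterized here.
class ItemView extends View {
    private final Paint paint = new Paint(Paint.ANTI_ALIAS_FLAG);

    ItemView(Context context) {
        super(context);
    }

    @Override
    protected void onDraw(Canvas canvas) {
        paint.setColor(Color.LTGRAY);
        canvas.drawRect(0, 0, getWidth(), getHeight(), paint);  // recorded as a rect operation
        paint.setColor(Color.BLACK);
        paint.setTextSize(48f);
        canvas.drawText("Item", 24f, 64f, paint);               // recorded as a text operation
    }
}
```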
So we call getDisplayList. The DecorView did not change, in fact. So it's going to say,
well, I didn't change, but I can certainly get the
DisplayList for my child. And then on all the
way down the tree, until it gets to item two,
and it says, oh, I did change. When invalidate
was called on me, that triggered something so that
I know I need to redraw myself. So getDisplayList
actually ends up being a draw call on the
view, where it regenerates its own display list. So now, it ends up in
this onDraw method. That ends up in the operations
in the DisplayList for that. The DisplayList consists
of, for this item, basically rect
information and text information-- pretty basic. And then we have the
DisplayList for basically the entire hierarchy. So it wasn't just
the view itself, but we have the View
Hierarchy itself is reproduced in this
hierarchy of DisplayLists, all the way down. So now we have the DisplayList
for the entire tree, and that's all that we need
to do on the UI thread. Now we need to sync
that information over to the render thread,
and the render thread is a separate thread that's
actually dealing with the GPU side of this operation. On the Java side, we
produced all the information. On the native side, then,
we actually produce-- we take that information,
then sync it over to the GPU. So we have the sync
operation, where basically we copy a handle over there. And we also copy some
related information. We copy the damage
area, because it's important to know that
that item two-- that's the only thing that changed
during that frame, which means we don't need
to redraw anything else outside of that area. So we're going to copy
over the clip bounds there, so that we know what
needs to be redrawn. Now, we're also going
to do some optimization stuff like uploading bitmaps. So this is a good time to do it
at the beginning of the frame. Give them some time
to actually turn those into textures along
the way, while we're doing the other stuff. ROMAIN GUY: It mentions
here that we're uploading the non-hardware bitmaps. So hardware bitmaps is a new
type of bitmap configuration that was added in
Android O. So typically, when you have a bitmap, we
have to allocate the memory on the Java side. And then when it
comes time to draw, we have to make a copy
of the bitmap on the GPU. This is expensive. It takes time, and it doubles
the amount of RAM we're using. So using the hardware bitmaps
that are available in Oreo, you can keep the Java
side of the equation, and you can have a bitmap
that lives only on the GPU. So if you're not going to
modify this bitmap ever again, this is a really efficient
way, memory-wise, to store your bitmaps.
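One way to get such a bitmap, as a hedged sketch assuming API 26 or later -- the helper class here is illustrative:

```java
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;

class HardwareBitmaps {
    // Sketch, assuming API 26+: decode straight into Config.HARDWARE so the
    // pixel data lives only on the GPU and never needs to be uploaded at draw time.
    static Bitmap decodeHardware(String path) {
        BitmapFactory.Options options = new BitmapFactory.Options();
        options.inPreferredConfig = Bitmap.Config.HARDWARE;
        return BitmapFactory.decodeFile(path, options);
    }
}
```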
CHET HAASE: We have mentioned the render thread before. This is something that we
introduced in the Lollipop release. It is a separate thread
that only talks to the GPU. It's native code. There is no call
outside to Java code. There are certainly no call-outs
out to application code. It just talks to the GPU. We did this-- so we still
need to do basically the same thing we did,
pre-render thread, which is we produce all the DisplayList information, and then we send that DisplayList information to the GPU. So it's sort of serial,
but the render thread is able to do things
atomically, like the circular reveal animations, as well as
the ripple animations, as well as vector drawable animations-- can happen atomically
on the render thread. So that's something that can happen without stalling the UI thread.
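One of those render-thread animations, the circular reveal, is driven through a public API. Here is a hedged sketch; the helper class is illustrative:

```java
import android.animation.Animator;
import android.view.View;
import android.view.ViewAnimationUtils;

class RevealHelper {
    // Sketch: a circular reveal is one of the animations that can run on the
    // render thread without involving the UI thread on every frame.
    static void reveal(View view) {
        int cx = view.getWidth() / 2;
        int cy = view.getHeight() / 2;
        float radius = (float) Math.hypot(cx, cy);
        Animator anim = ViewAnimationUtils.createCircularReveal(view, cx, cy, 0f, radius);
        view.setVisibility(View.VISIBLE);
        anim.start();
    }
}
```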
And in the meantime, the UI thread can be doing other
things while it's idle after it syncs, like maybe
some of the idle prefetch work for RecyclerView
that was done last year. So render thread kicks in. We've synced everything. We have the DisplayList,
we have the damage area, and then we turn the
DisplayList into something that we call DLOps-- display list operations. So you can see that we have that
fill operation in the middle. That's the thing that
we turned green there. And then we have some
optimizations that we perform. ROMAIN GUY: So we have various
optimizations we do here. So for instance, if
you do alpha rendering by calling setAlpha on the
view or if you set a hardware layer on the view, we try to
identify the drawing commands that need to be targeted
to those layers, and we move them at the
beginning of the frame. This avoids state
changes inside the GPU, which are extremely expensive. So without doing this
kind of optimization, you would see horrible,
horrible performance. And it's not because the
GPU itself will be slow. The GPU would just be
waiting for the CPU to give it instructions. The other one we're
doing, and we're going to show you an
actual practical example, is called reordering
and matching. We look at all those
operations, and you can see in this example,
because we have list items, we interleave a lot of
operations that are similar. So we're going to
draw a rectangle, and then we're
going to draw text, and then we're going to draw
a rectangle and text again. And again, here, we're
changing the state of the GPU several times, but
instead, what we can do is if the
commands don't overlap, we can draw all the
rectangles together, and then we can draw
all the text together. So this is part of the
reordering and matching. And sometimes, what
we do-- we say, look, if we see a bunch
of text that uses the same color
and the same font, they don't have to be
different draw text calls. They can be just a single one
that covers the entire screen. CHET HAASE: So you can see
here, the original DLOps had a fill operation,
and then it wanted to draw
some text, which is going to end up
being texture map copies from the glyph cache. Then it's got a fill
operation, and then some more text, and a fill. And so we've got all these
operations interleaved. So after the
reordering operation, then it looks a
little more like this. We've got a series of fills and
a series of text operations. They can even be
batched together to be more optimal,
which we will see here. ROMAIN GUY: So this is
an example of Gmail. So that was in
the Honeycomb era. You can see how here,
we modified the pipeline to slow down the
rendering, and you can see exactly how Gmail was drawing. So we have a lot of
list items, and we just draw them in the
exact order that they exist in your view hierarchy
and in the order of your code, actually. All the draw calls that
you make on the Canvas will respect that order. Unfortunately, like I said,
it's very inefficient. So instead, after batching
and merging and reordering, we get this. You can see, in particular,
that all the stars were drawn at the same time, and most
of the text appear all at once. What's interesting is we
are drawing all the list item backgrounds
one after the other. So that's good. The reordering worked. The batching didn't work, and
it's partly because the list items are slightly overlapping. So when commands overlap,
we have to draw them one after the other to
honor the blending, to make sure the Alpha
values are correct. And so the effect really
depends on the application. If I remember
correctly, in KitKat, with the Settings application,
we could draw the whole screen in about six draw calls,
instead of dozens and dozens, as seen by the View Hierarchy. So this is a very important
optimization for us. CHET HAASE: I think this work
at the time on current devices saved something like
a millisecond, which doesn't sound like much,
unless you realize that we have to do everything within 16. So it was actually
a huge improvement that allowed Gmail to be
less janky, because then it wasn't pushed out into
the next frame as often. So back to our
explanation of everything. Then we have the clipReject. So this is where we feed in the
information about the damage area, right? So we know where item
two was on the screen, and we know that we
don't need to draw anything outside of that. So as we're processing
these DLOps, we know that we can
basically throw away anything that is drawing
outside of that area. In graphics, that's
called a trivial reject. So we trivially
reject all the DLOps that weren't intersecting
with that area, and now all we have
to do is draw a fill and some text and a line. So we do that. In the process of doing
that, we can do a GetBuffer. This is usually an
implicit operation. We don't request a buffer. It's more that as soon as we
start doing GPU operations, then the GPU hands us the
buffer-- more particularly, SurfaceFlinger hands
us the buffer-- that we can then put
these commands into. Then we issue the commands. This is a series of GL commands. As you can see on the
slide, it says glCommand. Basically, the equivalent of
what we need for whatever-- doing the fill or the text-- bitmap copies, lines, whatever. And then we swap the buffer. So this is us saying, we are
done with all of our rendering operations. We're ready to display
this frame on the screen. It's a request to SurfaceFlinger
to then swap the buffer. Basically, we're done
drawing to the buffer. You can swap this with
the one that's in front, and it'll put it on the screen. Meanwhile, in
SurfaceFlinger, then we have the composite step,
which Romain is going to talk a lot about later. But basically, it takes all
of the windows on the screen. We see here the navigation
bar, the status bar, and the actual content
window for our application. It combines all of those
in the hardware compositor, puts them on screen,
and then, tada. We're done. So that was a really
simple example. Let's look at a super
complicated example. This one is going
to be in two phases. One, so we're going
to drag the list up. So we're going to drag it,
and then as we drag it, we're going to move
the items a little bit. And then eventually,
if we keep moving them, we're going to have
a new item appear. So we're going to look
at two versions of this. There's the move only version. So as we drag it
up, a new item-- not a new item appears. Everything just shifts
a little bit up. So first of all, we need
to process the down. So we have a Vsync. It says, time to
process input event. So we do that, and we
end up in code like this. On touch event in
RecyclerView, it says, well, there
was a down operation. And all it needs
to do is register where that down happened. It doesn't need to
process anything. Nothing changed on the screen. We just registered that the
user actually pressed down. So we record that for
later, and there's no op. We don't do any of the rest
of the stuff we talked about, because nothing changed. They keep dragging, and then
we end up in similar code. So we process input
on the next frame, and we say, OK, on touch event. Oh, now we know that
they've actually moved, and we know how much they moved
because we saved the old X and Y. We calculate the delta. And now we call this thing called offsetTopAndBottom. Basically, for all the views on the screen, we simply move them in Y.
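A hypothetical sketch of the shape of that logic -- RecyclerView's real touch handling is considerably more involved:

```java
import android.view.MotionEvent;
import android.view.View;
import android.view.ViewGroup;

// Hypothetical sketch of recording the down position and shifting children on move.
class DragTracker {
    private float lastY;

    boolean onTouchEvent(MotionEvent event, ViewGroup list) {
        switch (event.getActionMasked()) {
            case MotionEvent.ACTION_DOWN:
                lastY = event.getY();  // just remember where the finger went down
                return true;
            case MotionEvent.ACTION_MOVE:
                int dy = Math.round(event.getY() - lastY);
                lastY = event.getY();
                // Shift every child; offsetTopAndBottom() goes through
                // invalidateViewProperty(), so nothing is re-recorded or redrawn.
                for (int i = 0; i < list.getChildCount(); i++) {
                    View child = list.getChildAt(i);
                    child.offsetTopAndBottom(dy);
                }
                return true;
        }
        return false;
    }
}
```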
And offsetTopAndBottom calls something-- it's an invalidation method, but it's slightly different. It says, invalidateViewProperty. This is an optimization
that we put in probably the second Honeycomb release or something, with DisplayList properties. So when I talked about
DisplayList earlier, there was one nuance
that I left out. We have the information
about the operations and the parameters for
the graphics operations. But we also have the information
about some core display properties, which are basically
properties of the view, like the translation
property, rotation alpha. And these are properties
which we don't need to rerender the view to change. We can simply change them in the
DisplayList structure, itself, and then they get picked
up at GPU issue time. So it's a very fast
operation for us to do that. So instead of invalidating a view and redrawing everything in that view, all we do is say, change the translation property in this view.
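From the app side, this is exactly what happens when you change or animate one of those properties. A hedged sketch, with an illustrative helper class:

```java
import android.view.View;

class PropertyChanges {
    // Sketch: translationY, rotation, and alpha are DisplayList properties, so
    // changing or animating them never re-runs the view's onDraw().
    static void nudge(View view) {
        view.setTranslationY(view.getTranslationY() + 50f);    // direct property change
        view.animate().alpha(0.5f).setDuration(150).start();   // animated property change
    }
}
```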
So the way that happens is we call invalidateViewProperty. That propagates all
the way up the tree, because we still need to know,
at the top layer, what happens, but it's a much
more optimal step. So this ends up in
scheduleTraversals, as it did before. In the draw, it ends up
in performTraversals, but performDraw can do a
much simpler version of this because the DisplayList
didn't actually change. All we did was change
DisplayList properties inside of it. So we can immediately
sync that information over to the render thread. We can then execute that, turn
that into DisplayList ops, get the buffer. Basically, everything
is as before. Let's go through
the second phase of that super
complicated example. User keeps dragging,
and as they drag, a new item appears
on the bottom. So Vsync, we process the input. We end up in a method
something like this. We know that they've moved. Oh, but that means that we
need to trigger the creation and the bind of
that new item there. That ends up in code like
this, where we actually add a view to the parent. So the RecyclerView is
going to get a new view, and it's going to
call requestLayout. So requestLayout is kind
of like invalidation, but instead of saying,
I need to be redrawn, it says, I need to be
remeasured and relaid out. And that can affect everybody there, so we basically propagate a requestLayout all the way up the tree, like invalidation. And then we do measure and layout on the entire tree, just to see what changed there.
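Here is a hedged sketch of the app-level pattern being described-- a hypothetical custom view that calls requestLayout when its desired size changes:

```java
import android.content.Context;
import android.view.View;

// Hypothetical view whose desired size depends on some content value.
// When that value changes, requestLayout() schedules a measure/layout pass
// and invalidate() schedules a redraw.
class BarView extends View {
    private int barHeight = 100;

    BarView(Context context) {
        super(context);
    }

    void setBarHeight(int height) {
        barHeight = height;
        requestLayout();  // my measured size changed, so remeasure and relayout
        invalidate();     // and my pixels changed, so redraw
    }

    @Override
    protected void onMeasure(int widthMeasureSpec, int heightMeasureSpec) {
        int width = getDefaultSize(getSuggestedMinimumWidth(), widthMeasureSpec);
        int height = resolveSize(barHeight, heightMeasureSpec);
        setMeasuredDimension(width, height);
    }
}
```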
So requestLayout happens on the parent, and then that propagates
all the way up. And that ends up, again, in
scheduleTraversals, our friend. And then performTraversals--
we're not talking about the draw stuff now. We're going to do
a performMeasure and a performLayout. Measure is basically
asking all of the views how big they would like to be. It's a request. And then layout says, this is
how big you're going to be, and this is where you're
going to be positioned. It's a negotiation between the
views and all of their parents, according to all the
constraints in the system. So we do a performMeasure. That basically calls
measure at the top, and that propagates
all the way down. And then we have
all the information about how big all of
the views want to be, and that is good enough for
us to calculate the layout information. Then we propagate layout
all the way down the tree. And once that happens on
the item and the parent that changed, then we actually
lay out that item, and we're ready to go. Now, we can actually
draw things, and everything is as before. So the nuance here was
just the layout side, except an important nuance
is we're talking about all of this RequestLayout and
measure and layout happening for this RecyclerView situation. However, RecyclerView
optimizes this. It knows enough about its
own parent and its children that it actually can
just offset the views. Instead of doing
the RequestLayout, it can actually just move
the views out of the way and create the new item. So optimization
for RecyclerView, as well as the older list view. ROMAIN GUY: So now
we're going to talk about how SurfaceFlinger,
our window compositor, composites all the
windows on the screen. This is interesting,
well, first of all, because it's always
interesting to learn something new about technology, but also
because you will understand some concepts that
are behind some of the public APIs like
Surface, SurfaceTexture, SurfaceView, or the MediaCodec. So before we can
understand composition, we have to understand a very
important concept on Android, called the buffer queue. So the buffer queue,
as the name suggests, is just a queue of buffers
where our graphics buffers live. Typically, we're going to
have one to three buffers. There are different
options, internally, where when we set
up a buffer queue, we can request how
many buffers we want. And very importantly, a buffer
queue has two endpoints. We have the producer and
we have the consumer. So typically, the way
we use a buffer queue-- the producer calls a
method called dequeueBuffer on the queue. It grabs a buffer
from the queue. Now it owns it. It can do any kind of rendering. It can be sending the
pixel data directly, it can be using OpenGL, it
can be using the Canvas. It doesn't really matter. And when you use OpenGL,
this is basically what happens when you call
eglSwapBuffers at the end. That's when we're producing
the content inside the buffer. So when the producer is
done producing the content, it calls queue buffer,
and gives the buffer back to the buffer queue. Now, a consumer can
grab the next buffer in the queue using acquire. So it calls acquire
buffer, it takes the first available
buffer in the queue, it does whatever it has to do
with it, and when it's done, it puts it back by
calling release. So it's a pretty simple concept.
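Purely as a conceptual illustration of that dequeue/queue/acquire/release protocol-- this is not the real C++ BufferQueue-- a sketch in Java might look like this:

```java
import java.util.Collection;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Conceptual sketch only; "B" stands in for a graphics buffer type.
class ConceptualBufferQueue<B> {
    private final BlockingQueue<B> free;    // buffers the producer may draw into
    private final BlockingQueue<B> queued;  // filled buffers waiting for the consumer

    ConceptualBufferQueue(Collection<B> buffers) {
        free = new ArrayBlockingQueue<>(buffers.size(), false, buffers);
        queued = new ArrayBlockingQueue<>(buffers.size());
    }

    // Producer side: take an empty buffer, fill it, hand it back.
    B dequeueBuffer() throws InterruptedException { return free.take(); }
    void queueBuffer(B buffer) throws InterruptedException { queued.put(buffer); }

    // Consumer side: take the next filled buffer, use it, then recycle it.
    B acquireBuffer() throws InterruptedException { return queued.take(); }
    void releaseBuffer(B buffer) throws InterruptedException { free.put(buffer); }
}
```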
Of course, if you were to look at the code-- all the header files
and all the code involved-- it's pretty
complicated, in part because the two endpoints
of a buffer queue can live in different processes. And this is exactly
what happens. This is how our surface
compositor works. So when you create a
window in the system, you have the Window Manager
and you have SurfaceFlinger. So the Window Manager
is effectively the producer in this
scenario, and SurfaceFlinger is our consumer. So when you call WindowManager.addView-- and this is what's done automatically for you when you create a dialog, when you create a toast, I believe, when you create an activity-- internally, we create
a window object. That window object has a sibling
on the SurfaceFlinger side, called a Layer. The names can be
a little confusing because in graphics, we have to
deal with buffers and queues, and that's all we do, and
we quickly run out of names. And the graphics team really did-- so that's why we have Surface and SurfaceTexture
and buffer queue and layer and window. Yeah. So it's a little bit messy. So we have a layer
in SurfaceFlinger. It's basically a window. And the layer is the
component in the system that creates and owns the buffer
queue for your application. So it creates a
buffer queue, and we have a way to send the
endpoint to your application by creating a Surface. So whenever you see a
Surface in one of our APIs, you really have the producer
endpoint of a buffer queue that lives somewhere else in the
system, either in your process or in some other process. Most of the time, it's going
to be inside SurfaceFlinger. So now, the typical
use case for you, as an application
developer, you're going to deal with
the Surface API when you create the SurfaceView. So the way SurfaceView works is
your window has its own Surface that we see here. Then we cut a hole through
that Surface, effectively. And we ask the Window
Manager and SurfaceFlinger to create a second Surface. And we just slide it
underneath, and we pretend that they are
part of the same window. But they are not. They are two different Surfaces. They have two different
buffer queues, and they can be completely
independent from one another. So if you use a
SurfaceView, you're probably going to use OpenGL
or Vulkan or a media player to generate content. So for instance, in this
case, we have OpenGL ES. It's going to dequeueBuffer. It's going to do some rendering. It's going to queue the
buffer back into the Surface, and therefore, into the buffer queue.
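On the app side, you typically get at that Surface through SurfaceHolder callbacks. A hedged sketch-- the callback class and the commented-out player call are illustrative:

```java
import android.view.SurfaceHolder;

// Sketch of the app-side contract: the SurfaceHolder callbacks tell you when the
// SurfaceView's Surface (the producer end of its buffer queue) is ready to use.
class VideoSurfaceCallback implements SurfaceHolder.Callback {
    @Override
    public void surfaceCreated(SurfaceHolder holder) {
        // Hand holder.getSurface() to a producer: a MediaPlayer, OpenGL ES, Vulkan, etc.
        // e.g. mediaPlayer.setSurface(holder.getSurface());  // hypothetical player
    }

    @Override
    public void surfaceChanged(SurfaceHolder holder, int format, int width, int height) {
        // The buffers behind the Surface may have been resized or reallocated.
    }

    @Override
    public void surfaceDestroyed(SurfaceHolder holder) {
        // Stop producing frames; the Surface is no longer valid.
    }
}
```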
If you use a SurfaceTexture, your consumer will be OpenGL. So you create a SurfaceTexture
by giving it a texture ID. In that case, the
SurfaceTexture creates and owns the buffer queue, so that will
often be in your own process. Then you have to pass
that SurfaceTexture to some producer,
and to do this, you create the
endpoints yourself, by creating a Surface. There's a constructor of Surface
that takes a SurfaceTexture. So you create your
Surface, you send it to some other
application, and then when your OpenGL code
is ready to render, it calls acquire to get a
buffer from the buffer queue. It does its rendering. It's going to, itself, produce
a buffer inside another queue, and then when it's done,
it can call release. TextureView is the widget
that's part of a UI toolkit that you can use to benefit
from SurfaceTexture. In that particular
case, the render thread that we talked about is the
consumer of the SurfaceTexture, and you're still responsible
for getting the Surface from the TextureView
and giving it to a producer of your choosing. You can think of it as a
fancy image, effectively. It's an image view that you
can update really efficiently, using hardware acceleration. In recent years--
we used to tell you that TextureView was
the solution, instead of SurfaceView, when you wanted
to integrate a video or OpenGL rendering inside a
complex application. For instance, if you had
the ListView or CountView or anything that was animated. SurfaceView, because it's
made of two different windows, was naturally
efficient for that, and it was not synchronized
with the rendering of your own application. This has been fixed in the
recent versions of Android. So most of the time, on
recent versions of Android, you should use a SurfaceView
instead of the TextureView. Use the TextureView
only when you have to. Maybe it's sandwiched
between your other views, or use an animation that's not
supported by the SurfaceView. CHET HAASE: I think
that was the O release. ROMAIN GUY: That
was the O-- maybe N. CHET HAASE: Maybe. ROMAIN GUY: One of the two. You have to test. CHET HAASE: That's
why we say recent. ROMAIN GUY: And here's a
list of different producers and consumers in the platform. So we looked at SurfaceView
and SurfaceTexture. OpenGL ES is a producer. It can also be a consumer. So remember when Chet was
saying that in that lifecycle of a frame, at some
point, we get a buffer. That's when we call
dequeueBuffer from the render thread, and this
is done, typically, when we do the first draw call. And at the end, when
we call eglSwapBuffer to tell the driver that
we're done with our frame, it will actually
produce the frame and put it back in
the buffer queue. You can also use things like
Vulcan, the MediaPlayer, and MediaCodec. And we have many, many more
throughout the platform. Now, the actual composition. So we have created
multiple windows that each have their own layer. SurfaceFlinger knows
about all those layers, and to talk to the display,
SurfaceFlinger actually talks to something called
the Hardware Composer. Hardware Composer is a
hardware abstraction layer that we use because we
want to avoid using the GPU and we need to composite
all those windows on screen. One of the reasons
is to save battery. It's more power
efficient that way. But it's also to make sure that
your application has access to basically all the
capabilities of the GPU. We don't take it away from you. And in the past,
you may have heard that you should limit
the number of windows you put on the screen. And you'll see why
in a few slides. So we have the
Hardware Composer. It's effectively a
block of hardware that's really fast at
taking multiple bitmaps and composing them
together on screen. And we just talked about this. So the way it actually
works-- the Hardware Composer is really a kind of a protocol. Here, I'm going to describe
the older Hardware Composer. It's called Hardware Composer
1 or Hardware Composer 0. I'm always confused now. We use one called the
Hardware Composer 2, but it's much more
complicated so I'm not going to describe it here. But the gist of it is it
basically works the same way. So SurfaceFlinger has
a bunch of layers, and we're going to call prepare
on the Hardware Composer. And we're going to send all the
layers to the Hardware Composer and ask it to tell us what it
wants to do with every layer. Every Hardware Composer
is a proprietary piece of hardware in the phone
or tablet you're using, and there is no way
that we can write the drivers for all the
different Hardware Composers out there. So instead, the
Hardware Composer's going to tell us
what it wants to do. So in this case,
we have a layer. The Hardware Composer
replies, overlay. That means the Hardware Composer
understands the pixel format of that window and tells
us that it can handle it and is going to do the
composition for that window. So we keep going. It's telling us overlay
for the second layer. It's telling us overlay
for the third layer. So that's great. That means all the composition
can be done automatically for us on our behalf in
a very efficient way. So now, all our layers
are matched to overlays. We call set. We send all the layers to the
hardware composer, this time for actual composition,
and the hardware composer sends everything to the screen. Now a more complex example. So we have the number of layers. We call prepare. For the first one,
everything goes fine. The hardware composer
says, overlay. It can handle it. But for some reason,
for the next one, it says, frame buffer. So that can happen when you're
using a pixel format that's not supported, or it could
be because you're using a rotation, and the
hardware composer doesn't know how to handle
rotation or we have too many layers on the screen,
or any number of reasons that are specific to
that hardware composer. CHET HAASE: This was much
more common in devices probably three to four
or longer years ago. ROMAIN GUY: Right. We used to have about
four hardware layers that you could use. That's four. That's five. That's four. So we used to have
four, and on Pixel 2, without going into
too much detail, you have basically seven. So it's much better
than it used to be. But if you have a Pixel 2 XL,
we use two of those layers to draw the rounded corners. So you don't really have
seven; you have five. It's actually more like six,
because they can be merged by the Hardware Composer. Anyway, lots of details that
can be really complicated. You don't need to know
about all those details. Anyway, in this case,
we have one layer that can go directly to
the Hardware Composer, but we have two layers that
have been marked frame buffer. And that's where the
hard part starts for us, because when we have layers that
are not handled by the Hardware Composer, we need to use the
GPU to composite them ourselves. So SurfaceFlinger has to
be able to do everything that the hardware can do. And in that situation, we
need to create, basically, a scratch buffer-- another layer-- in a format that
we know the Hardware Composer can accept. And then we use
custom OpenGL code to do the composition
ourselves of those two layers. So then, once we're
done with that part, we're left with only
two layers, and we know that those can sent
to the Hardware Composer. So that's what we do. We call set, and then
they show up on screen. CHET HAASE: So if
you're curious sometime, you can run this command, adb
shell dumpsys SurfaceFlinger. ROMAIN GUY: Capital
S, capital F. CHET HAASE: Very important. And it'll spit out way, way
more information than you want. But one of the things that
it's going to show you is a table of the
windows on the screen and whether they are
currently represented as overlays or frame buffer. ROMAIN GUY: Although you have
to run this command pretty quickly because there are tons
of optimizations internally. So what can happen
sometimes is if layers have been on the
screen for a while, and we know they
are not changing, their Hardware Composer
might be collapsing them into a single layer
until they change again. So the output of that command
is sometimes a little bit misleading, because you
might be seeing the result of optimizations based on time. So the best thing to do
is usually to run this when you're running an
animation or something is changing on screen. That's going to be the most
valuable information for you. So a few other things that
we haven't talked about. We used to tell you to use the
variant of invalidate that takes
only part of your view that you knew needed
to be repainted. That was particularly important
with older Android devices, because bandwidth was
extremely limited. We were using
software rendering. And even in the early days
of GPU rendering for us, we were pretty easily
maxing out the GPU. So those were really
important savings. You don't need to
do this any more, and even on recent
versions of Android, before this was
deprecated, it was actually ignored by the system. Now, every time you call invalidate or invalidate with a rectangle on the view, the render thread is going to re-render that whole view and recompute the damaged area for you. You don't have to
worry about it. And one of the reasons
we're getting rid of it is not only is it not that
necessary anymore for savings, but it's also because
it's error-prone. And it's really easy to have
an off-by-one error or rounding error and to get
artifacts on screen. And Chet and I
can attest to that because we fixed
so many bugs linked to the use of those APIs. And the framework itself
still has bugs, I believe, around that. So now, you don't have
to worry about it. RecyclerView is now
able to do prefetching of items ahead of time. CHET HAASE: Yeah, we
mentioned this earlier. This is one of the wins
that we're now getting out of having a separate render
thread, because now, well, there's idle time. The UI thread was done with
its work after it syncs. Well, it can use that
idle time productively for doing other things
like fetching things that it knows it might need
in the next few frames. ROMAIN GUY: For
composition, internally, we have a concept of
the actual display. This is an API you can use. For instance, that's what we
use when we take a screenshot or when we record a video
or when we do casting-- Chromecasting, for instance. What we're effectively doing
is asking SurfaceFlinger to perform a composition, but
not to display it directly-- just into another surface. So it's another way of producing a surface.
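That public API is DisplayManager.createVirtualDisplay. Here is a hedged sketch; the size, density, flags, and the ImageReader consumer are illustrative choices, not values from the talk:

```java
import android.graphics.PixelFormat;
import android.hardware.display.DisplayManager;
import android.hardware.display.VirtualDisplay;
import android.media.ImageReader;

class VirtualDisplayExample {
    // Sketch: ask SurfaceFlinger to composite into a Surface we own (here an
    // ImageReader's Surface) instead of the physical display.
    static VirtualDisplay create(DisplayManager displayManager) {
        ImageReader reader = ImageReader.newInstance(1080, 1920, PixelFormat.RGBA_8888, 2);
        return displayManager.createVirtualDisplay(
                "demo-virtual-display",     // name
                1080, 1920, 320,            // width, height, density (dpi)
                reader.getSurface(),        // consumer Surface that receives the composition
                DisplayManager.VIRTUAL_DISPLAY_FLAG_PRESENTATION);
    }
}
```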
If you're interested, there's an excellent sample application called Grafika-- with a K-- available on GitHub. It was written by a
member of the graphics team a few years ago. It's basically a collection
of all the things you can do with
SurfaceFlinger, with Surfaces, with SurfaceView, with the media
encoder, with virtual displays. It's a very interesting
piece of code to look at. Color transforms. So in Android O, we
introduced color management, and that's one of the color
transforms we can apply. Things like Night Light
are also color transforms. We also have
colorblindness simulation. Those can be handled
by the Hardware Composer in specific
situations, and they can be the cause for
performance issues. For instance, a while
back, Night Light was not supported, I believe, on the Nexus 5X or the Nexus 6P. One of the reasons
was the hardware. We didn't have the drivers that
could do the color transform, so we had to fall back
to GPU composition. It was really expensive. It was hurting the
battery, so we didn't have the feature on the device. And like Chet said, there
are many, many more details about our rendering pipeline. This was just a very
high level overview. We gave a number of
talks in the past that explained in
more details what we do, for instance, in
the UI render itself, how we do the
batching and merging-- that kind of optimization. So if you're interested,
you can refer to those. CHET HAASE: Shadow
calculation is interesting. I think we had a talk
where we talked about some of those details. But exactly where does that
fit in the rendering pipeline is sort of clever. But yes. Lots more going on there,
but hopefully, this gives you a general sense of
how things work on Android. ROMAIN GUY: And with
that, we're done. CHET HAASE: So
we'll stop it there. Thank you. [APPLAUSE] [MUSIC PLAYING]