JOLANDA VERHOEF: Adding
custom gestures to your app helps bring your app's look
and feel to the next level. In this video, I'll
show you how to create beautiful, rich interactions. After this video, you'll be
able to reason about and easily implement custom gestures. But if we want to talk about
these rich interactions, we need to make sure we have
some common terminology. First, Android
nowadays isn't just phones with touch
interaction anymore. To be inclusive with other
form factors and input types, we have to come up with a more
general term than "touch." In Compose, we use
a term, "pointer," to refer to any
type of input that is used to point at
things on your screen. With a pointer, a user
can perform a gesture. Taking a phone with touch input as an example, a gesture would consist of
the user putting their finger on the screen, then optionally
moving around a little bit, and finally lifting
their finger again. In Compose, on a low
level, such a gesture is represented by a
stream of pointer events. So for our gesture, we
first see a press event, then a whole lot of move
events, and finally, a release when the user lifts
their finger off the screen. In total, the sum
of all those events between pointer down and pointer
up is what we call a gesture. Of course, the user can also
use more than one finger to perform a gesture. In this case, the pointers
are performing a pinch gesture by moving outwards and inwards. Probably, the user will
not put their fingers down at exactly the same time. So you'll first get a
press and some move events for one pointer, then
get the second press, a bunch of movements for
both pointers simultaneously, and finally, those
release events. Now, recognizing these
events as a pinch gesture is not trivial, which
is why Compose includes several helper methods. And this brings
us to our toolbox. What helpers can we use to
implement gesture handling in our app? On the lowest level, we can
listen for and directly handle raw pointer events. This is done using the
pointer input modifier. It gives you the
most flexibility to implement exactly the
gesture recognizer you want. For some common gestures,
such as dragging, tapping, and zooming, the
pointer input modifier contains a set of
gesture recognizers. These recognizers translate
raw pointer events to full gestures. One level up, gesture
modifiers, such as clickable and draggable, allow you
to add gesture handling to arbitrary composables. These also contain
extra functionality, which we'll get to in a bit. And finally, some components,
such as button and slider, include gesture recognition
right out of the box. In this video, I will
focus on the middle two layers, gesture recognizers
and gesture modifiers. Let's start with the
gesture recognizers. To create a gesture
recognizer, you first apply the pointer input
modifier to your composable. You pass a lambda
to this modifier. And inside this lambda, you are
in a so-called pointer input scope. This scope provides the
various gesture recognizers. For example, you can use
the detectTapGestures method to recognize various tap gestures: double tap, long press, press, and tap. You pass lambdas for the tap gestures that you want to respond to, and they will be executed when that gesture is recognized.
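As a rough sketch (the lambda bodies here are just placeholders, and the usual androidx.compose imports are assumed), that looks like this:

```kotlin
Box(
    Modifier.pointerInput(Unit) {
        detectTapGestures(
            onDoubleTap = { offset -> /* react to a double tap at offset */ },
            onLongPress = { offset -> /* react to a long press */ },
            onTap = { offset -> /* react to a single tap */ }
        )
    }
)
```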
Let's add this method to our toolbox. You can also detect drag gestures. During a drag, the onDrag lambda will be continuously executed. You can limit your
drag detector to respond only to vertical dragging, or to horizontal dragging, or to detect drags only after the user long presses.
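Sketched out, the drag detectors all share the same shape; only the function you call changes:

```kotlin
// A minimal sketch; swap in detectVerticalDragGestures, detectHorizontalDragGestures,
// or detectDragGesturesAfterLongPress for the constrained variants.
Box(
    Modifier.pointerInput(Unit) {
        detectDragGestures { change, dragAmount ->
            change.consume()
            // dragAmount is the delta since the previous drag event.
        }
    }
)
```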
And you can detect multi-touch transformation gestures, such as panning, pinching, and rotating, by using detectTransformGestures.
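A minimal sketch of that detector; the lambda receives delta values on every change:

```kotlin
Box(
    Modifier.pointerInput(Unit) {
        detectTransformGestures { centroid, pan, zoom, rotation ->
            // centroid: the middle point between the pointers.
            // pan: offset delta; zoom: scale factor; rotation: degrees delta.
        }
    }
)
```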
That's the full list of gesture recognizers available in the pointer input scope. If your requirements are
more complicated than this, you would need to fall back to
handling raw pointer events. So that completes our list
of gesture recognizers. Let's continue with
the gesture modifiers. These gesture modifiers
can be applied directly to a composable without needing
to use the pointer input modifier. On top of this more concise way of writing, they give us some handy extra functionality beyond just
gesture recognition. And we'll look into that later. First, the clickable
modifier adds click behavior. This is similar to the
detectTapGestures recognizer, but it only responds
to single taps. Again, you pass a
lambda that is called when the composable is clicked. If you want to react not
just to taps but also to long presses or
double taps, you can use the combinedClickable modifier instead. Again, you just have to implement the callbacks that you're interested in.
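A minimal sketch with placeholder callbacks (depending on your Compose version, combinedClickable may require opting in to ExperimentalFoundationApi):

```kotlin
Box(
    Modifier.combinedClickable(
        onDoubleClick = { /* react to a double tap */ },
        onLongClick = { /* react to a long press */ },
        onClick = { /* react to a single tap */ }
    )
)
```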
You can use the draggable modifier to listen to horizontal or vertical drag gestures. In this case, instead of
passing a single onDrag lambda, you pass a state. This is a common pattern that allows you to hoist the state and mutate it outside of the modifier.
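For example, a sketch of a horizontally draggable box with hoisted state (this snippet lives inside a composable):

```kotlin
// The drag deltas are accumulated in state that lives outside the modifier.
var offsetX by remember { mutableStateOf(0f) }
Box(
    Modifier.draggable(
        orientation = Orientation.Horizontal,
        state = rememberDraggableState { delta -> offsetX += delta }
    )
)
```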
The scrollable modifier works the same, but it includes logic for scrolling and flinging. And the transformable
modifier makes it possible to listen to
multi-touch transform events on a composable. We filled our toolbox
up quite nicely. We have a set of custom
gesture recognizers and a set of modifiers
that we can use directly. So how do we choose
between them? For example, we have both a clickable modifier and a detectTapGestures recognizer, and both of them recognize taps. However, the modifier
does much more than that. Let's quickly look
at its source code. Well, if you look at
the implementation of the clickable modifier, you
can see that it not only adds gesture recognition. It also adds click semantics
to deal with accessibility, key detection, and
focus information to support keyboards,
and an indication modifier to show ripples on
top of the clicked element. This tells us that we
should consider these as well when we drop down
from modifiers, like clickable and draggable, to
use the pointer input modifier directly. This is a common pattern. When you use the various
detect methods on the left, you're recognizing gestures. If you use a high level
modifier, such as clickable or draggable, they
include support for other types
of input as well. In general, try to use
the gesture modifiers and use the gesture
recognizers only if there are good reasons to. We'll see some of these
reasons later in this talk. Now, one other thing
to keep in mind is that these modifiers
exist to recognize gestures. And they don't actually
transform anything. So if we, for example, look
at the transformable modifier, we see that it gets continuous
updates of zoom, offset, and rotation changes. Let's say that we want
to use this modifier to create an element that we can
move around with two fingers. Just having the
modifier is not enough. We need to apply the modifier
to the box composable. This is the part where
we recognize the gesture. Then we need to hold the
current scale, rotation, and offset values. In this case, we keep
track of the current scale, the current rotation angle,
and the current offset. And we also need to pass logical
initial values for these. From our gesture recognizer,
we update these values whenever the gesture
receives a new change. And finally, we apply
the transformation values to our composable. In this case, I'm using
a graphicsLayer modifier to set this composable's scale,
translation, and rotation. So to repeat: we recognize, hold, and then apply the data coming in through a gesture.
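Put together, a minimal sketch of this recognize-hold-apply pattern (assuming the usual androidx.compose imports) could look like this:

```kotlin
@Composable
fun TransformableBox() {
    // Hold: the current scale, rotation, and offset values.
    var scale by remember { mutableStateOf(1f) }
    var rotation by remember { mutableStateOf(0f) }
    var offset by remember { mutableStateOf(Offset.Zero) }
    // Recognize: update the values whenever the gesture receives a new change.
    val state = rememberTransformableState { zoomChange, offsetChange, rotationChange ->
        scale *= zoomChange
        rotation += rotationChange
        offset += offsetChange
    }
    Box(
        Modifier
            // Apply: transform the composable with the held values.
            .graphicsLayer {
                scaleX = scale
                scaleY = scale
                rotationZ = rotation
                translationX = offset.x
                translationY = offset.y
            }
            .transformable(state)
            .size(200.dp)
    )
}
```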
Decoupling the recognition from the application also means that a gesture
recognizer does not need to be applied
to the composable that you're transforming. Instead, you can
move modifiers around to recognize a
gesture in a parent, but then apply it to
a child composable. In this case, the user
can perform a gesture on the parent box,
but the transformation happens on the inner blue box. A real world example of
this is the material slider. The gesture recognizer is
defined on the full slider component, while the
transformation only happens to its thumb. This way, I don't actually have to grab the thumb, but I can start the
drag from anywhere. So now that we have a basic
understanding of the tools we have at our disposal,
let's go and create some fancy interactions. We'll be creating this
photo grid sample app. There's a lot happening here,
and we'll take it step by step. And yes, before you ask, the
code is available in a gist. So what requirements do
we have to implement? First, we tap a photo
to open it full screen. Then we can double tap
it to zoom in or out. And alternatively,
we can pinch to zoom. Now, while we are
zoomed in, we can drag to move around this photo. And we can tap the gray
background outside of the photo to exit our full screen mode. Let's implement
these one by one. For the first
requirement, we want to tap the image to
open it full screen. We can choose between the
detectTapGestures recognizer and the clickable modifier. Both will do the trick. But as we discussed before,
the clickable modifier will add some extra
functionality. So we'll go with that. Let's implement our code. We start by implementing
our PhotoGrid. We create a composable, and
pass it a list of photos. We create a lazy vertical
grid with cell sizes of at least 128 dp. Inside, we add all the photos as items, with their ID as the unique key for each item. Each item is then represented with a PhotoItem composable that we'll implement later.
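In code, a rough sketch of that grid (Photo and PhotoItem are the sample's own types, so details may differ from the gist):

```kotlin
@Composable
fun PhotoGrid(photos: List<Photo>) {
    LazyVerticalGrid(columns = GridCells.Adaptive(minSize = 128.dp)) {
        items(photos, key = { it.id }) { photo ->
            PhotoItem(photo)
        }
    }
}
```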
With this, we actually already have some gesture handling in our app. LazyVerticalGrid comes with scroll support right out of the box. This is an example
of a composable that includes support for gestures. Now let's implement
our first requirements. Clicking one of the
photos should open it in full screen mode. We first wrap our photo grid
in a top level app composable. We add some states. We keep track of the photo
ID that should currently be showing full screen. When we start the app,
this should be null because no photo
is selected yet. If this activeId is not null--
that is a photo has been clicked-- we show the full screen photo. We look up the photo that
belongs to this identifier, and we pass it to
the full screen photo composable that will be
responsible for showing this full screen photo. Finally, we add the callbacks
to update the activeId. We pass the navigateToPhoto lambda to the PhotoGrid composable, and the onDismiss lambda to the FullScreenPhoto composable.
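A rough sketch of that top-level composable (the exact signatures in the gist may differ):

```kotlin
@Composable
fun App(photos: List<Photo>) {
    // null means no photo is currently shown full screen.
    var activeId by rememberSaveable { mutableStateOf<Int?>(null) }
    PhotoGrid(
        photos = photos,
        navigateToPhoto = { id -> activeId = id }
    )
    if (activeId != null) {
        FullScreenPhoto(
            photo = photos.first { it.id == activeId },
            onDismiss = { activeId = null }
        )
    }
}
```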
Going back to our PhotoGrid composable, this now gets the
navigateToPhoto lambda. We add a clickable
modifier to the PhotoItem, and we call that
navigate lambda when the user clicks the element. And with that, we finish our first requirement. The clickable
modifier can be used to make any
composable clickable, including our photo in the grid. Next, while we're showing
the full screen photo, a double tap should
zoom in or out. So how would that work exactly? When the user double taps,
we remember the tap location on screen. We then scale the image around
that point, while making sure it stays within the bounds. To implement this requirement,
we again have two options-- detectTapGestures and
combinedClickable. Let's implement our code
and see which one fits. For that, we first need to learn
more about the FullScreenPhoto composable. It gets a photo, the
onDismiss lambda that should be called to exit
the full screen mode, and an optional modifier. Inside, we define a box
that fills the whole parent and centers its contents. In the box, we'll have
a Scrim and the image. The Scrim will basically be
a semi-transparent background that we can click to
exit full screen mode. We'll get to this later. Right now, the photo
image composable is the one we're interested
in for zooming its content. So let's look at
this photo image. It contains an image
that we initialize with the URL of the photo. We pass it the photo's
content description and set its aspect ratio to
make sure that it's square. Let's add some variables
to hold our state. We'll keep track of the
offset and the zoom level. The offset is used so we can
zoom in on a specific part of the image. We can then use a graphics layer
modifier to apply these values to the image. We use the offset
and zoom values to translate and
scale the composable and set the transform origin to
make some of these calculations easier. And we also add a clip
modifier to make sure that the image stays
within its bounds, even when we scale it
up or move it around. Now we're ready to add
the gesture recognizer. We might be tempted to use the
combinedClickable modifier. However, the onDoubleClick does
not contain any information about the location on the screen
where the double-click happens. So we can't use it to
update our offset value. Instead, we can revert to the
detectTapGestures recognizer, which passes the tapOffset as
a parameter to the onDoubleTap lambda. We use that tapOffset to
calculate our new zoom and offset values. I'll leave the actual
calculation of the offset out for now because it's a
little bit difficult to wrap your head around,
but rest assured, it's just a one liner
doing some calculations. Great. That finishes our double
Great. That finishes our double tap to zoom functionality. Now, you do remember that I said
earlier that this leaves out some functionality. And we will get to that later. Don't worry. Next, let's cover the pinch
to zoom and drag to pan. For these requirements, we
should recognize all sorts of movements, both with
two pointers for zooming and with a single
pointer for the panning. We again have two options-- detectTransformGestures and
the transformable modifier. Let's first see what
behavior we want. While the user
pinches to zoom, we want the image to zoom around
the middle of the pinch gesture. This middle, represented
by this orange dot, is called the centroid. The transformable modifier
does not get this centroid. The detectTransformGestures method does. So let's choose that one. Going back to our code,
our hold and apply blocks are still identical. We still want to change
the offset and zoom values, and we still want to apply
them to our image composable. We just have to add
another gesture recognizer. We add another pointer input
modifier, this time detecting transformation gestures. This will continuously
call the lambda with delta values for
movement and scaling. We calculate the new offset
value and zoom value. I'll leave out the actual
calculation for now. But again, it's just
a few lines of code that you can find in the gist.
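Continuing the PhotoImage sketch, that second recognizer sits in its own pointerInput modifier; calculateNewOffset stands in for the gist's calculation, and the zoom clamp is an assumption:

```kotlin
Modifier.pointerInput(Unit) {
    detectTransformGestures { centroid, pan, gestureZoom, _ ->
        offset = calculateNewOffset(centroid, pan, zoom, gestureZoom, offset) // assumed helper
        zoom = (zoom * gestureZoom).coerceIn(1f, 3f)
    }
}
```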
And with that, we have a zoomable and pannable image using detectTransformGestures. Now we only need to allow the
user to exit the full screen image by tapping the screen. Going back to our
full screen photo, remember that Scrim composable. If we look at its
definition, we can see that it currently
is simply a full screen box with a grayish color. We can add behavior to it
by passing an onClose lambda and calling that lambda
when the user taps the box. In the FullScreenPhoto,
we can simply forward the existing
onDismiss parameter to this Scrim composable.
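A minimal sketch of the updated Scrim (the color and parameter names are assumptions):

```kotlin
@Composable
fun Scrim(onClose: () -> Unit, modifier: Modifier = Modifier) {
    Box(
        modifier
            .fillMaxSize()
            .background(Color.DarkGray.copy(alpha = 0.75f))
            .clickable { onClose() }
    )
}
```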
And with that, we finish our list of requirements. We can tap to open,
double tap to zoom. We can pinch to
zoom, drag to pan, and tap the scrim to cancel. But actually, we're not done. There's one more thing
that we should consider. Specifically, I'd like to
talk for a bit about Talkback, an accessibility service
on Android devices that is used by people
with visual impairments. When Talkback is enabled, it
captures all of the gestures that a user makes. So any of those custom
gestures that you built will not be available
to the user. Instead, Talkback
handles all gestures. For example, a swipe to
right anywhere on the screen will move the focus
to the next element. A double tap behaves
like a click. And a two-finger swipe creates
scroll events, et cetera. So any gesture recognizers that
you apply to your composables won't work out of the box. Instead, you have to
configure the composable to explain to Talkback what it
is you're trying to accomplish with this gesture recognizer. Let's check our requirements. The clickable
modifier that we use to open photos full screen
adds the necessary information. But the double tap to zoom, the pinch to zoom and drag to pan, and the tap to cancel full screen won't work out of the box. Now, I won't go into
detail on how exactly to add this behavior, but
check out the documentation on accessibility and Compose. That explains it in detail. And with this, let's move
on to our next gesture. Our goal is to be able to select
multiple photos in the grid. A user should be
able to long-press and drag to select
multiple photos at once. Let's break this behavior up
into separate requirements. First, we want the user to
be able to enter selection mode by long-pressing a photo. Then tapping any photo should
add them to or remove them from the selection. When the user long-presses a photo and drags, the photos in between should be selected or deselected. And finally, when the user
removes the last photo from the selection, we should
exit the selection mode. Now, before we
implement the gestures, I'd like to pause for
a second and think about state management. That is, what state should we
keep track of in our composable to correctly render our grid? At any given time, we will
have a set of photo IDs that are selected. When there are no
photos selected, like here on the left,
this set will be empty. On the right, you
can see an example where we have several photos
selected with IDs 4, 5, 6, and 8. We also need to keep track of
whether or not we are currently in selection mode. That is whether
at least one photo is selected at the moment. When selection mode is false,
all photos should show as is. But when selection mode is
true, the left top corner of the photo should
show a radio button, and they should respond
differently to taps. We will keep track of this
information in the PhotoGrid composable. Remember that the PhotoGrid
composable is currently simply showing a
lazy vertical grid with the photo items in it. We add a variable selectedIds
to keep track of all the photos that the user selected. Note that we use
rememberSaveable here so that the variable is
persisted across configuration changes. In addition, we want
a simple Boolean to tell us whether or not we
are currently in selection mode. Since this Boolean
will change only when we move from an empty
list to a list with at least one photo and the
other way around, the right API to use here
would be derivedStateOf. This will make sure that
the composables that rely on the value of the
inSelectionMode variable will only recompose
when the value changes from true to false or back,
and not for every change in the set of selected IDs. In addition, for each
photo in the grid, we can look up whether
that specific photo is selected by
checking if it's part of the set of selected IDs. Again, we're using
derivedStateOf to make sure that we only
recompose composables depending on this state when the
selection state changes from true to false, or the other way around.
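Sketched in code, the state in PhotoGrid could look like this (the names follow the talk; details may differ from the gist):

```kotlin
// Persisted across configuration changes.
val selectedIds = rememberSaveable { mutableStateOf(emptySet<Int>()) }
// Recomposes dependents only when the selection mode actually flips.
val inSelectionMode by remember {
    derivedStateOf { selectedIds.value.isNotEmpty() }
}
// Inside the LazyVerticalGrid, per item:
items(photos, key = { it.id }) { photo ->
    // Recomposes the item only when its own selection state flips.
    val selected by remember {
        derivedStateOf { photo.id in selectedIds.value }
    }
    PhotoItem(photo, inSelectionMode, selected)
}
```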
We then forward these properties to the PhotoItem composable so it can adapt
its UI based on their values. And inside the
PhotoItem composable, we use these values to choose
what icon to show, if any. When inSelectionMode is true
and the photo is selected, a check mark is added. If inSelectionMode is true,
but selected is false, an unselected radio
button is shown. And if inSelectionMode is
false, no icon is shown at all. And with that, we
have state management, which is reflected in our UI. Only we're not
changing the state yet. And that, of course, is where
our gesture handling comes in. Remember we want a
long-click on an item to start the selection
mode and directly add that photo to the
set of selected photos. But we also already
had a normal click listener defined on our photo
item to open it full screen. So in our case,
we want to listen to both click and long-press
events on the same composable. Let's go to our toolbox
and see which handlers would fit that requirement. We can choose either
detectTapGestures or combinedClickable. Now, remember that
detectTapGestures only gives us the raw gesture handling,
while combinedClickable includes extras, such as
accessibility and focus support. That sounds great
for our use case. So let's go with
combinedClickable. We go back to our
PhotoGrid composable. In the first part of
this presentation, we applied a clickable modifier
to the PhotoItem composable. Now we want to change that
to this combinedClickable. For the onClick lambda, we pass
the same behavior as before, allowing the user to navigate
to the full screen photo view. When the user long-presses,
we add the ID of the photo to the set of selected IDs. By simply changing the
mutable selectedIds state, any composables that
depend on that state will automatically recompose. Our first requirement is done. We can long-press to
enter selection mode, and we used
combinedClickable for that. However, right now,
if we click a photo while we're in selection
mode, it opens it full screen. That's not the behavior
we're looking for. Instead, if we tap
in selection mode, a photo should be added or
removed from the selection. So let's go back to our
PhotoGrid composable and adapt its gesture behavior. Currently, it always applies the combinedClickable modifier, navigating onClick and adding
the photo to the selection onLongClick. We want this behavior to
be different, depending on the inSelectionMode value. We add an if else
statement, and apply a different gesture
handler based on the value of inSelectionMode. If inSelectionMode is true,
we should listen to taps and add or remove the
photo from our selection. So we add a clickable modifier
and let it remove the photo ID from the set when it's currently
selected, or add it to the set when it's currently unselected.
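A sketch of the resulting modifier logic on the PhotoItem, switching on inSelectionMode:

```kotlin
val gestureModifier = if (inSelectionMode) {
    // In selection mode: tapping toggles the photo's selection.
    Modifier.clickable {
        selectedIds.value = if (selected) {
            selectedIds.value - photo.id
        } else {
            selectedIds.value + photo.id
        }
    }
} else {
    // Outside selection mode: tap opens, long-press starts selecting.
    Modifier.combinedClickable(
        onClick = { navigateToPhoto(photo.id) },
        onLongClick = { selectedIds.value += photo.id }
    )
}
```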
We solved our second requirement with the clickable modifier. We can now enter selection
mode and add or remove photos while we're in selection mode. Next up, we want to
use long-press and drag to select multiple
images at the same time. Let's open up our toolbox and
see which gesture handler fits. We need a handler that
waits for a long-press, and only then passes
on any drag events. There's a detector
that does exactly that. The
detectDragGesturesAfterLongPress method will help us to
implement our multiselect. It waits for a
long-press to happen, and will then forward all
the drag events to a lambda. It also contains lambdas
for when the drag starts, when it's canceled,
or when it ends. For our multiselect case,
when the drag starts, we should add the photo
underneath our pointer to the selection. Then while we drag, when the
pointer moves to a new photo, all photos in between
the old and new photos should be added or removed
from the selection. But which composable do we
apply this gesture handler to? Do we apply it to
the single photo item or do we apply it
to the whole grid? In order to answer
that question, I need to tell you a bit
more about hit testing. Let's take our app and
look at it from an angle. The UI of our app consists
of layers, where our photo items are at the top, the
grid is in the middle, and the main app container is
at the bottom of the stack. When we touch our screen, a
hit detection algorithm runs. You can envision
this as a ray being shot at your UI hierarchy. In this case, it will first
hit one of the photos. Then it will hit the grid, and
then it will hit the main app container. These composables get
the chance to register themselves to receive
pointer input events. Now, as the user drags their
finger over the screen, this imaginary ray shooting at
the screen changes position. But it is important to
understand that it doesn't perform any more hit testing. So even though it exits the
bounds of the middle photo, that photo composable
would still be the one receiving
the drag events. As long as the gesture
is in progress, only the composables
that were originally hit will receive updates. So why is this important to us? Well, our requirement
includes adding and removing photos from our selected
list as we drag over them. Imagine that we would register
the drag listener only on a single photo item. This item does not know
anything of its siblings. So when the pointer
exits its bounds, it will not know which
photo it moved to. Instead, we need to listen to
the gestures on the whole grid. Given the location of the
pointer within the grid, we can calculate which
photo that is pointing at. And then when it moves
out of the bounds, we can recalculate that. So let's go back to our
PhotoGrid composable with its lazy vertical grid. We add the pointer
input modifier. Inside, we call the
detectDragGesturesAfterLongPress method. We have a way to
listen to this gesture, but now we need to actually
respond to it-- in our case, by updating the set
of selected IDs. During our onDragStart,
we want to figure out which photo is originally
selected and keep track of it. We also want to add
this initial photo to the set of selected photos. When the drag is
canceled, or when it ends, we should make sure to reset any
internal state in the modifier so it is ready for
a next gesture. And during the
onDrag callback, we need to find out if the
pointer changed from one photo to the next. And if so, add or
remove the right photos from the selection. To know which photos
to add or remove, we need both the identifier of
the photo that was initially hit by the long-press and
the identifier of the photo that the pointer
has last been over. In the onDragStart
lambda, we get an offset that indicates where in
the bounds of our lazy grid the pointer went down. We can use that
offset to find out what photo the user is pointing
at using a helper method. You can find the implementation
of this photoIdAtOffset on GitHub. The method can also return
null if the user is pointing at the space between photos. So we do a null check on
the result of the method. If a photo is
indeed hit, we check if it wasn't yet
in the selection. If not, we save it as the initial and current photo ID and add it to our selection. In onDragCancel
and onDragEnd, we reset this initial
photo ID to null. And in the onDrag lambda, we
handle the actual drag events. This lambda gets a
change event as input. In that change event,
there is a position field that contains the offset
in the composable where the event happens. We can reuse the same
photoIdAtOffset helper method to see which photo
the user points at. Now during the drag,
we are continuously tracking which photo ID
is underneath the pointer. We're only interested in
the movement from one photo to the next. So we check if the
pointer photo ID is different from
the current photo ID. We also update the
current photo ID when we're done
with this update. And finally, we update
the set of selected IDs. This addOrRemoveUpTo method
is part of our business logic, and we'll update the set
of selected IDs based on the pointer, current,
and initial photo ID.
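Assembled, a sketch of that grid-level recognizer; photoIdAtOffset and addOrRemoveUpTo are the helpers mentioned above, with their implementations in the gist:

```kotlin
Modifier.pointerInput(Unit) {
    // These survive across gestures because this block runs once and then loops.
    var initialPhotoId: Int? = null
    var currentPhotoId: Int? = null
    detectDragGesturesAfterLongPress(
        onDragStart = { offset ->
            photoIdAtOffset(offset)?.let { id ->
                if (id !in selectedIds.value) {
                    initialPhotoId = id
                    currentPhotoId = id
                    selectedIds.value += id
                }
            }
        },
        onDragCancel = { initialPhotoId = null },
        onDragEnd = { initialPhotoId = null },
        onDrag = { change, _ ->
            if (initialPhotoId != null) {
                photoIdAtOffset(change.position)?.let { pointerId ->
                    if (currentPhotoId != pointerId) {
                        // Add or remove everything between the current and new photo.
                        selectedIds.value = selectedIds.value
                            .addOrRemoveUpTo(pointerId, currentPhotoId, initialPhotoId)
                        currentPhotoId = pointerId
                    }
                }
            }
        }
    )
}
```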
Now, we can successfully long-press to enter selection mode, tap to add or
remove from selection, and long-press plus drag to-- wait. Long-press and
drag doesn't work. Why is that? In order to find our
answer to this problem, we need to talk about
conflict resolution. Now, remember that we solved
our initial requirement with a combinedClickable
modifier, letting it listen to
long-press events. But now we're also detecting
drags after long press. So both the photo items
themselves and the grid are listening for and trying
to react to long-press events. We have a conflict. If we look at the
layered version of our UI again, when a pointer
event comes in, the layer that lies on
top gets the event first. In this case, that's
our photo item. It chooses whether or
not to consume the event. Only if this layer would
not consume the event, it is forwarded to the
next layer, et cetera. In our case, the
long-press recognizer that we added to the
PhotoItem composable captures and
consumes all events. This means that our photo
grid will not get any events, and its long press and drag
recognizer will not work. Looking at the code,
the onLongClick in our combinedClickable
is the culprit. It keeps the long-press and drag from ever reaching the grid. But actually, we don't really
need this long-click listener anymore. The long-press and drag
listener of the grid already adds our photo to the
selection after a long press. So we can change the
combinedClickable to the original
clickable modifier. Fixed! Now we can
successfully long-press to enter selection mode,
tap to add or remove from selection, and long-press
and drag to add or remove multiple photos. Now, finally, we want to be able
to tap the last selected photo to exit selection mode. However, this already
works out of the box. Because we made
our selection mode dependent on the
set of selected IDs, deselecting the last
photo automatically exits selection mode. We're done! If you learned something new
about gestures in Compose, then please share this
video or like and subscribe.