[MUSIC PLAYING] JONATHAN GERRISH: Good morning, and welcome to today's session. My name's Jonathan, and I'm an engineer at Google. We're going to talk
about how to build testable apps for Android. So through the history of
time, architectures continually evolved in style and ambition
to fit our expanding visions. In order to build these
increasingly complex architectures, we've
needed to also innovate new tools and
patterns and methods to meet the demands
of building them. And just like in the
real world, in software, too, we've all had our
own evolutionary path. We've seen this in
Android Development, too. Who remembers building smaller
applications in Eclipse? And as the complexity of
our applications grew, so did the need for better
tools and testing tools. And today, we're building quite complex applications within Android Studio. And last year we launched Jetpack, which included
great libraries to abstract some of
the details, allowing developers to focus
on writing great features. But sometimes the evolution
hasn't always been easy. And choices developers
have had to make haven't always been so clear. So how do you answer
questions such as, how should I architect
my application? Or, how do I organize
my code base? What library should I choose? And what tools should I pick? The choices that you make
early on in development have a long-lasting impact on
the testability of your app, and thereby, your development
velocity and your ability to add new features sustainably. Well, today we're
going to show you with some real-world
examples how to make the pragmatic
decisions in how you build your applications in order
to build a long-term testing strategy. In testing, the key attribute
to consider is scope. Now, scope means how much of my application my test is covering. Tests could run against just a single method, or they could span multiple
features, multiple screens. And scope directly impacts two
other attributes of testing. Speed: how fast does your test run? Some tests take on the order of milliseconds, and others all the way up to minutes, or maybe more. And fidelity: how closely does your test simulate real-world scenarios? Increasing scope typically
increases the fidelity of your tests. But it does so at the expense
of speed, and vice versa. And you can't actually have
a single test that gives you the best of everything. The question is
when is good enough? When do you really need perfect? And how do you achieve
the right balance? The testing pyramid
is used as a guide to help you create that balance. As you go up the pyramid,
you improve on fidelity by increasing the scope. But remember, this comes at
the price of speed, focus, and scalability. Unit tests-- they've got to
be fast, lightweight, highly focused, in order that they
can achieve high scalability. They're really easy to define,
because most of the time we're just testing a single
method in a single class within our application. And this means
that they're going to give you really
high degree of focus to the origin of a failure. Integration tests are the
next category of test. And here we're trying to
bring together several units within our application. And we're interested in
verifying their collaboration, making sure that when we bring
them together, they all behave as expected as a whole. And end-to-end tests, they step through key paths in our application, often covering multiple screens and features. And these are also really easy to
define, because at this stage, we know we're testing
our whole application. Today, we're launching
a new to-do application. Well, it's not really an
official Google product. But it is a real application. It's part of the refreshed
Android testing code lab that we're launching today. So you can go ahead,
check it out, build, test, work through all the examples
that you'll see right here in this session today. Now we're going to
work through building this application together. And in doing so,
along the way, we'll discuss some of the
challenges and the choices that will be faced. Building an application
usually starts by defining some key
critical user journeys. And a critical user journey
is a step-by-step path that the user takes
through an application. And the idea is in order to
meet a predefined end goal, the journey may span
multiple screens and decision points to get to that end goal. And they're often sketched
out by a series of mockups. And let's take a look at some
that our UX designers just sent us. Our first user journey is
that of creating a new task. Users arrive on the home screen,
which has a list of tasks. The first time they get
there, it's going to be empty. There's a floating action
button that they can click. It takes them to the next
screen, where they can enter details for their tasks. They can click Save, and they
return back to the home screen. And their new task
should show up. Our second user journey is
about checking our progress. So users can select
an existing task. They can mark it as completed. And then they can go
and view their progress on a statistics
screen that shows them just how productive they are. Now, every project
starts off small. But if careful attention isn't
paid to design, architecture, organization, during the
growth of that code base, development can quickly
spiral out of control as your application
grows uncontrollably. Without any thought,
your code base can quickly turn into a huge
monolith, a spaghetti-like ball of incoherent
dependencies that are not only hard to reason
about but they're difficult to test as well. If individual units don't follow
key principles such as that of high cohesion
and low coupling, they become really difficult
to test alone in isolation. And furthermore, with
a monolithic code base like this, anytime you
make a single change to your application, you
have to rebuild everything. And these factors force
the majority of tests to end up being large
end-to-end tests. How does this
affect our pyramid? Well, with such
resulting chaos, you can see that our pyramid
is now completely disproportional to how
we'd like it to look. If we do try to think about
organization from when we start out, our first thought
might be following a layered architecture. At this stage in
development, it's the only dimension
that is visible to us. And there are also
Android concepts that map neatly to each layer. So maybe this makes sense. And by structuring
our code this way, we can slash dependencies,
follow those principles of high cohesion, low
coupling, maybe introduce dependency injection,
and now we can see that unit tests are possible. But as our application
grows in complexity, we start to notice that it grows
by the dimension of features rather than
architectural layers. So even if we did modulize
our code this way, a small change high
up in the tech stack is only going to save a
couple of layers of modules of rebuild, whereas something
lower down still causes a complete rebuild of
the old application. Furthermore, the layers
themselves become monolithic. And so we still end up writing
so many large end-to-end tests. Now, while the ability to
start writing unit tests is really good,
projects are still left with a pyramid that
doesn't quite look right. And the problem
with this setup is that in order to compensate
for our fidelity gap in unit tests, we're overcompensating dramatically with end-to-end tests, which
are slow and heavyweight. There's nothing here that's
guiding us so far in order to make a balanced pyramid. So poorly organized and
architected code base can quickly lead to severe
bottlenecks in your development workflow. By overrelying on these
large end-to-end tests, were faced with test suites
that take far too long to run. And the lack of focus in
them mean the bugs are really hard to track down. Without effective
modularization, every change we make to the
app causes large swaths of it to be rebuilt, and all
the tests must be rerun. These key points can cripple
your team's velocity. But organizing
your code correctly has a big impact on testability
and development velocity. So we want to get it
right from the get-go. We want to create
a way that's going to be scalable as
we move forward and our application
grows over time. So let's think about how we
may decompose our project. At the top we've
got our application. And one of the key
areas of functionalities is managing tasks. We also have a progress module
that has a dependency on tasks. And as we dive in,
we notice that task is a really big feature. We can further decompose it
with add, edit, list, view. And organizing our code this
way allows our development to scale as our
application grows and new features are added. And we're also able to scale
in depth of complexity, too. As features become
more complex, we can continue breaking them
down, adding more modules. And this approach
to organization makes sense, since two
components in the same domain are much more related in
function than two components that might just happen to both be activities. We can implement this
kind of organization both through language
features such as packaging but also through our build
system, like Gradle modules or Bazel libraries. We can add domain-oriented modules now to the application and define clear API
boundaries to contractualize their interactions. So now we have a way to
shard our application, which makes it possible to isolate
the components for more focused testing. Finally, we can see blueprints
for integration tests. And of course, all
these modules are going to be decomposed
and be unit testable. And we can still write
our large end-to-end test. Furthermore, this
organization allows us to scale as we add new
features that test scale along with them. You can use this guide
as a starting point. And of course, you can decompose
further or in different ways that make sense for
your application. The key thing here,
though, is to remember to provide natural
guides and templates for different categories of
tests for your application. To build our to-do
application, we're going to be using some of
the architecture component libraries from Jetpack
such as data binding, view model, live data,
navigation, and Room. We're going to follow the Model-View-ViewModel pattern, or MVVM, to architect the application. This provides a really clear
separation of concerns. And Jetpack's architecture
component libraries really fit in neatly with this. I'm going to start with
a single activity that uses the navigation component
to map the user's flows through a series of fragments. Each one managing
its own screen. Each fragment has
its own XML layout that's mapped directly
to its own view model, using data binding
architecture component. It will also use live
data to reflect changes back up into the view. And our model layer is
going to be abstracted under a repository that contains
both a remote data source and a local data source
that's backed by SQLite and using the Room
architecture component.
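As a rough sketch, the view model behind one of those screens might look something like this. The class and property names here are assumptions rather than the code lab's exact code; the fragment's XML layout would declare a data binding variable of this type and bind its views to these LiveData fields.

```kotlin
import androidx.lifecycle.LiveData
import androidx.lifecycle.MutableLiveData
import androidx.lifecycle.ViewModel

// Sketch only: class and property names are assumptions, not the code lab's exact code.
class AddEditTaskViewModel : ViewModel() {

    // Two-way data binding targets for the add/edit screen's text fields.
    val title = MutableLiveData<String>()
    val description = MutableLiveData<String>()

    // A simple LiveData the fragment can observe, for example to trigger
    // navigation back to the task list once a save succeeds.
    private val _taskSaved = MutableLiveData<Boolean>()
    val taskSaved: LiveData<Boolean> = _taskSaved

    fun saveTask() {
        // Validation and the call down to the repository would go here.
        _taskSaved.value = !title.value.isNullOrBlank()
    }
}
```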
On Android, the user interface is updated on the UI thread. And so long as
the events that we post there are nice
and short tasks, our UI stays snappy
and responsive. In our application,
however, not everything is going to
fit that criteria. We use both a local database and
we make requests to a remote REST API for dealing with task data. Operations to both of these
components take a long time. And if we were to run
these on the UI thread, we'd quickly see that
our application becomes slow or even unresponsive. So of course, we
need to make sure that these long-running
operations occur asynchronously in the background somehow so
that we're not blocking our UI thread from responding while
we're waiting for these tasks to complete. In our application,
we're going to make use of Kotlin's coroutines
for asynchronous operations. You can think of coroutines
as lightweight threads. And although they've been stable
for only a relatively short amount of time, the community
has adopted them very quickly. And they've become a clear
trend in Android development. A coroutine scope keeps track of
all the coroutines it creates. And if you cancel
a scope, it thereby cancels all of the
coroutines that were created in that scope. In our application,
coroutines are launched from the
view model objects, using a special
view model scope. This is particularly useful when
our view model gets destroyed, because it automatically cancels
all of those existing child coroutines. It's going to save resources and
avoids potential memory leaks along the way.
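As a minimal sketch (the view model, task, and repository names are assumptions), launching work from that view model scope might look like this:

```kotlin
import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import kotlinx.coroutines.launch

// Sketch only: TasksViewModel, Task, and the repository API are assumptions.
class TasksViewModel(private val tasksRepository: TasksRepository) : ViewModel() {

    fun completeTask(task: Task) {
        // viewModelScope is tied to this ViewModel's lifecycle: when the
        // ViewModel is cleared, any coroutine launched here is cancelled,
        // saving resources and avoiding leaks.
        viewModelScope.launch {
            tasksRepository.completeTask(task)
        }
    }
}
```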
And from within a scope, we can call down to our tasks repository. The coroutine scope
created in our task repository is used for
parallel decomposition of work. When any child coroutine
in this scope fails, the entire scope fails and all
of the remaining coroutines are canceled. This function returns
as soon as its given block and all of the child
coroutines have completed.
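A sketch of that parallel decomposition inside the repository, again with assumed data source names and APIs:

```kotlin
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.launch

// Sketch only: the data source types and their APIs are assumptions.
class DefaultTasksRepository(
    private val tasksRemoteDataSource: TasksDataSource,
    private val tasksLocalDataSource: TasksDataSource
) {
    suspend fun saveTask(task: Task) {
        // coroutineScope only returns once its block and all child coroutines
        // have completed. If either child fails, the other is cancelled and
        // the failure propagates to the caller.
        coroutineScope {
            launch { tasksRemoteDataSource.saveTask(task) }
            launch { tasksLocalDataSource.saveTask(task) }
        }
    }
}
```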
Coroutines can certainly make developing asynchronous code a lot simpler. So let's start by implementing
our first critical user journey. Just to recap, we start
on the home screen. We click a floating
action button, which takes us to the detail screen. Here we enter text for
our new task, save, and we're back to the
home screen, where we can see our newly created task. We're going to develop
our application using test-driven development. And this is a school of
thought where we first codify the specification of
our application in tests, first of all, and only then
do we write the production code in order to satisfy
that specification. We're also going to
do all this top down, starting from the
end-to-end test, and then breaking this down,
and decomposing further and further, until we finally
reach the individual units that are required to satisfy
the feature we're building. So let's start by writing
an end-to-end test. It's going to be failing first,
but we know that by the time we make it pass, our
feature's complete. It's a good signal
for the end state. Let's review some key
qualities of end-to-end tests. The main thing we're
looking for here is that we've got confidence
in the final application when it's finished. Therefore, these
kinds of tests should run on a real or
a virtual device and make sure that our code
interacts with the Android environment as expected. Our application should
also look as close to the final
application as possible that will ship. And we should test it
in the very same way that our users are going
to interact with it. This means we're doing
black box testing. And here, we don't need to be
exhaustive with all the tests. That's the job of
testing other layers. Now, let's examine
the scope of the code and see what we're going
to exercise in our test. It looks like for
our first test case, the AddEditTasksFragment screen
and the TasksFragment screen are what's important. So for this particular
end-to-end test, we're just going to
discard and ignore task details for the moment. We can start on the home screen
by using activity scenario to launch the task
activity class. Then we can click on
the floating action button, which should take
us to the next screen. And here we can use
Espresso to enter text into the detail screen. And one more time
with Espresso to click the button, which would send
us back to the first screen. And here, make a
simple assertion to check that the newly added
task appears on the home screen.
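Put together, a sketch of that end-to-end test might look like the following. The view IDs here are assumptions; the code lab's actual identifiers may differ.

```kotlin
import androidx.test.core.app.ActivityScenario
import androidx.test.espresso.Espresso.onView
import androidx.test.espresso.action.ViewActions.click
import androidx.test.espresso.action.ViewActions.closeSoftKeyboard
import androidx.test.espresso.action.ViewActions.typeText
import androidx.test.espresso.assertion.ViewAssertions.matches
import androidx.test.espresso.matcher.ViewMatchers.isDisplayed
import androidx.test.espresso.matcher.ViewMatchers.withId
import androidx.test.espresso.matcher.ViewMatchers.withText
import androidx.test.ext.junit.runners.AndroidJUnit4
import androidx.test.filters.LargeTest
import org.junit.Test
import org.junit.runner.RunWith

// Sketch only: the view IDs are assumptions.
@RunWith(AndroidJUnit4::class)
@LargeTest
class TasksActivityTest {

    @Test
    fun createTask_showsUpOnHomeScreen() {
        // Start on the home screen.
        ActivityScenario.launch(TasksActivity::class.java)

        // Click the floating action button to open the add/edit screen.
        onView(withId(R.id.add_task_fab)).perform(click())

        // Enter the details for the new task and save it.
        onView(withId(R.id.add_task_title_edit_text))
            .perform(typeText("Buy milk"), closeSoftKeyboard())
        onView(withId(R.id.save_task_fab)).perform(click())

        // Back on the home screen, the new task should be shown.
        onView(withText("Buy milk")).check(matches(isDisplayed()))
    }
}
```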
Now we're not using any special APIs or any hooks or back doors. This is known as
black box testing. And interacting with
the application this way gives us the confidence
that it'll still work if a real user were to
step through the flow in exactly the same way. So now we need to add some
integration tests in order to bridge the gap between
those large end-to-end tests that we've just written
and the smaller, faster, exhaustive unit tests
that we'll be adding later. Here we're looking for something
that gives us a good clue that all of the individual units
that we're bringing together collaborate as planned. That's where the
focus should be here. These tests will be
relatively scalable. And providing enough
coverage at this level means we can
lean less and less on those large, heavyweight
end-to-end tests. Here it's kind of less
important that we're using all real components. And it's OK to make judicious
use of testing APIs. But exactly what kind
of tests should we be writing at this level? When we introduced scope
earlier in the session, we defined it as the
amount of real code that's exercised by the test. And in the end-to-end
test we've already seen, that scope's pretty large. With integration tests,
it's a little more nuanced. Luckily, our architecture
and code organization leads us straight to
some good candidates. Let's approach this
by decomposition. If the previous
end-to-end tests just focused on the
AddEditTasksFragment screen and the TasksFragment
screen, we already know that this next
integration test has got to be a smaller
scope than that. And looking at our
architecture diagram, I can already see
the first candidate. Let's start by writing
an integration test for the entire tech stack that
supports the AddNewTask screen. So we remove the TaskList
screen from the equation. Do you see any other
candidates here where we might want
to limit their scope? Some of the objects in
the scope of your test might have some undesirable
characteristics. Perhaps one of them is too slow. Maybe it reads a
large file at startup. Perhaps another is a really
heavyweight dependency that takes a long time to build. Perhaps it makes arbitrary
network connections, causing a test to be flaky. And some dependencies, they just
can't be controlled in the way that we need to simulate
within our tests. In such cases, you
may want to consider replacing that original
dependency with a test double. Test doubles are stand-ins
for the real object. There are several
categories of test doubles, and each of them ranges in fidelity. Dummies. These are just intended to stand in for the real thing, purely to satisfy dependencies. Then stubs, which aim to offer
one-off specific behavior. It'll allow you to configure
it for the needs of your test. Either of these
could be hand-rolled or they could be
provided by your mocking library, such as Mockito. Or consider fakes,
which aim to be a more accurate, yet lightweight
substitute for the real thing. And you may be surprised to
still see real objects up here. Sometimes, though,
it makes sense to use real objects
in your tests if it avoids any
of those criteria that we considered
before, and where it makes the test more
readable and robust over the alternative. Value objects are
just one example of why you should always
prefer using a real object. Taking a closer
look, there are now some candidates where
we might want to start increasing removing the scope. We could drive our test
through TasksActivity. But this is concerned with the
navigation between screens. And we don't need to
test this at this level. That's more of an
end-to-end test. So instead, we're going to
reach for FragmentScenario and use Espresso to
test the UI directly. We're going to need switch in a
test double for our navigation controller, however. And we can use this to
verify that our navigation is working as expected. TaskRepository, it presents
a clear and well-defined API to all the layers above. So it's good practice to make
use of this API from tests and to use that to check
to see if our test had saved the task correctly. But look, including
a remote data source, which connects to
an external server, that's going to make
our test slow and flaky. So let's switch that out
also for a test double. So first, we're using
FragmentScenario to launch our fragment. And we need to verify that our
floating action button sends us to the right screen. And the navigation controller
handles this kind of thing. We don't actually need
to go to that new screen for this kind of test. We just need to record
that we went there. So we can swap out the
navigation controller for a test double. There isn't actually a
fake version provided. So in this case, I think it's
perfectly acceptable just to shim in a mock like this. And now we can use Espresso
APIs to enter some text in the fields as we did before,
clicking the floating action button to save the task. And for the final
part of the test, we need to check two things. First, was the task
saved correctly? So we can do this by obtaining
the task service or the task repository from the
service locator. And we can use its
APIs to get a list of the tasks that were saved. And then we can make
sure it contains one that was saved
that matches the one we tried to save through the UI. The next assertion is did we get
back to the right screen, OK? We can check with our
mock navigation controller to make sure that
the right navigation event was sent that
would have directed us to the right screen.
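Here's a sketch of what that fragment-level integration test could look like. The fragment name, view IDs, and ServiceLocator are assumptions based on the description above, and in a real test you would verify the exact Safe Args NavDirections rather than using any().

```kotlin
import androidx.fragment.app.testing.launchFragmentInContainer
import androidx.navigation.NavController
import androidx.navigation.NavDirections
import androidx.navigation.Navigation
import androidx.test.espresso.Espresso.onView
import androidx.test.espresso.action.ViewActions.click
import androidx.test.espresso.action.ViewActions.closeSoftKeyboard
import androidx.test.espresso.action.ViewActions.typeText
import androidx.test.espresso.matcher.ViewMatchers.withId
import androidx.test.ext.junit.runners.AndroidJUnit4
import androidx.test.filters.MediumTest
import com.google.common.truth.Truth.assertThat
import kotlinx.coroutines.runBlocking
import org.junit.Test
import org.junit.runner.RunWith
import org.mockito.ArgumentMatchers.any
import org.mockito.Mockito.mock
import org.mockito.Mockito.verify

// Sketch only: the fragment, view IDs, and ServiceLocator are assumptions.
@RunWith(AndroidJUnit4::class)
@MediumTest
class AddEditTaskFragmentTest {

    @Test
    fun saveNewTask_storesTaskAndNavigatesBack() {
        // Launch just this fragment, hosted in an empty activity.
        val scenario =
            launchFragmentInContainer<AddEditTaskFragment>(themeResId = R.style.AppTheme)

        // Shim in a Mockito mock for the navigation controller.
        val navController = mock(NavController::class.java)
        scenario.onFragment { fragment ->
            Navigation.setViewNavController(fragment.requireView(), navController)
        }

        // Drive the UI with Espresso, just like a user would.
        onView(withId(R.id.add_task_title_edit_text))
            .perform(typeText("Buy milk"), closeSoftKeyboard())
        onView(withId(R.id.save_task_fab)).perform(click())

        // The task should have been saved, checked through the repository's
        // public API, obtained here from an assumed ServiceLocator.
        val saved = runBlocking { ServiceLocator.tasksRepository.getTasks() }
        assertThat(saved.map { it.title }).contains("Buy milk")

        // And the right kind of navigation event should have been sent.
        verify(navController).navigate(any(NavDirections::class.java))
    }
}
```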
And we can decompose further and look for other ways that we might want
to limit scope in order to create smaller
and smaller integration tests. Let's take TaskRepository,
for example. It represents our model. It's got a well-defined API
that supports all the task UI features, as well as features in
other modules like the progress module. And it's also likely to contain
large amounts of complexity and business value. And it includes a good
deal of collaborators. And this makes it a great
candidate for covering with an integration test. So let's remove all of the UI
from the scope of this test. Now we can proceed to directly
test this well-defined API of our test repository. And here we'll make
similar choices when it comes to fidelity
to our metric principles and speed trade-offs, just
like we did in the last test. We'll keep using a fake to stand
in for the real data source, as well as providing us
with repeatable tests. A fake here allows
us to configure all kinds of test data
sets that we might want to wire up for
certain conditions, testing in different ways.
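A sketch of such a repository test, assuming a hypothetical FakeDataSource that simply holds its tasks in memory and a repository constructor that accepts the two data sources:

```kotlin
import com.google.common.truth.Truth.assertThat
import kotlinx.coroutines.runBlocking
import org.junit.Before
import org.junit.Test

// Sketch only: FakeDataSource, Task, and the repository API are assumptions.
class DefaultTasksRepositoryTest {

    private val task1 = Task(title = "Title1", description = "Desc1")
    private lateinit var remoteDataSource: FakeDataSource
    private lateinit var localDataSource: FakeDataSource
    private lateinit var repository: DefaultTasksRepository

    @Before
    fun createRepository() {
        // Wire up whatever test data set this scenario needs.
        remoteDataSource = FakeDataSource(mutableListOf(task1))
        localDataSource = FakeDataSource(mutableListOf())
        repository = DefaultTasksRepository(remoteDataSource, localDataSource)
    }

    @Test
    fun getTasks_requestsTasksFromRemoteDataSource() {
        runBlocking {
            // Exercise the repository's public API directly: no UI in scope.
            val tasks = repository.getTasks(forceUpdate = true)

            // The repository should surface what the fake remote holds.
            assertThat(tasks).containsExactly(task1)
        }
    }
}
```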
Having a well-defined API at the model layer also allows us to do something
else that's really cool. What if we take
our TaskRepository and extract away an interface? Now we can create
a fake version. And by running the same
test against the fake that we run against our
production repository, our fake becomes
a verified fake. And what we're doing
is guaranteeing its behavior meets
the same specification as our real production code. And if we create
And if we create separate modules for both our APIs and
our fakes, other modules that we interact
with will see faster build times and more
lightweight tests. So here we have a fake
for our model layer that we're confident
in, and we can start to use it in other tests. Coming back to the
first integration test we wrote for the
AddEditTask screen, we could have equally
written this integration test with a fake task repository. We trust our fake because
it's a verified fake. And it's really fast, too. It probably stores its data
in an in-memory hash map.
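A sketch of what such a fake might look like, assuming a simplified TasksRepository interface:

```kotlin
// Sketch only: a simplified fake of an assumed TasksRepository interface,
// where Task is assumed to be a data class with an id and isCompleted flag.
class FakeTasksRepository : TasksRepository {

    // All state lives in an in-memory map, keyed by task ID.
    private val tasks = LinkedHashMap<String, Task>()

    override suspend fun saveTask(task: Task) {
        tasks[task.id] = task
    }

    override suspend fun getTask(taskId: String): Task? = tasks[taskId]

    override suspend fun getTasks(): List<Task> = tasks.values.toList()

    override suspend fun completeTask(task: Task) {
        tasks[task.id] = task.copy(isCompleted = true)
    }

    // Test-only helper for seeding data.
    fun addTasks(vararg tasksToAdd: Task) {
        tasksToAdd.forEach { tasks[it.id] = it }
    }
}
```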
We can apply that same testing blueprint across all of the other
modules in the tasks UI. These UI modules
are another group of components whose integration
we're really concerned with. We want to be sure that view
models collaborate correctly with our fragments, is
our data binding wired up, are all the possible input
validation cases handled correctly? And unit tests, these
verify the operation of very small units of code. The scope of these kinds of
tests is as small as possible. So the code can be
tested exhaustively and give very fast and very
specific feedback on failures. Our large projects are going
to have thousands of these, so they should run
in milliseconds. It's totally OK to swap out
production dependencies. But they should still
be black box in nature. We want to be testing
behavior, not implementation. And the line between
the categories of tests here can get a little blurry. Let's consider writing a task
for our tasks local data store. TaskLocalDataStore takes
a TaskDao as a dependency. And in a real system, this
is provided by the to-do database-- a class generated by Room, which
is backed by Android SQLite. And if we follow the classic
principles of unit testing, we can ask Mockito to provide us
a mock for our TaskDao instead. Here in our test, we
can create that mock and then pass it
in as a dependency to our TaskLocalDataStore. We can create a new task, and
then save it in the repository. And then finally, we can
validate that the insertTask call was invoked on our TaskDao.
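That mock-based version might look something like the sketch below, using the names from above and assuming a blocking insertTask method on the DAO. It passes, but notice that it asserts how saveTask is implemented rather than what it does:

```kotlin
import kotlinx.coroutines.runBlocking
import org.junit.Test
import org.mockito.Mockito.mock
import org.mockito.Mockito.verify

// Sketch only, and an anti-pattern: a change detector test.
class TaskLocalDataStoreMockTest {

    @Test
    fun saveTask_callsInsertTaskOnDao() {
        // Provide a Mockito mock in place of the real TaskDao.
        val taskDao = mock(TaskDao::class.java)
        val localDataSource = TaskLocalDataStore(taskDao)
        val task = Task(title = "Title", description = "Desc")

        runBlocking { localDataSource.saveTask(task) }

        // This assertion is coupled to the implementation, not the behavior:
        // restructure saveTask internally and this test breaks, even if the
        // observable behavior stays exactly the same.
        verify(taskDao).insertTask(task)
    }
}
```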
But wait-- this test here already knows too much about
the implementation details of save task,
how it's implemented. If we were ever going to
change that implementation, then the test is
going to need updating as well, even if
the behavior was supposed to remain the same. This is what is known as
a change detector test. And its burdensome maintenance
can start to quickly outgrow its usefulness. Effective unit
tests should really focus on testing
behavior instead. But how should we do that? We can do that by ignoring
the internal implementation and focusing on the
API contracts instead. Take TaskDataSource. The contract states
that when I save a task through the save task
method, I should still then be able to retrieve that same
task by looking it up by ID. So our test should
exercise that contract rather than concerning itself
with implementation details. So we'll exercise
the save task method on our LocalTaskDataStore. But we won't be
concerned with the fact that it calls insert
task on the Dao. Then, we'll call get
task on the data store again, again forgetting
about the implementation. And one thing to bear in mind
when writing tests like this, where the code under test
makes use of coroutines, is that we need to make
these asynchronous operations appear synchronous
so that our tests are going to remain deterministic. If we were to call a get
task function and execute it, and sometimes the
save task function hadn't completed in time,
we'd end up with a flaky test. Luckily, doing so is rather
straightforward by asking our test to run blocking. One of the first
tools you'll learn when writing tests that use coroutines is the runBlocking construct. In the context of runBlocking,
the given suspend function and all of its children
in the call hierarchy, are effectively going
to block the calling thread until it finishes executing. And you're going to find this
a really useful tool when exercising code whose
behavior relies on coroutines and needs to
be highly deterministic. So the test we actually
want to look at is going to look
something like this. We create a task, save
it to the data source, then we ask the local data
source to retrieve that task back for us. And finally, we can
make an assertion that we got what we expected.
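A sketch of that behavior-focused test, using runBlocking to keep it deterministic. The data source API is an assumption, and the setup of localDataSource is shown in the in-memory Room sketch a little later in this section:

```kotlin
import com.google.common.truth.Truth.assertThat
import kotlinx.coroutines.runBlocking
import org.junit.Test

// Sketch only: Task and the data source API are assumptions.
class TaskLocalDataStoreTest {

    // Assumed to be initialized in a @Before method; see the in-memory Room
    // sketch later in this section.
    private lateinit var localDataSource: TaskLocalDataStore

    @Test
    fun saveTask_retrievesSameTask() {
        // runBlocking keeps these asynchronous calls deterministic in the test.
        runBlocking {
            // GIVEN a new task saved to the data source.
            val newTask = Task(title = "Title", description = "Desc")
            localDataSource.saveTask(newTask)

            // WHEN we read it back by its ID.
            val loaded = localDataSource.getTask(newTask.id)

            // THEN we get back the task we saved: we're testing the contract,
            // not which DAO methods were called along the way.
            assertThat(loaded).isEqualTo(newTask)
        }
    }
}
```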
In fact, Google and JetBrains have just recently collaborated to launch the runBlockingTest coroutine builder. And this makes testing
coroutines even easier. It's currently marked as an
experimental coroutines API. So please, go and check
it out and give us some feedback on any
bugs that you might find. So in order to write
this test, it's important that our data
store maintain state. And it does this through
its dependency TaskDao. So the problem is with
using Mockito, trying to maintain state through
these one-off stubbing calls can get messy really fast. So we could instead
implement our TaskDao using
a fake like we did earlier with the repository. Well, we're going to choose
not to go down that route for some good reasons. Firstly, it doesn't seem
that the TaskDao interface is going to be part of
our modules public API. And so no one else is going to
benefit from reusing that fake. And secondly, right
now I can't think of another part of our code that
would benefit from that fake, too. And this is one of those cases
where it actually makes sense to make use of the real objects
rather than putting in a fake. In this case, room provides
some really useful testing infrastructure for us. We can ask room to build us
an in-memory to-do database. And then we can use that to
obtain the TaskDao backed by that in-memory
database and provide it to our LocalDataStore.
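A sketch of that setup, completing the test class from the earlier sketch. ToDoDatabase, taskDao(), and the data store constructor are assumptions:

```kotlin
import android.content.Context
import androidx.room.Room
import androidx.test.core.app.ApplicationProvider
import org.junit.After
import org.junit.Before

// Sketch only: ToDoDatabase, taskDao(), and TaskLocalDataStore are assumptions.
// This completes the TaskLocalDataStoreTest class sketched earlier.
class TaskLocalDataStoreTest {

    private lateinit var database: ToDoDatabase
    private lateinit var localDataSource: TaskLocalDataStore

    @Before
    fun setup() {
        // An in-memory version of the database: same schema, same DAO,
        // but nothing touches the file system and the data disappears
        // as soon as the database is closed.
        database = Room.inMemoryDatabaseBuilder(
            ApplicationProvider.getApplicationContext<Context>(),
            ToDoDatabase::class.java
        ).allowMainThreadQueries().build()

        // The real DAO, backed by the in-memory database.
        localDataSource = TaskLocalDataStore(database.taskDao())
    }

    @After
    fun cleanUp() {
        database.close()
    }
}
```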
Of course, we'll clean it up after tests. But in all other senses, it's
the same as the production database, but it's faster
as it doesn't write data to files on the file system. And therefore, it also provides
better isolation through tests. So is this still a unit test? Or is it now an
integration test, because we're using real
objects rather than just mocks? It's a good question, and one
many people will disagree on. And it's true, the lines
can become blurry at times. But the key takeaway here
is that you shouldn't ever be afraid of using real
dependencies in your tests where it makes sense-- where they're more readable,
more lightweight, and robust. So let's just recap the kinds
of tests that we wrote today. We added an end-to-end test
that covers a critical key user journey through our application. We decomposed a feature to
add an integration test that tests an entire vertical
slice through our application from the UI down
to the data layer. And we also added
an integration test that verifies our
model, which is key because other modules are
going to be depending on it. And finally, we're able
to decompose and write smaller groups of
integration and unit tests, such as the ones for the
UI or the local data store. Marginalization of your
codebase with clearly defined intermodule
contracts allows you to streamline
your project build, create compile-time
dependencies against small API modules, leading to
faster build times on each change, and export
testing infrastructure, such as lightweight
verified fakes that other modules can
swap in and thereby decouple their tests from
your heavyweight production dependencies. So while you can and should have
end-to-end tests to give you confidence in your app,
the vast majority of tests should not be in this category. Marginalizing your
app like this allows you to push down many of
those large end-to-end tests to more focused, smaller
tests at the module level. And each one is
decoupled from the next. Finally, this allows us to build
a really well-balanced pyramid. And through this
thoughtful architecture, there's a number of
obvious cutoff points that have surfaced naturally
within the pyramid. You'll need to identify
the right spots for testing in your own application. What works for one project
might not work for another. So it's really important
that whatever you choose, you document it clearly so that
all collaborators on your team are on the same page. In Android Development,
there are two kinds of tests. Local tests run
on the local JVM. They can be just
pure JUnit tests, or they can use
Robolectric to provide a simulation of Android. They're much faster. They're highly
scalable, but they don't offer the same confidence
that a real device would. On the other hand, there's
instrumentation tests that run on a real
or virtual device. While slower,
lacking scalability, they are true to the
behavior of real Android. Last year we launched
AndroidX Test as part of Jetpack, which brought together a
unified set of APIs that will work on both kinds of tests. And these APIs allow us to
focus on writing Android tests without thinking about the tools
that we're using underneath or where the test is
going to be executed. And at the heart of what
we're releasing today is increased stability,
improved interoperation with Android Studio,
better off-device support for Espresso, resources,
and the UI thread control. And of course, the
support for the latest Jetpack architecture components. While tests of all sizes can run
on a real or a virtual device, these improvements have
made it possible to run increasingly larger integration
tests faster on the local JVM. All of the integration tests
that we've documented today and in the code lab
will run equally well on both the local JVM or in
a real or a virtual device. Project Nitrogen is our vision
for a unified test execution platform. It brings together all
these many disparate tools and environments. With Nitrogen, any test that's
written with a unified API using AndroidX tests can be
run on any of these execution platforms seamlessly
from Android Studio or your
continuous build system. You've got the option
to run any Android test on a variety
of these platforms, such as virtual devices, cloud
farms, simulator devices. And while the team
is still working hard to bring this vision to
reality, in the meantime, we'll share a little
trick with you. Normally, local tests would be
placed in the test source root. Instrumentation tests go in
the Android test source root. But to show you what's
possible with a unified API, in this code lab, we're
using a little trick to create a shared test
source root folder.
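A sketch of that trick in Gradle's Kotlin DSL; the sharedTest directory name is an assumption, and the code lab's actual build files may differ:

```kotlin
// build.gradle.kts (module), sketch only: the sharedTest folder name is an assumption.
android {
    sourceSets {
        // Compile the same test sources into both the local unit test
        // source set and the instrumented androidTest source set.
        val sharedTestDir = "src/sharedTest/java"
        getByName("test").java.srcDir(sharedTestDir)
        getByName("androidTest").java.srcDir(sharedTestDir)
    }
}
```

With something like that in place, the same test class compiles into both the local and the instrumented test runners.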
And here we can place tests that are written with the unified API and run them both on-device and off-device. And how and where
you decide to run them really depends on
your project's philosophies or needs. But here you can start
to see the possibilities. Today we're also launching
an early access program for Nitrogen for
tools integrators. So if you're a
developer that maintains monitoring or profiling
performance tools, you provide continuous
integration platforms, you build real or device
services for developers, you make IDEs or build farms,
we're looking to hear from you and get your feedback
on our early access. So please go ahead, check
out the code in the code lab. You can see the great
examples for project structure and blueprints. Examples of the
kinds of tests you should be writing at different
levels using the unified APIs. And see just what kinds of
tests are possible to run on- and off-device, which leads
the way to Project Nitrogen. This is all available online
now and it's available right here in the code lab
section for you to check out. [MUSIC PLAYING]