Build testable apps for Android (Google I/O'19)

Captions
[MUSIC PLAYING] JONATHAN GERRISH: Good morning and welcome. My name's Jonathan, and I'm an engineer at Google. Welcome to today's session. We're going to talk about how to build testable apps for Android.

Throughout history, architecture has continually evolved in style and ambition to fit our expanding visions. In order to build these increasingly complex structures, we've needed to innovate new tools, patterns, and methods to meet the demands of building them. And just like in the real world, in software, too, we've all had our own evolutionary path. We've seen this in Android development. Who remembers building smaller applications in Eclipse? As the complexity of our applications grew, so did the need for better development and testing tools. Today, we're building quite complex applications within Android Studio. And last year there was Jetpack, which included a lot of great libraries to abstract away some of the details, allowing developers to focus on writing great features.

But the evolution hasn't always been easy, and the choices developers have had to make haven't always been so clear. How do you answer questions such as: how should I architect my application? How do I organize my code base? Which libraries should I choose? And which tools should I pick? The choices that you make early on in development have a long-lasting impact on the testability of your app, and thereby on your development velocity and your ability to add new features sustainably. Today we're going to show you, with some real-world examples, how to make pragmatic decisions about how you build your applications in order to build a long-term testing strategy.

In testing, the key attribute to consider is scope. Scope means how much of my application my test is covering. A test could run against just a single method, or it could span multiple features and multiple screens. Scope directly impacts two other attributes of testing. Speed: how fast does your test run? Some tests take on the order of milliseconds, and others all the way up to minutes, or maybe more. And fidelity: how closely does your test simulate real-world scenarios? Increasing scope typically increases the fidelity of your tests, but it does so at the expense of speed, and vice versa. You can't actually have a single test that gives you the best of everything. The question is: when is good enough? When do you really need perfect? And how do you achieve the right balance?

The testing pyramid is used as a guide to help you create that balance. As you go up the pyramid, you improve on fidelity by increasing the scope. But remember, this comes at the price of speed, focus, and scalability. Unit tests have got to be fast, lightweight, and highly focused, so that they can achieve high scalability. They're really easy to define, because most of the time we're just testing a single method in a single class within our application. And this means that they're going to give you a really high degree of focus on the origin of a failure. Integration tests are the next category of test. Here we're trying to bring together several units within our application, and we're interested in verifying their collaboration, making sure that when we bring them together, they all behave as expected as a whole. And end-to-end tests step through key paths in our application, often covering multiple screens and features.
And these are also really easy to define, because at this stage we know we're testing our whole application.

Today, we're launching a new to-do application. Well, it's not really an official Google product, but it is a real application. It's part of the refreshed Android testing codelab that we're launching today. So you can go ahead, check it out, build it, test it, and work through all the examples that you'll see right here in this session. Now, we're going to work through building this application together, and in doing so, along the way, we'll discuss some of the challenges and choices we'll be faced with.

Building an application usually starts by defining some key critical user journeys. A critical user journey is a step-by-step path that the user takes through an application in order to meet a predefined end goal. The journey may span multiple screens and decision points to get to that end goal, and they're often sketched out as a series of mockups. Let's take a look at some that our UX designers just sent us. Our first user journey is that of creating a new task. Users arrive on the home screen, which has a list of tasks. The first time they get there, it's going to be empty. There's a floating action button that they can click. It takes them to the next screen, where they can enter details for their task. They can click Save, and they return back to the home screen, where their new task should show up. Our second user journey is about checking our progress. Users can select an existing task and mark it as completed, and then they can go and view their progress on a statistics screen that shows them just how productive they are.

Now, every project starts off small. But if careful attention isn't paid to design, architecture, and organization during the growth of that code base, development can quickly spiral out of control as your application grows uncontrollably. Without any thought, your code base can quickly turn into a huge monolith, a spaghetti-like ball of incoherent dependencies that is not only hard to reason about but also difficult to test. If individual units don't follow key principles such as high cohesion and low coupling, they become really difficult to test alone in isolation. Furthermore, with a monolithic code base like this, any time you make a single change to your application, you have to rebuild everything. These factors force the majority of tests to end up being large end-to-end tests. How does this affect our pyramid? Well, with such resulting chaos, you can see that our pyramid is now completely disproportional to how we'd like it to look.

If we do try to think about organization from the start, our first thought might be to follow a layered architecture. At this stage in development, it's the only dimension that is visible to us, and there are also Android concepts that map neatly to each layer, so maybe this makes sense. By structuring our code this way, we can cut down dependencies, follow those principles of high cohesion and low coupling, maybe introduce dependency injection, and now we can see that unit tests are possible. But as our application grows in complexity, we start to notice that it grows along the dimension of features rather than architectural layers. So even if we did modularize our code this way, a small change high up in the tech stack is only going to save a couple of layers of modules from being rebuilt, whereas something lower down still causes a complete rebuild of the whole application.
Furthermore, the layers themselves become monolithic, and so we still end up writing so many large end-to-end tests. Now, while the ability to start writing unit tests is really good, projects are still left with a pyramid that doesn't quite look right. The problem with this setup is that in order to compensate for the fidelity gap in our unit tests, we're overcompensating dramatically with end-to-end tests, which are slow and heavyweight. There's nothing here that's guiding us toward a balanced pyramid.

So a poorly organized and architected code base can quickly lead to severe bottlenecks in your development workflow. By overrelying on these large end-to-end tests, we're faced with test suites that take far too long to run, and the lack of focus in them means that bugs are really hard to track down. Without effective modularization, every change we make to the app causes large swaths of it to be rebuilt, and all the tests must be rerun. These pain points can cripple your team's velocity. But organizing your code correctly has a big impact on testability and development velocity, so we want to get it right from the get-go. We want to create a structure that's going to be scalable as we move forward and our application grows over time.

So let's think about how we might decompose our project. At the top we've got our application. One of the key areas of functionality is managing tasks. We also have a progress module that has a dependency on tasks. And as we dive in, we notice that tasks is a really big feature, so we can further decompose it into add/edit, list, and detail view. Organizing our code this way allows our development to scale as our application grows and new features are added. And we're able to scale in depth of complexity, too: as features become more complex, we can continue breaking them down, adding more modules. This approach to organization makes sense, since two components in the same domain are much more related in function than two components that just happen to both be activities. We can implement this kind of organization both through language features such as packaging and through our build system, with Gradle modules or Bazel libraries. We can now add domain-oriented modules to the application and define clear API boundaries to contractualize their interactions.

So now we have a way to shard our application, which makes it possible to isolate components for more focused testing. Finally, we can see blueprints for integration tests. And of course, all these modules are going to be decomposed and be unit testable, and we can still write our large end-to-end tests. Furthermore, this organization allows us to scale: as we add new features, the tests scale along with them. You can use this guide as a starting point, and of course, you can decompose further or in different ways that make sense for your application. The key thing here, though, is to remember to provide natural guides and templates for the different categories of tests in your application.
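To make that module decomposition a little more concrete, here is a minimal sketch of what the Gradle project layout might look like. The module names are illustrative and only loosely follow the structure described above; they are not the codelab's exact modules.

    // settings.gradle.kts -- illustrative module layout (names are assumptions)
    include(
        ":app",            // single-activity shell and navigation graph
        ":tasks",          // tasks domain: repository API and implementation
        ":tasks:addedit",  // add/edit task feature
        ":tasks:list",     // task list feature
        ":tasks:detail",   // task detail feature
        ":statistics"      // progress/statistics feature, depends on :tasks
    )

With boundaries like these, a feature module only rebuilds when it or something beneath it changes, and each module becomes a natural unit to target with integration tests.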
To build our to-do application, we're going to be using some of the architecture component libraries from Jetpack, such as Data Binding, ViewModel, LiveData, Navigation, and Room. We're going to follow the Model-View-ViewModel pattern, MVVM, to architect the application. This provides a really clear separation of concerns, and Jetpack's architecture component libraries fit in neatly with it.

We're going to start with a single activity that uses the Navigation component to map the user's flows through a series of fragments, each one managing its own screen. Each fragment has its own XML layout that's mapped directly to its own view model using the Data Binding architecture component. It will also use LiveData to reflect changes back up into the view. And our model layer is going to be abstracted behind a repository that contains both a remote data source and a local data source backed by SQLite, using the Room architecture component.

On Android, the user interface is updated on the UI thread, and as long as the events that we post there are nice and short tasks, our UI stays snappy and responsive. In our application, however, not everything is going to fit that criterion. We use a local database, and we make requests to a remote REST API for dealing with task data. Operations on both of these components take a long time, and if we were to run them on the UI thread, we'd quickly see that our application becomes slow or even unresponsive. So of course, we need to make sure that these long-running operations occur asynchronously in the background somehow, so that we're not blocking our UI thread from responding while we're waiting for these tasks to complete.

In our application, we're going to make use of Kotlin's coroutines for asynchronous operations. You can think of coroutines as lightweight threads. And although they've been stable for only a relatively short amount of time, the community has adopted them very quickly, and they've become a clear trend in Android development. A coroutine scope keeps track of all the coroutines it creates, and if you cancel a scope, it thereby cancels all of the coroutines that were created in that scope. In our application, coroutines are launched from the view model objects using a special viewModelScope. This is particularly useful when our view model gets destroyed, because it automatically cancels all of the existing child coroutines, saving resources and avoiding potential memory leaks along the way. And from within that scope, we can call down to our tasks repository. The coroutine scope created in our task repository is used for parallel decomposition of work. When any child coroutine in this scope fails, the entire scope fails and all of the remaining coroutines are canceled. The function returns as soon as its given block and all of its child coroutines are complete. Coroutines can certainly make developing asynchronous code a lot simpler.
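As a minimal sketch of the coroutine setup just described, launching work from viewModelScope and decomposing it in the repository might look something like this. The Task, TasksRepository, and TasksDataSource shapes below are assumed for illustration; the codelab's real classes carry more detail.

    import androidx.lifecycle.ViewModel
    import androidx.lifecycle.viewModelScope
    import kotlinx.coroutines.coroutineScope
    import kotlinx.coroutines.launch

    // Assumed minimal shapes for illustration:
    data class Task(val id: String, val title: String, val isCompleted: Boolean = false)
    interface TasksDataSource { suspend fun saveTask(task: Task) }
    interface TasksRepository { suspend fun saveTask(task: Task) }

    class AddEditTaskViewModel(private val repository: TasksRepository) : ViewModel() {

        fun saveTask(task: Task) {
            // viewModelScope is cancelled automatically when the ViewModel is destroyed,
            // so this work cannot outlive the screen or leak memory.
            viewModelScope.launch {
                repository.saveTask(task)
            }
        }
    }

    class DefaultTasksRepository(
        private val local: TasksDataSource,
        private val remote: TasksDataSource
    ) : TasksRepository {

        override suspend fun saveTask(task: Task) {
            // coroutineScope returns only when both children have completed,
            // and cancels the sibling if either one fails.
            coroutineScope {
                launch { local.saveTask(task) }
                launch { remote.saveTask(task) }
            }
        }
    }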
So let's start by implementing our first critical user journey. Just to recap: we start on the home screen, we click a floating action button, which takes us to the detail screen, we enter text for our new task, save, and we're back on the home screen, where we can see our newly created task. We're going to develop our application using test-driven development. This is a school of thought where we first codify the specification of our application in tests, and only then do we write the production code to satisfy that specification. We're also going to do all this top down, starting from the end-to-end test, and then breaking it down and decomposing further and further, until we finally reach the individual units that are required to satisfy the feature we're building. So let's start by writing an end-to-end test. It's going to be failing at first, but we know that by the time we make it pass, our feature is complete. It's a good signal for the end state.

Let's review some key qualities of end-to-end tests. The main thing we're looking for here is confidence in the final application when it's finished. Therefore, these kinds of tests should run on a real or a virtual device and make sure that our code interacts with the Android environment as expected. Our application under test should also be as close as possible to the final application that we'll ship, and we should test it in the very same way that our users are going to interact with it. This means we're doing black-box testing. And here, we don't need to be exhaustive with the tests; that's the job of tests at other layers.

Now, let's examine the scope of the code and see what we're going to exercise in our test. It looks like for our first test case, the AddEditTaskFragment screen and the TasksFragment screen are what's important. So for this particular end-to-end test, we're just going to set aside and ignore the task detail screen for the moment. We can start on the home screen by using ActivityScenario to launch the TasksActivity class. Then we can click on the floating action button, which should take us to the next screen. Here we can use Espresso to enter text into the detail screen, and one more time with Espresso to click the button, which sends us back to the first screen. And here we make a simple assertion to check that the newly added task appears on the home screen. Now, we're not using any special APIs or any hooks or back doors. This is known as black-box testing, and interacting with the application this way gives us confidence that it'll still work when a real user steps through the flow in exactly the same way.
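Put into code, that end-to-end test might look roughly like the sketch below. The view IDs and the task title are assumptions for illustration; the codelab's actual resource names differ slightly.

    import androidx.test.core.app.ActivityScenario
    import androidx.test.espresso.Espresso.onView
    import androidx.test.espresso.action.ViewActions.click
    import androidx.test.espresso.action.ViewActions.closeSoftKeyboard
    import androidx.test.espresso.action.ViewActions.typeText
    import androidx.test.espresso.assertion.ViewAssertions.matches
    import androidx.test.espresso.matcher.ViewMatchers.isDisplayed
    import androidx.test.espresso.matcher.ViewMatchers.withId
    import androidx.test.espresso.matcher.ViewMatchers.withText
    import androidx.test.ext.junit.runners.AndroidJUnit4
    import org.junit.Test
    import org.junit.runner.RunWith

    @RunWith(AndroidJUnit4::class)
    class TasksActivityTest {

        @Test
        fun createTask_newTaskAppearsInList() {
            // Start on the home screen, exactly as a user would.
            val scenario = ActivityScenario.launch(TasksActivity::class.java)

            // Tap the floating action button to reach the add/edit screen.
            onView(withId(R.id.add_task_fab)).perform(click())

            // Enter the task details and save. (IDs here are illustrative.)
            onView(withId(R.id.add_task_title)).perform(typeText("Buy milk"), closeSoftKeyboard())
            onView(withId(R.id.save_task_fab)).perform(click())

            // Back on the home screen, the new task should be visible in the list.
            onView(withText("Buy milk")).check(matches(isDisplayed()))

            scenario.close()
        }
    }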
So now we need to add some integration tests in order to bridge the gap between the large end-to-end test that we've just written and the smaller, faster, exhaustive unit tests that we'll be adding later. Here we're looking for something that gives us a good clue that all of the individual units we're bringing together collaborate as planned. That's where the focus should be. These tests will be relatively scalable, and providing enough coverage at this level means we need to lean less and less on those large, heavyweight end-to-end tests. Here it's less important that we're using all real components, and it's OK to make judicious use of testing APIs.

But what kind of tests exactly should we be writing at this level? When we introduced scope earlier in the session, we defined it as the amount of real code that's exercised by the test. In the end-to-end test we've already seen, that scope is pretty large. With integration tests, it's a little more nuanced. Luckily, our architecture and code organization lead us straight to some good candidates. Let's approach this by decomposition. If the previous end-to-end test focused on the AddEditTaskFragment screen and the TasksFragment screen, we already know that this next integration test has got to have a smaller scope than that. And looking at our architecture diagram, I can already see the first candidate. Let's start by writing an integration test for the entire tech stack that supports the Add New Task screen. So we remove the task list screen from the equation.

Do you see any other candidates here where we might want to limit the scope? Some of the objects in the scope of your test might have undesirable characteristics. Perhaps one of them is too slow; maybe it reads a large file at startup. Perhaps another is a really heavyweight dependency that takes a long time to build. Perhaps it makes arbitrary network connections, causing the test to be flaky. And some dependencies just can't be controlled in the way that we need in order to simulate conditions within our tests. In such cases, you may want to consider replacing that original dependency with a test double.

Test doubles are stand-ins for the real object, and there are several categories of them, each ranging in fidelity. Dummies are just intended to stand in for the real object, simply to satisfy dependencies. Then there are stubs, which aim to offer one-off, specific behavior, allowing you to configure them for the needs of your test. Either of these could be hand-rolled, or they could be provided by a mocking library such as Mockito. Or consider fakes, which aim to be a more accurate, yet lightweight, substitute for the real thing. And you may be surprised to still see real objects up here. Sometimes it makes sense to use real objects in your tests, if doing so avoids any of those problems we considered before and makes the test more readable and robust than the alternative. Value objects are just one example of where you should always prefer using a real object.

Taking a closer look, there are now some candidates where we might want to start reducing the scope. We could drive our test through TasksActivity, but that's concerned with the navigation between screens, and we don't need to test that at this level; it's more of an end-to-end concern. So instead, we're going to reach for FragmentScenario and use Espresso to test the UI directly. We're going to need to switch in a test double for our navigation controller, however, and we can use this to verify that our navigation is working as expected. TaskRepository presents a clear and well-defined API to all the layers above, so it's good practice to make use of this API from tests and to use it to check whether our test has saved the task correctly. But look: it includes a remote data source, which connects to an external server, and that's going to make our test slow and flaky. So let's switch that out for a test double as well.

So first, we use FragmentScenario to launch our fragment. We need to verify that our floating action button sends us to the right screen, and the navigation controller handles this kind of thing. We don't actually need to go to that new screen for this kind of test; we just need to record that we went there. So we can swap out the navigation controller for a test double. There isn't a fake version provided, so in this case, I think it's perfectly acceptable just to shim in a mock like this. Now we can use the Espresso APIs to enter some text in the fields, as we did before, and click the floating action button to save the task. For the final part of the test, we need to check two things. First, was the task saved correctly? We can do this by obtaining the task repository from the service locator and using its API to get a list of the tasks that were saved, then making sure it contains one that matches the task we tried to save through the UI. The next assertion is: did we get back to the right screen? We can check with our mock navigation controller to make sure that the right navigation event was sent, one that would have directed us to the right screen.
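Here is a sketch of what that fragment-level integration test might look like, using FragmentScenario and a Mockito mock of NavController. The fragment name, view IDs, and the generated navigation Directions are assumptions based on the description above, not the codelab verbatim, and the repository assertion is only outlined in a comment.

    import androidx.fragment.app.testing.launchFragmentInContainer
    import androidx.navigation.NavController
    import androidx.navigation.Navigation
    import androidx.test.espresso.Espresso.onView
    import androidx.test.espresso.action.ViewActions.click
    import androidx.test.espresso.action.ViewActions.closeSoftKeyboard
    import androidx.test.espresso.action.ViewActions.typeText
    import androidx.test.espresso.matcher.ViewMatchers.withId
    import androidx.test.ext.junit.runners.AndroidJUnit4
    import org.junit.Test
    import org.junit.runner.RunWith
    import org.mockito.Mockito.mock
    import org.mockito.Mockito.verify

    @RunWith(AndroidJUnit4::class)
    class AddEditTaskFragmentTest {

        @Test
        fun validTask_savingNavigatesBackToList() {
            // Shim in a mock NavController: we don't need to actually change screens,
            // we only need to record that the navigation event was sent.
            val navController = mock(NavController::class.java)
            val scenario = launchFragmentInContainer<AddEditTaskFragment>(themeResId = R.style.AppTheme)
            scenario.onFragment { fragment ->
                Navigation.setViewNavController(fragment.requireView(), navController)
            }

            // Drive the UI through Espresso, just like a user would.
            onView(withId(R.id.add_task_title)).perform(typeText("Buy milk"), closeSoftKeyboard())
            onView(withId(R.id.save_task_fab)).perform(click())

            // Assertion 1 (outlined only): obtain the task repository from the app's
            // service locator and check, through its public API, that a task titled
            // "Buy milk" was actually saved.

            // Assertion 2: the mock records that we were sent back to the task list.
            // The generated Directions class and action name are assumptions.
            verify(navController).navigate(
                AddEditTaskFragmentDirections.actionAddEditTaskFragmentToTasksFragment()
            )
        }
    }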
And we can decompose further and look for other ways to limit scope in order to create smaller and smaller integration tests. Let's take TaskRepository, for example. It represents our model. It's got a well-defined API that supports all the task UI features, as well as features in other modules like the progress module. It's also likely to contain a large amount of complexity and business value, and it includes a good number of collaborators. This makes it a great candidate to cover with an integration test. So let's remove all of the UI from the scope of this test. Now we can proceed to directly test this well-defined API of our task repository. Here we'll make similar choices around fidelity and speed trade-offs, just like we did in the last test. We'll keep using a fake to stand in for the real data source, which also gives us repeatable tests. A fake here allows us to configure all kinds of test data sets that we might want to wire up for certain conditions, testing in different ways.

Having a well-defined API at the model layer also allows us to do something else that's really cool. What if we take our TaskRepository and extract an interface from it? Now we can create a fake version. And by running the same tests against the fake that we run against our production repository, our fake becomes a verified fake: we're guaranteeing that its behavior meets the same specification as our real production code. And if we create separate modules for both our APIs and our fakes, other modules that interact with us will see faster build times and more lightweight tests. So now we have a fake for our model layer that we're confident in, and we can start to use it in other tests. Coming back to the first integration test we wrote for the AddEditTask screen, we could equally have written that integration test with a fake task repository. We trust our fake because it's a verified fake, and it's really fast, too; it probably stores its data in an in-memory hash map.

We can apply that same testing blueprint across all of the other modules in the tasks UI. These UI modules are another group of components whose integration we're really concerned with. We want to be sure that our view models collaborate correctly with our fragments: is our data binding wired up? Are all the possible input validation cases handled correctly?
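Going back to the verified fake idea for the model layer, here is a sketch of the shape such a fake might take. The codelab's real fake repository does more (error simulation, LiveData streams), and the interface shown here is an assumed, trimmed-down version.

    // Assumed, trimmed-down repository contract for illustration.
    data class Task(val id: String, val title: String, val isCompleted: Boolean = false)

    interface TasksRepository {
        suspend fun getTasks(): List<Task>
        suspend fun getTask(taskId: String): Task?
        suspend fun saveTask(task: Task)
        suspend fun completeTask(taskId: String)
    }

    // A lightweight fake backed by an in-memory map. Running the same contract
    // tests against this class and against the production repository is what
    // makes it a "verified" fake.
    class FakeTasksRepository : TasksRepository {
        private val tasks = linkedMapOf<String, Task>()

        override suspend fun getTasks(): List<Task> = tasks.values.toList()

        override suspend fun getTask(taskId: String): Task? = tasks[taskId]

        override suspend fun saveTask(task: Task) {
            tasks[task.id] = task
        }

        override suspend fun completeTask(taskId: String) {
            tasks[taskId]?.let { tasks[taskId] = it.copy(isCompleted = true) }
        }
    }

One common way to do the verification is an abstract base test class that exercises the TasksRepository contract, with one concrete subclass supplying the fake and another supplying the production repository.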
And unit tests: these verify the operation of very small units of code. The scope of these kinds of tests is as small as possible, so the code can be tested exhaustively, giving very fast and very specific feedback on failures. Large projects are going to have thousands of these, so they should run in milliseconds. It's totally OK to swap out production dependencies, but they should still be black box in nature: we want to be testing behavior, not implementation. And the line between the categories of tests here can get a little blurry.

Let's consider writing a test for our tasks local data source. TasksLocalDataSource takes a TaskDao as a dependency, and in the real system this is provided by the to-do database, a class generated by Room, which is backed by Android's SQLite. If we follow the classic principles of unit testing, we can ask Mockito to provide a mock for our TaskDao instead. In our test, we create that mock and pass it in as a dependency to our TasksLocalDataSource. We create a new task and save it in the data source. And then finally, we validate that the insert task call was invoked on our TaskDao.

But wait: this test already knows too much about the implementation details of save task, about how it's implemented. If we were ever to change that implementation, then the test is going to need updating as well, even if the behavior was supposed to remain the same. This is what's known as a change detector test, and its burdensome maintenance can quickly start to outgrow its usefulness. Effective unit tests should really focus on testing behavior instead. But how should we do that? We can do that by ignoring the internal implementation and focusing on the API contract instead. Take TaskDataSource. The contract states that when I save a task through the save task method, I should then be able to retrieve that same task by looking it up by ID. So our test should exercise that contract rather than concerning itself with implementation details. We'll exercise the save task method on our local data source, but we won't be concerned with the fact that it calls insert task on the DAO. Then we'll call get task on the data source, again forgetting about the implementation.

One thing to bear in mind when writing tests like this, where the code under test makes use of coroutines, is that we need to make these asynchronous operations appear synchronous so that our tests remain deterministic. If we were to call the get task function while, some of the time, the save task function hadn't completed yet, we'd end up with a flaky test. Luckily, avoiding that is rather straightforward: we ask our test to run blocking. One of the first tools you'll learn for writing tests that use coroutines is the runBlocking construct. In the context of runBlocking, the given suspend function, and all of its children in the call hierarchy, effectively block the main thread until it finishes executing. You're going to find this a really useful tool when exercising code whose behavior relies on coroutines and needs to be highly deterministic. So the test we actually want looks something like this: we create a task, save it to the data source, then we ask the local data source to retrieve that task back for us, and finally we make an assertion that we got what we expected. In fact, Google and JetBrains have recently collaborated to launch the runBlockingTest coroutine builder, which makes testing coroutines even easier. It's currently marked as an experimental coroutines API, so please go and check it out and give us feedback on any bugs that you might find.

Now, in order to write this test, it's important that our data source maintain state, and it does this through its dependency, TaskDao. The problem is that with Mockito, trying to maintain state through one-off stubbing calls can get messy really fast. We could instead implement our TaskDao using a fake, like we did earlier with the repository. But we're going to choose not to go down that route, for some good reasons. Firstly, it doesn't seem that the TaskDao interface is going to be part of our module's public API, so no one else is going to benefit from reusing that fake. And secondly, right now I can't think of another part of our code that would benefit from that fake either. This is one of those cases where it actually makes sense to use the real object rather than putting in a fake. In this case, Room provides some really useful testing infrastructure for us. We can ask Room to build us an in-memory to-do database, then use that to obtain a TaskDao backed by that in-memory database and provide it to our local data source. Of course, we'll clean it up after the tests. But in all other senses, it's the same as the production database; it's just faster, as it doesn't write data to files on the file system, and therefore it also provides better isolation between tests.
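Putting those pieces together, the behavior-focused test might look something like this sketch. ToDoDatabase, TasksLocalDataSource, and the Task constructor follow the names described above, but treat the exact signatures as assumptions rather than the codelab's literal code.

    import android.content.Context
    import androidx.room.Room
    import androidx.test.core.app.ApplicationProvider
    import androidx.test.ext.junit.runners.AndroidJUnit4
    import com.google.common.truth.Truth.assertThat
    import kotlinx.coroutines.runBlocking
    import org.junit.After
    import org.junit.Before
    import org.junit.Test
    import org.junit.runner.RunWith

    @RunWith(AndroidJUnit4::class)
    class TasksLocalDataSourceTest {

        private lateinit var database: ToDoDatabase
        private lateinit var localDataSource: TasksLocalDataSource

        @Before
        fun setup() {
            // An in-memory database behaves like the production one, but writes
            // nothing to disk, so every test starts from a clean slate.
            database = Room.inMemoryDatabaseBuilder(
                ApplicationProvider.getApplicationContext<Context>(),
                ToDoDatabase::class.java
            ).allowMainThreadQueries().build()

            localDataSource = TasksLocalDataSource(database.taskDao())
        }

        @After
        fun cleanUp() {
            database.close()
        }

        @Test
        fun saveTask_retrievesSameTask() {
            // runBlocking keeps the asynchronous save/get calls deterministic.
            runBlocking {
                // Exercise the contract: save a task, then get it back by ID.
                val newTask = Task(id = "1", title = "Buy milk")
                localDataSource.saveTask(newTask)

                val loaded = localDataSource.getTask(newTask.id)

                assertThat(loaded).isEqualTo(newTask)
            }
        }
    }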
So is this still a unit test? Or is it now an integration test, because we're using real objects rather than just mocks? It's a good question, and one many people will disagree on. And it's true, the lines can become blurry at times. But the key takeaway is that you shouldn't ever be afraid of using real dependencies in your tests where it makes sense: where they make the test more readable, more lightweight, and more robust.

So let's just recap the kinds of tests that we wrote today. We added an end-to-end test that covers a critical user journey through our application. We decomposed a feature to add an integration test that covers an entire vertical slice through our application, from the UI down to the data layer. We also added an integration test that verifies our model, which is key, because other modules are going to be depending on it. And finally, we were able to decompose further and write smaller groups of integration and unit tests, such as the ones for the UI or the local data source.

Modularization of your code base, with clearly defined inter-module contracts, allows you to streamline your project build, create compile-time dependencies against small API modules, leading to faster build times on each change, and export testing infrastructure, such as lightweight verified fakes, that other modules can swap in, thereby decoupling their tests from your heavyweight production dependencies. So while you can and should have end-to-end tests to give you confidence in your app, the vast majority of tests should not be in this category. Modularizing your app like this allows you to push down many of those large end-to-end tests into more focused, smaller tests at the module level, and each one is decoupled from the next. Finally, this allows us to build a really well-balanced pyramid. And through this thoughtful architecture, a number of obvious cut-off points have surfaced naturally within the pyramid. You'll need to identify the right spots for testing in your own application; what works for one project might not work for another. So it's really important that whatever you choose, you document it clearly, so that all collaborators on your team are on the same page.

In Android development, there are two kinds of tests. Local tests run on the local JVM. They can be pure JUnit tests, or they can use Robolectric to provide a simulation of Android. They're much faster and highly scalable, but they don't offer the same confidence that a real device would. On the other hand, there are instrumented tests that run on a real or virtual device. While slower and lacking scalability, they are true to the behavior of real Android. Last year, as part of Jetpack, we launched AndroidX Test, which brought together a unified set of APIs that work for both kinds of tests. These APIs allow us to focus on writing Android tests without thinking about the tools that we're using underneath or where the test is going to be executed. And at the heart of what we're releasing today is increased stability, improved interoperation with Android Studio, better off-device support for Espresso, resources, and UI thread control, and of course support for the latest Jetpack architecture components. While tests of all sizes can run on a real or a virtual device, these improvements have made it possible to run increasingly larger integration tests faster on the local JVM. All of the integration tests that we've documented today and in the codelab will run equally well on either the local JVM or a real or virtual device.
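For instance, a test written only against the unified AndroidX Test APIs, like the sketch below, can be compiled into either the local test source set (running on the JVM with Robolectric) or the instrumented androidTest source set (running on a device) without changing a line. The package name in the assertion is an assumption for illustration.

    import android.content.Context
    import androidx.test.core.app.ApplicationProvider
    import androidx.test.ext.junit.runners.AndroidJUnit4
    import com.google.common.truth.Truth.assertThat
    import org.junit.Test
    import org.junit.runner.RunWith

    // Nothing here refers to Robolectric or to a device: AndroidJUnit4 delegates to
    // whichever environment the test happens to be running in.
    @RunWith(AndroidJUnit4::class)
    class ApplicationContextTest {

        @Test
        fun applicationContext_hasExpectedPackageName() {
            val context = ApplicationProvider.getApplicationContext<Context>()
            assertThat(context.packageName).isEqualTo("com.example.android.architecture.blueprints.todoapp")
        }
    }

The shared test source root trick described next is what lets a single copy of such a file feed both source sets.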
Project Nitrogen is our vision for a unified test execution platform. It brings together all of these many disparate tools and environments. With Nitrogen, any test that's written with the unified AndroidX Test APIs can be run on any of these execution platforms seamlessly, from Android Studio or your continuous build system. You've got the option to run any Android test on a variety of platforms, such as virtual devices, cloud device farms, and simulated devices. And while the team is still working hard to bring this vision to reality, in the meantime we'll share a little trick with you. Normally, local tests are placed in the test source root and instrumented tests go in the androidTest source root. But to show you what's possible with a unified API, in this codelab we're using a little trick to create a shared test source root folder. Here we can place tests that are written with the unified API and run them both on device and off device. How and where you decide to run them really depends on your project's philosophies and needs. But here you can start to see the possibilities.

Today we're also launching an early access program for Nitrogen for tools integrators. So if you're a developer who maintains monitoring, profiling, or performance tools, you provide continuous integration platforms, you build real or virtual device services for developers, or you make IDEs or build farms, we're looking to hear from you and get your feedback through our early access program.

So please go ahead and check out the code in the codelab. You can see great examples of project structure and blueprints, examples of the kinds of tests you should be writing at different levels using the unified APIs, and see just what kinds of tests are possible to run on and off device, which leads the way to Project Nitrogen. This is all available online now, right here in the codelab section, for you to check out. [MUSIC PLAYING]
Info
Channel: Android Developers
Views: 44,209
Rating: 4.8876033 out of 5
Keywords: type: Conference Talk (Full production); pr_pr: Google I/O; purpose: Educate
Id: VJi2vmaQe6w
Length: 41min 41sec (2501 seconds)
Published: Thu May 09 2019