>> Hello. I am Jessica Baker. I am a Software Engineer
at Rare, and I am here
to talk to you today about Automated Testing at Scale
in Sea of Thieves. First, a little bit about me. I am a Software Engineer
on the Gameplay Team at Rare. Software Engineer if I am trying
to apply for a mortgage; Gameplay Programmer if I am
trying to sound cool. I have been there
for two years. During that time, I have worked
on all kinds of different stuff ranging from AI to a little bit
of backend services. Before that,
I was actually doing a mechanical engineering degree, so obviously I am interested in physics, maths, and simulations. But I am also really interested
in engineering processes and how we can do those
across different engineering disciplines. That is part of the reason
why I am so interested in automated testing. Now, if you have not heard
of Sea of Thieves, it is an online multiplayer
pirate adventure game which we shipped last year
on Xbox One and PC. It is a free-form,
socially-focused game where you are sailing around
with your friends in a crew on your pirate ship
doing piratey things: looking for treasure, fighting enemy skeletons, getting into ship battles.
We have been releasing it under the games as a service structure, with regular updates; the latest one is going to be our anniversary edition launch at the end of April, I believe. You would have seen
the trailer for that if you went to my colleague
John’s talk earlier. In this presentation, I am going
to talk a little bit about the journey
we went through in developing Sea of Thieves and why we found
that automated testing was a really good way
of facilitating this games as a service model. I am going to give you
a little bit of an overview of Unreal Engine’s automation
system, what it gives you straight
out of the box. Although I will not be covering everything you could possibly use, I will cover mostly what we have been using. We are actually developing with 4.10 rather than the most up-to-date version, so newer versions may offer more than I will be able to cover here. I am going to talk a bit
about how we extended Unreal Engine 4 for our needs,
about how we use our tests, and how we get a change
from a developer all the way out to general
release as bug-free as possible. Lastly, a little bit of the
pointy end of automated testing, actually writing effective,
automated tests that do what
you need them to do. Why use automated testing?
Why for Sea of Thieves? Mostly, again, to do with
the games as a service model. We need regular content updates
to keep players coming back
for lots of new content, lots of new interactions
regularly. We need to have
a really quick response to player feedback
to keep our community happy, and that means
flexible releases. We actually have the capability to ship multiple times
in a week. Lastly, it is to do with how the design of the game works in minute-to-minute gameplay. It is designed under
the tenet of tools, not rules. Instead of giving the player rules, things like "you can bail out your ship if it is filling up with water", we just give you a bucket and then we are like, do what you want. That might mean
scooping up water, it might mean scooping up vomit
and chucking it at your mates. That means each new tool
we add introduces a whole new host
of interactions to test. We are constantly
adding interactions that need to be tested. Just to take it back
to the very basics, I like to explain things
from first principles because we all have gaps
in our knowledge at any level of complexity. An automated test is
a program or a script that can execute a route
through your software and check
that it behaves as expected. We might want to break
that down into three parts: setup, input, and output. Setup is where we set up the environment in which the behavior we are observing is going to happen. Input is actually triggering
that behavior, and output is seeing
what the outcome is and testing that that’s
what we want it to be. As a really simple example,
say you’ve written a Function that just adds two integers together and returns the result. If you were testing that, you might -
just for the sake of argument, so we have something to set up,
let us say it takes those integers
by reference for some reason. We are assigning those integers
to local variables. Then we can pass them
into our adding Function. That’s our input,
the trigger for the behavior. Our output is the actual result
of that, which we expect to be 11. We name our tests
according to this structure. We call them Given,
When, Then. If we were trying to name
this test, we might call it: given two integers, when passed to the adding Function, then it returns the sum of the two integers.
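To make that concrete, here is a minimal sketch in Unreal-style C++; the Add Function and the test body are hypothetical, purely to illustrate the setup/input/output shape:

```cpp
// Hypothetical Function under test: takes its integers by reference,
// just so there is something to set up.
int32 Add(const int32& A, const int32& B)
{
	return A + B;
}

bool GivenTwoIntegers_WhenPassedToAdd_ThenReturnsSum()
{
	// Setup (Given): assign the integers to local variables.
	int32 First = 5;
	int32 Second = 6;

	// Input (When): trigger the behavior by passing them to the adding Function.
	const int32 Result = Add(First, Second);

	// Output (Then): check the outcome is the 11 we expect.
	return Result == 11;
}
```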
Let us talk about what Unreal gives you right out of the box. Unreal provides
the Automation Framework which is able to execute
automated processes on any kind of Unreal build. For testing, it provides
this FAutomationTestBase Class, which has a Function
called RunTest, which I'm sure I do not have
to explain what that does. When it is instantiated,
the FAutomationTestBase will register itself
with the Automation Framework, which can then be
triggered to run it. If you want to write
your own test, you can inherit from this FAutomationTestBase Class and override the RunTest Function. If this returns true and we do not hit any exceptions or error logs during the course
of the test, then the test will have passed. Unreal gives you
a couple of helpers to deal with some of the boilerplate
involved in that. The first one is
IMPLEMENT_SIMPLE_AUTOMATION_TEST. They have given me a laser. Here is the name
of your test Class. You can give it a pretty name
for it to show up in Editor, and the automation test flags
can tell you what kinds of builds
you’re running this against. There, we’ve just overridden
the RunTest Function to test that
1 is still less than 2, because if that’s not true
anymore, something’s horribly wrong. I have noticed
there was an error on this slide when it was too late to fix it. This will not compile, obviously: it is not returning any value, so let us just imagine it says return true at the end.
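For reference, the corrected version would look roughly like this; the pretty name and the flags are illustrative, and flag names vary between engine versions:

```cpp
// A corrected sketch of the slide's test; name and flags are illustrative.
IMPLEMENT_SIMPLE_AUTOMATION_TEST(FSanityTest,
	"Examples.Maths.OneIsStillLessThanTwo",
	EAutomationTestFlags::ApplicationContextMask | EAutomationTestFlags::SmokeFilter)

bool FSanityTest::RunTest(const FString& Parameters)
{
	// If this stops being true, something is horribly wrong.
	TestTrue(TEXT("1 is still less than 2"), 1 < 2);
	return true;
}
```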
Now, the Unreal Documentation recommends a test as an example of how you might write
a simple automation test. I will not bother trying to read
the whole thing, but you will notice that, by Unreal's standards, they recommend you do one test for a particular Class and test everything for that Class in that single test. That is not our standard, which I will
go into a bit more later. But that is how
they recommend you do it. You will have noticed
that the RunTest Function takes a string parameter. The simple tests
do not actually use this. But it also provides IMPLEMENT_COMPLEX_AUTOMATION_TEST. In this one, you can override
a GetTests Function as well. You can provide
an Array of strings to run the same test body on. This is another
sort of toy example. I have made an Array of strings which are just the names
of all the days of the week. Our test body is just checking that they all contain
the word “day”. One thing to note is that each
test case will be considered a separate test
in the session frontend. I will explain in a moment.
You are also provided with the means to do
latent automation commands. This is quite similar
to how the test helpers work. You can create commands which override an Update Function, and this will keep running: it will run this Update every frame until it returns true. If you want to use this,
for example, you might have an object that takes a little
while to initialize. You can kick off
the initialization and then run
this automation command which might check every frame,
is this object initialized yet, and when it is,
then it returns true, latent automation command
is finished.
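A minimal sketch of that pattern, assuming a hypothetical UMyObject with an IsInitialized query:

```cpp
// A sketch of a latent automation command; UMyObject and IsInitialized
// are hypothetical stand-ins for your slow-to-initialize object.
DEFINE_LATENT_AUTOMATION_COMMAND_ONE_PARAMETER(FWaitForInitialization,
	TWeakObjectPtr<UMyObject>, Target);

bool FWaitForInitialization::Update()
{
	// Called every frame; the command finishes once this returns true.
	return Target.IsValid() && Target->IsInitialized();
}

// Inside a RunTest body, after kicking off the initialization:
// ADD_LATENT_AUTOMATION_COMMAND(FWaitForInitialization(MyObject));
```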
For running automation tests, you have a couple of options. The first is the session frontend. I think you can attach this to
various kinds of Unreal build, but it is definitely
available in the Editor. This will list all the tests that are available
in your build. You can filter them, run them,
check through the logs, debug, and you can find them
in the Editor under the Developer Tools
section in the Window menu. You can also run tests
through the command line. We use this to integrate with our continuous integration
software, TeamCity. TeamCity is able to execute
various jobs on our build farm
automatically. We use it to build our builds,
we use it to distribute, deploy, but we can also use it to run
test suites on our builds. We do this regularly, at intervals ranging from every 20 minutes up to overnight. Obviously, the 20-minute one is a much quicker test suite: we choose the tests that are quick, but also the ones that are most critical for everyone to keep working.
Things like, check that the game
will boot up and run. The last one is that we have also rolled our own unit testing tool. Our unit test running tool works very similarly to the session frontend: filter, run your tests, check the output, debug. But it just reduces the overhead of having to spin up a whole Editor if we want to use the session frontend. A better use of this
complex automation test helper than checking
that day spellings are right is that you can use your GetTests Function to get the Asset
reference strings of all of the maps
within a map test directory. Then you can use your RunTest
Function to load up the map, run it, and wait for some
sort of test success, test failure Event to come
from the level Blueprint. This works around the fact that Unreal unit tests are so atomic and do not support things like Actors: to add a new gameplay test, you just add a new map. Then, you can use
the level Blueprint to actually execute
the gameplay feature and check the output of that.
This screenshot is an example from my colleague Rob Masella’s
GDC talk on this same topic. He went into a bit more detail, but effectively,
this test will force player input so that the player can approach
the wheel, grab it, turn it, and then we can check
that the wheel has turned. One of the capabilities we have access to is the ability to fake player input. We have added our own utilities
for networked gameplay testing. These nodes will pass execution
of the Blueprint between the client
and the server, which is obviously really useful
for checking things that happen
across the network. We might want to set
stuff up on the server and then observe
on the client, or we can keep passing
back and forth. You can very thoroughly check
gameplay scenarios with this. You can do a map test
for every single interaction. But there are some challenges
with them, one being speed. A Blueprint map test takes around 20 seconds on average to run, whereas a coded test typically takes 0.1 seconds. Secondly, because you are testing in an environment where everything is real, using real gameplay objects in real time, tests can be a little bit unreliable. For example, take the skeletons
firing cannons feature, which does what it says on the tin, pretty much: if a player ship goes into
range of a cannon on an island, a skeleton will spawn and start
firing cannonballs at it. My bit of the work was doing
the physics prediction algorithm so that the skeleton knew
how to aim the cannon to hit the ship
in a satisfyingly realistic way. To test this, I set up a map test in Blueprint with a skeleton, a ship, and a cannon, and waited for it to start firing, because we want to make sure that a cannonball can land near or on the ship. We can check that this happens by checking the cannonball's position every frame and seeing if any of those checks are within range of the ship. Of course, the problem here
is that there’s no guarantee that our frame rate
will be high enough that we are actually
going to check it while it is in that area. Of course, there are other ways
to solve this in this particular example.
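For instance, one frame-rate-independent alternative, sketched here with illustrative names, is to sample the ballistic arc at fixed time steps computed from the launch state, rather than relying on whichever frames the game happens to render:

```cpp
// A sketch only: checks the cannonball's analytic trajectory at a fixed
// time step, so the result does not depend on the game's frame rate.
bool DoesCannonballPassNearShip(const FVector& LaunchPos, const FVector& LaunchVel,
	const FVector& ShipPos, float Gravity, float FlightTime, float HitRadius)
{
	const float Step = 0.01f; // fixed time step, independent of frame rate
	for (float T = 0.0f; T <= FlightTime; T += Step)
	{
		// Simple ballistic model of the cannonball's position at time T.
		FVector P = LaunchPos + LaunchVel * T;
		P.Z -= 0.5f * Gravity * T * T;
		if (FVector::DistSquared(P, ShipPos) <= HitRadius * HitRadius)
		{
			return true;
		}
	}
	return false;
}
```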
But it illustrates how time and latency can be a problem
one of the Four Horsemen of the Testing Apocalypse. I was a bit tight on time
for this talk, and I thought about removing
this slide, but then I did not want to.
Now you have to deal with it. Our four horsemen are:
latency, randomization, globals, and dependencies. These are elements
that you ideally want to be able to remove or isolate
from your test environment. In this case, latency is the one
that is causing us problems because it is not
necessarily deterministic. But map tests are still useful for system and bootflow tests, golden-path gameplay tests, and integration tests. In these sorts of tests, you want to check that all of your elements are working together, so the fact that map tests pull in all these dependencies can be really useful. But if you do want to check
every sort of permutation of your gameplay,
what we did is we took it all the way back
to FAutomationTestBase. We added our own helpers that would add intermediate
levels of inheritance between FAutomationTestBase
and your test Class, to use as test fixtures.
These will provide utilities that can be applied
to every test so that you don’t have
to repeat yourself. One of these can just be setting up a map test in code: we can create a utility that will create a World, get it ticking, get the right Game Mode on it, and then, to create your test,
you can inherit from that. We have macros that will wrap up
all that boilerplate. As well as this,
if we do not need a whole map and we just want
to test individual Actors, we have an FActorTestFixture, and that will create
just an empty World with minimum stuff in it that you can just use
to spawn Actors into.
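A sketch of the fixture idea follows; our real helpers and macros differ, and the World setup shown is just one plausible way to do it:

```cpp
// An intermediate Class between FAutomationTestBase and your test that
// owns a minimal World to spawn Actors into. Illustrative only.
class FActorTestFixture : public FAutomationTestBase
{
public:
	FActorTestFixture(const FString& InName, bool bInComplexTask)
		: FAutomationTestBase(InName, bInComplexTask)
	{
	}

protected:
	// Creates a near-empty World with just enough in it to spawn Actors.
	UWorld* CreateEmptyTestWorld()
	{
		UWorld* World = UWorld::CreateWorld(EWorldType::Game, /*bInformEngineOfWorld=*/false);
		FWorldContext& Context = GEngine->CreateNewWorldContext(EWorldType::Game);
		Context.SetCurrentWorld(World);
		World->InitializeActorsForPlay(FURL());
		World->BeginPlay();
		return World;
	}
};
```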
Now, remember I was talking about the Unreal example of a unit test
being one test per Class and all the checks inside it? We find that this can be a bit
problematic for testing Actors, because you might end up with
persistent State between tests, and also it is good
to just be able to have one test be one scenario. You just have a list of tests
passed, tests failed, and you instantly know
which scenarios work. For this, if we want to do multiple tests for one Actor, we can add Actor-specific test fixtures as well, which will just handle the utilities and make sure we are not repeating ourselves. For example, if we want
to test the spyglass, then we can create a utility
that will spawn a spyglass and have an Actor wield it, or whatever else we need it
to do for it to work properly. We have a few other test types
as well. The Asset audit is very similar
to the map test in that it
loads up Asset references, but it does it for every Asset
in the game or in our build, and then we can set up an Asset audit test for each Class of Asset. Say you add a voyage-type Asset
with a minimum and a maximum amount of gold for the voyage; we want to make sure a designer cannot accidentally put in 400 minimum, 300 maximum. You can put that in as a check.
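A hedged sketch of what such a check might look like, with UVoyageAsset and its fields as illustrative stand-ins for a real Asset Class:

```cpp
// A sketch of an Asset audit check run once per Asset of this Class.
bool FVoyageAssetAuditTest::RunTest(const FString& AssetPath)
{
	// GetTests supplies one Asset reference string per Asset of this Class.
	UVoyageAsset* Voyage = LoadObject<UVoyageAsset>(nullptr, *AssetPath);
	if (!TestNotNull(TEXT("Voyage Asset loads"), Voyage))
	{
		return false;
	}
	// Catches data errors like a 400 minimum paired with a 300 maximum.
	TestTrue(TEXT("Minimum gold does not exceed maximum gold"),
		Voyage->MinimumGold <= Voyage->MaximumGold);
	return true;
}
```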
Next is screenshot comparison, which the rendering team uses. This will automate a scenario,
take a screenshot, and compare that against
a stock screenshot of the scenario
working as expected. Finally, performance tests, which we touched on in my
colleague John’s talk earlier where we set up
a nightmare gameplay scenario and output metrics
so that we can make sure it is going to run smoothly on
all of the platforms we ship to and all the hardware
we support. Here is how it breaks down in terms of how many tests we have. As you can see, Actor tests
are by far the most common. We are checking a lot
of gameplay through that. Unit tests do a lot of similar jobs to the Actor tests, but might handle more engine-level stuff, because they are basically unit tests in which you cannot spawn an Actor. Map tests, I think, in this data include both the Blueprint and coded map tests. I actually nicked this
slide from Rob’s talk that I mentioned earlier. He referred to them
as integration tests, but they are the same thing, in case that is confusing if you watch that later. In total, we have over
23,000 tests to run. That is not including the Asset audit tests, on the basis that, if you remember my complex test example from earlier, every test case is counted as a separate test even though it is the same test body. The same goes for Assets: we have the same number of Asset audit tests, 81,000, as we do Assets. Overall,
that is over 100,000 tests. I am going to go through
how we actually use these tests, all 100,000 of them.
What is important to note here is that due to our need
for flexible, fast releases, we use a continuous
delivery process. This means that in theory,
we can ship at any time. We try and keep our build
constantly bug-free, or as bug-free as we can. That plays out
in a bug count graph that looks like
this yellow line. The gray line represents the bug count on Banjo-Kazooie: Nuts & Bolts, one of our previous titles, which was developed under a more traditional process where you reach feature-complete and then go through and fix all the bugs. That is a peak
of over 3,000 bugs. Of course, this bug-fixing phase can be very unpredictable in how long it will take, and that means
to get crunch. Whereas by keeping
our bug count low, we have managed to
reduce crunch significantly on the Sea of Thieves project. It means that we are in theory
able to ship at any time. To get a change from a developer
to a player is going to go through
several stages, with verification in between each one. The stages are: local changes on a developer’s machine, which get submitted to source control once verified; a preview build, taken daily for internal testing; a limited release, to players in our insider program who are under NDA and have access to early builds of the game; and then all the way out to general release. The last thing we want,
of course, is for a bug to reach here. It is seen by a lot of players,
it is going out on Twitch to hundreds of
thousands of people. We can prevent that by doing verification at each stage of delivery. In an ideal world, we would want to get rid
of 100 percent of our bugs before they even get checked
into source control through these
preventative measures. We do not live
in an ideal world, unfortunately, but we can
get rid of a lot of them and maintain this continuous
delivery process. The first one
is the session frontend, which I mentioned earlier. Each developer checks in their
changes with a full set of tests for any new interactions
they have added, and they are expected
to make all the tests pass to the best of their knowledge
before they can check it in. In this case, I might have changed something on the scale; I can see what is working
this is TeamCity again. You can submit your changes
to TeamCity to be consolidated with the very
latest version of the build, and then we can run
a test suite on it. This has to pass 100 percent
of the tests for us to be allowed to check it in. This is one of my failed ones
where I tried to clean up the voyage generator tests
and caused 180 build problems, which I am really glad that I
caught before I checked it in. I think it was something
like a typo and it did not compile,
so all of the tests failed. You have to get a green remote
run before you can check in. Lastly, it is still a good idea to give your change a quick manual test to catch any test coverage you have missed, or to catch anything that is not so practical to test with automated testing: visual issues or audio issues, particularly. We have probably eliminated most
of our common logic errors by this point. But once your change is submitted to source control, this is where our regular
automated build verification that I mentioned
before comes in. If any one of these critical jobs fails, the light goes red and nobody can check in until it is fixed, which means that we are enforcing our continuous delivery. It also means that as soon as something is broken, because we are stopping anyone from checking in anything else, we can usually pinpoint exactly which change introduced the issue and either back it out or fix it. We still have manual testers.
We have a lot fewer of them, and the great thing about using them alongside automated testing is that they do a lot less of the routine manual testing, having to check things every time there is a new change. Instead, they are doing
what manual testers are best at, finding weird and funky
new ways to break the game. Automated tests are great
for the issues you might be able to predict, and manual testers
are great for testing things that you couldn’t possibly
have predicted, and great for picking up, again,
those audio or visual issues. When it goes out to players
through the limited release, we obviously do not want
any bugs to reach any players at all.
But if they do, then this lets us pick up
some of the lower-repro bugs. If a bug only happens 1 in 1,000 times, then we might not pick it up
with our manual testers, but the reporting here
is really handy for picking up things
that are low repro. One of the things that makes
people nervous about this kind of process is this sort of front-loading of
quality into this first section, so doing all these checks before you
are allowed to check things in. They say, doesn’t it take
a really long time to get a feature done?
The thing is, you are saving time later
by checking the quality now. That comes back
in a lot fewer bugs. It is so much easier to prevent
a bug than it is to dig through your three-month-old code later, which has been changed six times since, and try to figure out
what is going on from that. Another nice thing about the manual testers is that if they discover an issue that could be caught
by an automated test, we can then add
in a regression test to stop it
from being broken again. A nice thing is that even though we are constantly changing the game, generally, if things break and we fix them, they stay fixed, which is much more sustainable when we are constantly
some best practices for actually writing
automated tests and how you can make them
effective and descriptive of where the issues
are in your game. That is another hang-up about automated testing: that it is more trouble than it is worth, that you have to rip up your production code to make it testable. I definitely felt
the same way very early in my automated testing career, so much so that I tweeted
this back in 2017. “It is all fun and games until
you have to write the tests”. Now, thanks to some good
practices I have learned, I can write good,
testable code straight up without having to rip it apart, fix it later, figure out
how I am going to test it. With that at the forefront
of my mind in using some of
these techniques I am going to go through, I feel much more positive
about it now. It makes me think
about my use cases, makes me think
about my interface, and because obviously I do it
so well and perfectly every time,
it is a joy to write the tests. If I could edit the tweet, I would probably
make it say that. As an example, I am going to use
our Alliances feature. This is a feature
where in Sea of Thieves you sail around with your crew
of friends on your ship, and the Alliances feature
allows you to form an alliance with another crew. You can do voyages together
and share the rewards. To keep track of all
the alliances that might be on a server,
we use the Alliance Service. In our terminology, a service
is a globally accessible object, and it exists for the whole
lifetime of the server. This is good for storing data
needed by different systems. Here are some example alliances. We have got one between Crew C
and Crew D, and we have got another one
between Crews E, F, and G. Let us add some
public Functions, an interface to this service. We definitely want players
to be able to form alliances. If we call that with Crew
A and Crew B, that is now added
to the storage. If we want to query this data,
as an example, let us say we want to get
the number of alliances. It would be an output of three. When we come to test this,
we might be thinking about our one code path,
one scenario is one test rule. We might interpret that
as meaning we want to check each of these Functions. We are starting with FormAlliance. You might want to do a test
where your setup is to instantiate
the Alliance Service, your input, the trigger
for the behavior is calling FormAlliance with Crew
A and Crew B, and then checking
that alliance storage to see if there is now an alliance
between Crew A and Crew B. We have already hit a problem. How do we check
this private data to make sure the expected
behavior is happening? We might add a Get Function. We might even be tempted
to make that data public. Instead, we are going to
interpret our one-code-path rule to mean one public input/output flow. Again, we instantiate the Alliance
Service, call FormAlliance Crew A and Crew B, and we check
that GetNumberOfAlliances now returns 1. Now, this is a method called "test behavior, not implementation". Instead of testing
that certain triggers create certain internal States,
we can examine actual use case flows to check
that they behave as expected. One of the benefits is that
the internals are not affecting whether the test
passes or fails. It is really helpful
for refactoring where you are changing
all the implementation but you want the behavior
to remain the same. Do your tests still all pass
when you change your code?
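Put into code, a minimal sketch of that alliance test might look like this; FAllianceService is an illustrative stand-in, and crew IDs are simplified to integers:

```cpp
// Tests behavior through the public interface only; names are illustrative.
bool FAllianceServiceTest::RunTest(const FString& Parameters)
{
	// Given an Alliance Service and two crews...
	FAllianceService AllianceService;
	const int32 CrewA = 1;
	const int32 CrewB = 2;

	// ...when Crew A and Crew B form an alliance...
	AllianceService.FormAlliance(CrewA, CrewB);

	// ...then the public query reports one alliance. We never inspect the
	// private storage, so refactoring the internals cannot break this test.
	TestEqual(TEXT("Number of alliances"), AllianceService.GetNumberOfAlliances(), 1);
	return true;
}
```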
Let us talk about another one of our Horsemen of the Testing Apocalypse: dependencies. If another Class or Function is dependent
on the Alliance Service, we want to prevent the implementation of the Alliance Service from affecting their tests. Imagine we are not running tests
for the Alliance Service this time.
Instead, just say for fun, we have a
ServerFriendlinessService. It queries the number of alliances on the server to determine how friendly the server is. As an example, if there are more
than two alliances, we are going to label it
a friendly server. Otherwise,
it is a curmudgeonly server. We are going to try
and test this. Note that we are not testing
the Alliance Service. We are writing tests for
the ServerFriendlinessService. We do not care what
the Alliance Service is doing. We just want to make sure
ServerFriendlinessService. For that, we are going to have
to make sure there is an Alliance Service
for it to query. If we are being very good about testing behavior, not implementation, we are going to call FormAlliance three times to make sure there are three alliances to get. Then we can call this custom Function on the ServerFriendlinessService and check that we have got the right output, so friendliness is set to friendly. What we can do instead is add an interface
to the Alliance Service. This does not necessarily
have to be a 1-to-1 interface specifically
for the Alliance Service. It could be, say you have got a container
which is holding items. It might want to refer
to the items via a storable interface. We are adding our public
Functions to this interface and querying the Alliance
Service through that interface. This means that in a test environment, we need not bother using the real Alliance Service at all; we can use a mock Alliance Service. To make this GetNumberOfAlliances call return three, which is the input we want for the test, we can store a number-of-alliances value set to three and override the GetNumberOfAlliances Function from the interface to return that, or just make it return three if we are only using it in this test. This means that in our test,
now all we need to do is set the number
of alliances to three. It is a little bit
more boilerplate, but one of the good things
about this good practice, mocking out dependencies,
is that we do not have to know about the internals
of other Classes. We just need to know about
their interface. It also protects us from changes or breakages: before, we were relying on the FormAlliance and GetNumberOfAlliances Functions working for the ServerFriendliness test to work. We are no longer reliant on that, because we are not using the real Alliance Service.
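A sketch of that mocking setup, with all names illustrative:

```cpp
// The interface the ServerFriendlinessService depends on.
class IAllianceQuery
{
public:
	virtual ~IAllianceQuery() {}
	virtual int32 GetNumberOfAlliances() const = 0;
};

// The mock: the test sets the count directly; no FormAlliance calls,
// no real internals.
class FMockAllianceService : public IAllianceQuery
{
public:
	int32 NumberOfAlliances;

	FMockAllianceService() : NumberOfAlliances(0) {}
	virtual int32 GetNumberOfAlliances() const override { return NumberOfAlliances; }
};

// In the ServerFriendlinessService test, no real Alliance Service is needed:
// FMockAllianceService MockAllianceService;
// MockAllianceService.NumberOfAlliances = 3;
// FServerFriendlinessService Friendliness(MockAllianceService);
// TestTrue(TEXT("Server is friendly"), Friendliness.IsFriendly());
```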
The next point is also about isolating functionality, but doing it through
the design of the interface. Let us forget about the other
Functions we had before. We are writing tests
for the Crew Class, which is representing
a crew in game. Say this has some functionality, which probably already
looks dodgy to you. It wants to check if it should
share rewards with another crew. It wants to do that by querying
the alliance interface. We will say, okay, you want
to know who you are allies with? Here are all the alliances
that we are storing. We are doing it very nicely through
the interface, of course. We are going to cycle
through these alliances and find whether there is an alliance that contains its own CrewID and the other crew’s CrewID. Then even if we are mocking out
the Alliance Service in a test environment, we have to know all about how
it actually stores alliances. If that changes, we then have
to update all the crew tests, which is a nuisance. Instead, we are going
to care less about what the Alliance Service
actually does, and just ask it
for the information we need. You might start by saying,
actually, I just want to get the crews
that I am allied with, which, on the crew side, involves much less alliance-related gubbins. We move all that over
to the Alliance Service. The code looks much more concise now. Then in our test, all we need
to know about is CrewIDs, which the crew
already knows about. Or we could be even more concise
than that and we could just check, am I allied
with this other crew? What it means
for crews to be allied, the crew does not
care about that. All it needs to know
is that they are. That makes the testing
all that much easier. This principle is orthogonality,
treating each Class as a black box
that only provides the information
that you need to know about it, and keeping that logic as internal as possible. It is generally good object-oriented programming practice, and it is really maintainable. If anything
about alliances changes, all you need to change
is alliance-related Classes.
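A sketch of that narrower, more orthogonal query, with all names illustrative (FCrewId stands in for however crews are identified):

```cpp
typedef int32 FCrewId;

class IAllianceMembershipQuery
{
public:
	virtual ~IAllianceMembershipQuery() {}
	// The crew asks only the question it cares about; how alliances are
	// stored stays internal to the Alliance Service.
	virtual bool AreCrewsAllied(FCrewId CrewA, FCrewId CrewB) const = 0;
};

// The Crew Class no longer cycles through alliance storage:
// bool FCrew::ShouldShareRewardsWith(FCrewId OtherCrew) const
// {
//     return AllianceQuery->AreCrewsAllied(MyCrewId, OtherCrew);
// }
```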
Using all these methods and some more (this is just a primer) means you get not
only instant feedback on how your code is working,
but it also makes you think about thoughtful
interface design by making you actually use
your interfaces straightaway. In summary of what we have
talked about today, Unreal provides basic automation
testing support out of the box. It is quite straightforward
to extend that, and you can get
really good returns, really thorough
testing out of that. It makes a great companion
for sustainable games as a service
through things like making sure that bugs stay fixed. It enforces good
object-oriented programming practices by making you think
about your interfaces. It takes care of
all of that routine testing so the manual testers
do not have to. Now, of course I am here to tell you
that automated testing is great. I do think that. But I am not going to pretend that it is going to solve
all your problems. It is not a silver bullet. That is well illustrated by my favorite Sea of Thieves clip of all time, mostly
because of this guy’s reaction. [Laughter] Yeah, mistakes do happen, and we are always finding new ways to improve our testing processes
to make sure that ships stay where they are
supposed to be on the water. Obligatory hiring slide.
If this all sounds good to you, if this sounds like
a good way to work, then we are hiring
and you can go to that URL and check out all our available roles. Thank you for listening. I have left up some resources from some of our recent talks. I think Rob’s talk has just gone up on the GDC Vault; I think it became available about eight minutes before this talk started. There are some Rare tech blog posts. I have got a blog which I forgot
to put up on this slide, but if you go to my Twitter,
it is linked on there. Thank you for listening. [Applause] ♫ Unreal logo music ♫