Acceptance Testing for Continuous Delivery • Dave Farley • GOTO 2016

Captions
What I plan to talk about today is the technicalities of an acceptance-test-driven approach to software development, and the practicalities of building automated, high-level functional tests that you can live with: tests that keep running and passing even as the system under test changes. That's a tricky thing to get right, and there are a number of small things that add together to make it possible; that's really the focus of my talk today. First, I should remind you to please vote on the talk, and then we'll get into what we're really aiming to cover.

So this is my model of a deployment pipeline; this is the schematic I tend to use to describe what we're talking about in continuous delivery. What we're really aiming to do is get a high level of confidence that the changes we just made are likely to be good. We can never prove that the changes are good, so instead we work to falsify our changes with automated tests. We want to run lots and lots of automated tests, and if one of those tests fails, we want to discard the release candidate and move on to fixing the problem that we've introduced. All of these stages are really forms of acceptance testing, but my focus here is on the acceptance tests, which are defined, as part of the continuous delivery approach, as evaluating the software from the perspective of an external user of the system. That's a good place to start: we're trying to answer the question, does this do what the users would like it to do? The earlier stage of testing, the TDD stage, the unit-testing stage, is about asserting whether the software does what the developers think it should be doing. That's really important, really valuable, and gives us fast feedback, but beyond that we
also want to know that it does what the users want. We want to be able to use these tests as a kind of automated definition of done: we'd like to be able to write one of these specifications for the behavior of the system and then work until we're finished, and when the test passes, we know that we're finished. It defines the scope of the piece of work that we're trying to do. This is not just about testing: it's a tool for design, a tool for development; it aids the development process.

We also want to assert that the code works in production-like environments. It's not good enough just to run tests that say this little piece works alone. My bet is that if you were silly enough to let me loose in your organization, I could break your software more quickly by changing its configuration than I could by changing the source code, and yet very often we don't bother testing those sorts of changes. I want to evaluate those changes. I want to deploy the software into production-like environments to evaluate that the configuration of the system works, that the deployment of the system works, and that the dependencies our software relies on are in place and working. We want to test the deployed configuration of the whole system. And we want timely feedback: if it takes us three years to learn whether the change we just made is good, that's a waste of our time; that's not valuable feedback. The shorter the time between making the change and learning whether the change is good, the more that drives good behaviors. In general I advise aiming for feedback in under an hour, no matter the scale of the technical problem, and you can do that with some surprisingly complicated technical problems if you apply some ingenuity and think hard about how to optimize for short feedback cycles. The sort of acceptance testing that I'm talking about is
probably known by a number of different names. In the context of continuous delivery we talk about it as acceptance testing, but it's also been referred to as acceptance-test-driven development, and it's often talked about in the context of behavior-driven development. BDD actually came from somewhere else: it was originally an idea designed to allow us to teach TDD more effectively and get to the high value of TDD sooner, but the ideas certainly align very nicely; it's really just the scope of the tests that is different. It's also called specification by example. Actually, my favorite description of what we're trying to do is the last one: executable specifications. These are not really tests; we're trying to define executable specifications for the behavior of our system.

I have a friend who has a client that's an airline, and they've been using this approach. When a member of the public phones up the support organization at this airline and talks to the support personnel, the information that the support people are using on screen is the executable specifications for the behavior of the system. They describe the behavior of the system, and that is a very strong assertion: the version of the system that is in production passed these tests; it fulfilled this specification. The person on the phone knows that that's how the system works because the specification says that's how the system works. That's what I'm talking about. I like to think of a good acceptance test as an executable specification for the behavior of the system; that's a good mental model for thinking about these things. Another mental model I think of fondly is the idea of software development as a series of feedback loops. At the outside we have the crucial feedback loop that we try to optimize in continuous delivery:
have an idea, get that idea into the hands of our users, and figure out what our users make of the idea; we work to make that feedback loop, from having an idea to getting it into the hands of our users, as short and as efficient as possible. At the inside is the TDD feedback loop, the test-driven development feedback loop: we write a test, see it fail, write some code to make it pass, refactor, commit, and move on, and that's happening in a few seconds or a few minutes, usually. In between is the feedback loop that I'm referring to today: these executable specifications for the behavior of the system. We're going to examine the behavior of our software from the perspective of an external user, in production-like environments, evaluate it, and understand the impact of our changes.

So what's the problem? Why are you all here, and why am I talking to you about this? This seems like a good idea, right? Why don't we just do it? Why is it so hard? Well, the problem is that people have been trying to do this kind of thing for a long time, and what tends to happen is that when the system under test changes, it breaks the tests. That's a hard problem to solve, and what we're really talking about, from a computer-science point of view, is a problem of coupling: the tests in this example are too tightly coupled to the system under test. Therefore one of our strategies is to reduce that coupling, to work in ways where our test cases are loosely coupled with respect to the system under test. Tests can be complex to develop. If we're going to use these things as a fundamental part of our development process, we want to be able to have tens of thousands of these test cases, to develop them very quickly, and to run them very quickly to assert the behavior
of our systems. We want to be able to create these things quickly and efficiently; we can't afford to spend hours and days trying to come up with each individual test case. So a lot of this is a problem of design, and we're going to talk about that in some detail through the rest of the talk. Meanwhile, I think the history of this kind of functional testing is littered with bad examples. I particularly love UI record-and-playback systems: they are just so fragile, so difficult to maintain, that I think they are an anti-pattern. I think using production data in our tests is an anti-pattern too: I want to evaluate the behavior of my system precisely, not just randomly throw data at it and see what happens. There are a lot of bad patterns that are commonly practiced, and we're going to talk some more about them.

A fundamental part of this, from a process point of view, is the matter of who owns the tests. I think it's important to recognize, and a vital part of establishing these effective feedback loops, that the developers are in the loop. The developers are the people who will make changes that break tests; therefore they are the people who need to be responsible for making the tests pass when that happens. We need to close the feedback loop. We need to actually slow the developers down if they're making changes that break stuff; we've got to slow them down to get things fixed, and we've got to keep the software working. If these things genuinely are executable specifications for the behavior of the system, then when a developer introduces a change that breaks a test, what that means is that the system no longer fulfills its requirements; it's no longer fulfilling its behavioral contract. That's where we'd like to get to; that's the way in which we'd like to work. So this
last one: I've been working in software development, as a developer and a technician, for over 35 years, and I have never, ever seen this last one work. I think it is one of the most toxic ideas in our industry: the idea of having a separate QA team writing automated tests, divorced from the development team. What happens is that the QA team writes a few tests and gets them working, and then the development team moves on; they make changes that break the tests, and for the rest of their existence the QA team spend their effort running behind, trying to catch up, trying to patch things together to make the tests pass, and they almost never do. That doesn't give you good feedback. We need to close the loop; we need to make developers responsible for the tests. Anybody can write a test: these are specifications, and whoever has the clearest idea of what the requirement is, from the perspective of an external user of the system, can write the test. But as soon as that test begins executing, developers are responsible for it, and they are the ones who will make changes to make it pass if they introduce a change that makes it fail. One of the definitions of continuous delivery is working in a way such that our software is always in a releasable state, and this is a key part of that.

The rest of my talk really focuses on this list of properties that I think of as the properties of good acceptance tests. A lot of what I'm going to put before you is based on my experience working in a reasonably complicated, complex environment, building one of the highest-performance exchanges in the world. We built the entire enterprise system using this kind of strategy, so I want you to keep that in mind: we're not talking about simple, trivial systems here; we're talking about big, complicated, real-world, enterprise-class systems with performance characteristics that would scare you. They scared me when we were working on
it. So I want to talk a little in that context; this is informed by that learning. I think it's fair to say that that organization, which was called LMAX, was probably world class at automated testing: we had tens of thousands of these tests running, and we'll get into that more later.

So here are the properties. First, I think we should focus our testing on what we want to assert, not how the system under test achieves it. We want the tests to be isolated from one another: we'd like to run lots of these, and if we want fast feedback we'll probably want to run them in parallel, so we can't afford for them to bump into one another. We want them to be repeatable: we'd like to run the same test over and over again and get reliable, consistent results. One of the things that really helps here, both to allow anybody to write the tests and to help us understand what the tests mean in the context of the problem, is to use the language of the problem domain; I'm going to introduce the idea of using domain-specific languages to express our needs in this kind of automated testing. We want to be able to test any change, to evaluate our software in almost any context we can think of and understand the impact of that; we'll talk about some cases around that later. We'd also like these things to be efficient: we can't afford to spend days or weeks or even hours waiting for a result; we want the fastest feedback we can get. If we can get feedback in under an hour, that's a game-changing level of feedback: if after an hour we can be in a position where there's no more work for us to do before we push a change out into production, however complex the system, that's a game-changing level of feedback. So let's start, and let's go through the list. We're going to
focus on what, not how. Here's a schematic of the system that I was talking about; the details don't matter very much. I'm going to be talking about this bit, the FIX API. For those of you not from the finance industry who don't know what FIX is, it doesn't matter; the mental model is to imagine that it's a REST API. It's not a REST API, it's different, but semantically it might as well be, in the context of this talk. So let's imagine we've got a system like this: a number of different communities of users, and a number of channels of interaction with the system, and we'd like to be able to evaluate the software through all of those channels. Typically, if we're going to write tests against a system like that, we'd introduce a whole bunch of test cases that describe both what it is we want to test and how we're going to interact with the system under test. The problem with this is that if a change happens in one of those channels of communication and invalidates a whole bunch of the test cases that talk through that channel, the only thing we can do to fix it is go to each of those individual test cases and fix them up. That's going to be complicated, because they're going to be complicated bits of code: they're worrying about two different things. If you care deeply about design and ideas like separation of concerns, we're conflating concerns here: we're conflating what we're trying to express with how we're interacting with the system under test, and that always adds complexity. So what do we do with a problem like this in software engineering? We introduce a level of indirection and raise the level of abstraction. So here's a bunch of channels representing the different communities of users, and we provide a device driver, if you like, a stub, a channel
abstraction that represents that concept in the terminology of the problem domain and fulfills the needs of the test cases. Now if the system changes and invalidates the assumptions in the test cases, we only have one place to fix it, because all of these things come through this one channel of communication. Don't worry, I am going to go into more detail about what I mean by that as we go through. What you tend to find as you make this kind of abstraction is that you've now got a placeholder for infrastructure: you're not just talking about test cases, you're talking about test infrastructure. There are supporting design ideas, tools, and facilities that enable this kind of abstraction and maintain it; these are the little green blobs of different kinds, and we're going to talk more about them over time. The idea is that we've got this test infrastructure that's shared between the test cases, so you can go and touch these things and fix problems in one place.

We want to think about these things in terms of the behavior of the system. So here's a list of things, and just to confuse you, some of them are wrong. I don't think every test should control its start conditions: that's a great starting point for unit testing, but it's a poor starting point for this kind of functional testing, because usually starting up the system is expensive. We'd like to be able to share the cost of starting up the system; certainly, if it's a multi-user system, we'd like to start the system once and then have lots of different tests run against the same running system. That's going to have some implications of its own, but we'd like to be able to separate those two decisions.
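The level of indirection just described can be sketched in a few lines. This is a minimal illustration, not code from the talk: all class and method names here are hypothetical. Test cases speak the domain language to a single shared driver, and each channel of interaction sits behind it as an adapter, so a change to a channel's protocol is fixed in one place.

```python
# Sketch of the "raise the level of abstraction" idea: test cases talk to one
# domain-level driver; channel-specific details live in adapters behind it.
# All names are illustrative assumptions, not from the real system.

class UiChannel:
    """Adapter for one channel of interaction (say, the web UI)."""
    def submit_order(self, instrument, quantity):
        # A real adapter would drive the UI; here we just record the call.
        return f"UI order: {quantity} x {instrument}"

class FixChannel:
    """Adapter for another channel (say, the FIX API)."""
    def submit_order(self, instrument, quantity):
        return f"FIX order: {quantity} x {instrument}"

class TradingDriver:
    """Shared test infrastructure: the one place to fix channel changes."""
    def __init__(self, channel):
        self._channel = channel

    def place_order(self, instrument, quantity):
        # If a channel's protocol changes, only its adapter changes;
        # every test case calling place_order is unaffected.
        return self._channel.submit_order(instrument, quantity)

# The same domain-level test case runs unchanged against either channel:
for channel in (UiChannel(), FixChannel()):
    result = TradingDriver(channel).place_order("EUR/USD", 100)
    assert "100 x EUR/USD" in result
```

The point of the design is that the `for` loop at the bottom, standing in for a test case, never mentions how the interaction happens, only what it wants.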
Sometimes we want to start the system for one test case, and sometimes we don't, so we'd like to have the choice. We want the test to be a rehearsal for production release: by the time a release candidate gets to the point of deployment into production, that should be a non-event, because this release candidate has already been deployed, using this version of the deployment tools and this configuration of the infrastructure, many times during its journey through the deployment pipeline. If we do these kinds of things, it gives us a nice opportunity for speeding up our testing feedback cycle in future by parallelizing things and sharing the start-up overhead, as I mentioned.

Let's move on to the next item in our list: we'd like our tests to be isolated from other tests. One caveat here: I've spent most of my career working on multi-user systems of one form or another. If you're writing software dedicated to an individual person, which is probably getting increasingly unusual these days, some of these bits of advice don't quite fit; but for the rest of us, these ideas just work, I think. So let's start thinking about test isolation. I think that any form of testing, in whatever domain, is evaluating something in controlled circumstances, and the control part is important. Isolation matters at multiple levels: we want to isolate the system under test, so we're only testing the stuff that's within the boundary of our responsibility; we want to isolate test cases from one another, so that, as I said before, we can run many of them in parallel without them bumping into one another; and we want to isolate test cases from themselves, so we can run the same test case over and over again. I think it's useful to think in these terms. If you don't do this, what tends to happen is that when you
start to scale up and get faster feedback cycles, you find all sorts of resource-sharing conflicts and tests running into one another when you try to parallelize, and that makes it more difficult to speed up. That's a very common attribute of teams that haven't thought about isolation ahead of time: my things stopped working.

So let's start with isolating the system under test, and here's one example of what I'm talking about; this tends to be very common in large enterprises. Let's imagine that we are working on system B, and system B has another system upstream and another system downstream; it's in the middle. What's often recommended in these sorts of situations is that you've got to test the whole thing end to end; you've got to evaluate all of these things together. Now there's a problem with that. If I want to precisely specify the state that my system is in, in order to be able to test it, and I'm only doing that via another system, I can't be precise enough. There is a whole raft of scenarios that I cannot simulate by going through an external system first: if I want to simulate system A sending me garbage, or the communication channel to system C being down, I can't test my system in those sorts of scenarios if I'm testing the whole thing end to end. So I haven't really got a good, clear way of getting the system under test into the state that I want it to be in. Worse than that, if I'm working on system B, I'm not going to be an expert in system A and system C, so the degree to which I can exert control, even where there are things I could do, is going to be limited by my understanding. We've got to compartmentalize our understanding so that things can fit into our heads. So this is a problem: it means that the system under test is not in a controllable state when we're doing this kind of
testing. I think this is an anti-pattern; it can't be a solid basis for effective acceptance testing. What we'd really like instead is something more like this: we'd like to have our test cases as close to the system under test as we can, and then capture the output somewhere and verify that we're getting what we expect. We'd like to be able to simulate all of those different kinds of scenarios: to inject bad data, to simulate communications failures, whatever it is we care about. We'd like to take control; think of it as putting probes around the system under test, which means that we've got to be very clear about where the boundaries of our system are. Now, the problem is that when organizations say they'd like you to test end to end, what they're worrying about is those interfaces changing, and that's a real concern; it's a real issue. One of the strategies for this, which Mary talked about in the keynote here, is to make sure that those interfaces are based on loosely coupled protocols: you can use a messaging system, and that gives you a little bit of wiggle room. But still, it's a real problem, so what do we do to verify those interfaces? I think that what we'd really like is a series of tests, each focused on the individual systems, as we just described. If we're doing this from the perspective of system B, though, what we'd like to know of external system A is: does it fulfill our expectations of its protocol of communication with us? When we take that focus, the number of tests that we need to run is actually quite small. So we can define some tests that say: is the interface still the same? And maybe we can even go as far as to give those tests to team A, so they can run them as part of their continuous integration infrastructure.
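A contract check of this kind can be very small. The following sketch is illustrative only, with a made-up message shape: from system B's point of view we don't test all of system A, we only verify that A's output still meets our expectations of its protocol.

```python
# Sketch of the interface-verification idea: a test that team A could run in
# their own CI to learn when a change breaks team B's protocol expectations.
# The field names and types here are assumptions for illustration.

EXPECTED_FIELDS = {"order_id": str, "instrument": str, "quantity": int}

def meets_contract(message):
    """True if an upstream message still matches our protocol expectations."""
    if set(message) != set(EXPECTED_FIELDS):
        return False
    return all(isinstance(message[f], t) for f, t in EXPECTED_FIELDS.items())

# A message that fulfills our expectations of system A's protocol:
good = {"order_id": "A-1", "instrument": "EUR/USD", "quantity": 5}
# A change in system A that would invalidate our assumptions:
bad = {"order_ref": "A-1", "instrument": "EUR/USD", "quantity": 5}

assert meets_contract(good)
assert not meets_contract(bad)
```

Handing a suite of such checks to the upstream team closes the loop: they find out about the integration problem at commit time, not at release time.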
If they make a change that invalidates our assumptions about their interface, they know that we've now got an integration problem. This gives us the facility to do all of our careful, thorough, detailed testing, getting the system under test into precisely the state that we'd like it to be in, but also some defense of the interfaces between us. I've used this strategy many times, including with external third parties, and it's been enough for me so far; I haven't yet found a problem where that strategy has caused a difficulty.

As I said before, on the whole I'm working from the assumption of multi-user systems. We want to be able to isolate test cases: we want to run lots of these tests, and what we'd really like is to start the system once and then run lots of tests against it. In order to do that, we need to isolate the tests from one another. We can't afford for them to share resources; we don't want them writing to the same files, or the same data sets, or the same records in the database, or whatever it might be, and that's tricky when we're evaluating a whole system. One of the nice strategies, if we are talking about a multi-user system, is something that I call functional aliasing: you use natural boundaries in the system to isolate tests from one another. This is another one of those 80/20 things, actually probably more like 95/5, but for the vast majority of test cases you can use this kind of functional isolation with no problems. Here's what I'm talking about. Imagine that we were testing Amazon: every single test would create a new account and a new book or product. If we were testing eBay, we'd create a new account and a new auction for every test case; if we were testing GitHub, we'd create a new account and a new repository for every test case. You get a kind of
weird profile in the system under test, because you have lots of repositories, or lots of books, and lots of users created, but it's a really nice way of isolating the test cases really simply from one another. If you want to do that, then there's another step. We'd like to have repeatable results: we'd like to be able to run the same test case over and over again, and if I run the same test twice I should get the same result. Here's a cheesy example: somebody trying to write a test case to buy my book. We've got a store, and we're creating a book in the scope of the test case; then we're going to place an order for the book, and then we're going to assert that the order was placed. Now, the bit that I'm worried about is the creation of the book. If I run this test case, read it literally, and end up storing some information representing that book in the system under test, then the next time I come to run the test case I'm in a different state: the book already exists. Maybe another test has changed the state that the book is in; maybe it's sold out because of a test case that was run, and we're going to get a collision, a flaky test case. So here's a really simple strategy. Instead of reading this as an instruction, our test infrastructure reads it as a request. It says: OK, you'd like a book; it's called Continuous Delivery, but you don't really care that it's Continuous Delivery, I know you don't. So I'm going to make up a different name and map it to the name that you want within the scope of this test run; inside the system under test it will be different, and you don't care.
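The aliasing trick just described can be sketched as a tiny per-run alias map. This is a hypothetical illustration of the technique, not LMAX code: within one test run the alias is stable, and a fresh run gets fresh names, so reruns never collide inside the system under test.

```python
# Sketch of "functional aliasing": the test asks for a book called
# "Continuous Delivery"; the infrastructure quietly mangles the name so each
# test run creates a unique entity. Names here are illustrative assumptions.

import uuid

class TestContext:
    """Per-test-run map from the names a test uses to unique real names."""
    def __init__(self):
        self._aliases = {}

    def alias(self, name):
        # Same alias within one run, a fresh one for the next run.
        if name not in self._aliases:
            self._aliases[name] = f"{name}-{uuid.uuid4().hex[:8]}"
        return self._aliases[name]

run1, run2 = TestContext(), TestContext()

# Within a run the alias is stable, so the steps of one test agree:
assert run1.alias("Continuous Delivery") == run1.alias("Continuous Delivery")
# Across runs the stored names differ, so reruns and parallel runs never collide:
assert run1.alias("Continuous Delivery") != run2.alias("Continuous Delivery")
```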
The next time I run, I'm going to get a different name, and now I've got test isolation. Using this aliasing facility for all of these functional entities allows us to run test cases in parallel and separate from one another. It's trivially simple, but it's really quite effective. So, a good starting point: use functional entities for isolation, and always alias them; just mangle the name in some way so that the stored name is unique for every test execution. Then you can run the same test over and over again without any problems.

Next, we'd like our tests to be repeatable. We want to use testing as a falsification mechanism: as soon as a test fails, we discard the release candidate, so we need to be able to trust the tests. I'd argue that automated tests of any form are actually only valuable when they're failing. You can have as many tests as you like, and if they're all passing you don't know much: the tests might be rubbish; you might have missed the key thing. Lots of passing test cases only convey a probability that maybe you're OK; you only really know the state of your system when a test fails and tells you that your system is not good enough. So we need our tests to be reliable; we can't afford flaky tests that sometimes work and sometimes don't. What that means is that, again, we've got to be very precise and very specific about the control of our system. Let's imagine that we've got a system under test and some external system that it talks to, as in the end-to-end examples from before, which we don't like very much. And let's imagine that we learnt to write software not from 'Visual Basic for Dummies' but from something else, so we didn't just
tightly couple those two things: we have a local interface, an abstraction of the communication between this system and that one, and beyond that we've got the communication channel between the two systems, the REST API or the sockets or whatever the mechanism for communication is. What we'd like to be able to do in the scope of a test, as we've said, is to stub out the external system. We don't really want it to be there: it's just clutter; it doesn't give us sufficient control; it doesn't give us repeatability; it doesn't give us the ability to get the system under test into the state that we want it to be in and evaluate it in controlled circumstances. So, through configuration, we can use the same real-world communication technology, but we can fake it: we can stub out those external communication pieces. Remember these probes that we're putting around our system: at every point of external communication we want to plug in a stub, a fake version of the external system, so that we can fake inputs, collect outputs, and make assertions on them. What tends to happen when we're doing this: here's that picture again, the public interface to our system, the scope of our system under test, our test infrastructure, and some test cases. What's really nice is to plug these stubs into our test infrastructure; they are a distributed part of it. We'd like to be able to express ideas in our test cases that say 'I expect this outcome' or 'I would like the external system to provide this input', and we'd like to do all of that from a test case, in an abstract way, hiding the complexity through some back-channel of communication between the test infrastructure and the test stubs.
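A stub of this kind has two jobs: fake the external system's inputs, and collect our system's outputs for assertions. The sketch below is an illustration under assumed names and an invented message protocol, not the talk's actual infrastructure.

```python
# Sketch of the stubbed-external-system idea: the test tells the stub what the
# "external system" should reply, then asserts on what our system sent out.
# The protocol, names, and message text are assumptions for illustration.

class ExternalSystemStub:
    """Stands in for a collaborating system at a real communication boundary."""
    def __init__(self):
        self.received = []           # outputs captured from the system under test
        self._canned_response = None

    def will_respond_with(self, response):
        self._canned_response = response # the input we want to fake

    def request(self, message):
        self.received.append(message)
        return self._canned_response

def system_under_test(gateway):
    """Our system: sends a request over its local interface, acts on the reply."""
    reply = gateway.request("settle trade 42")
    return f"settled={reply}"

stub = ExternalSystemStub()
stub.will_respond_with("OK")                   # fake the external input
assert system_under_test(stub) == "settled=OK" # assert on our system's behavior
assert stub.received == ["settle trade 42"]    # verify what our system sent out
```

Because the stub is reached through the same local interface as the real system, the same configuration trick also lets you simulate garbage inputs or a dead channel, the scenarios an end-to-end setup can't reach.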
I've been leading you in this direction already, and I think this is crucial: using the language of the problem domain helps us solve a lot of these problems. In particular, it gives us the right level of abstraction to express these ideas in a solution-free way. It allows us to define executable specifications without describing how the system works, it helps us create these test cases very quickly, and it allows us to read and understand the scope of our tests. I work as a consultant, advising people on this kind of stuff and on continuous delivery in general, and I have a client in the Netherlands who have taken automated functional testing quite seriously for several years. They have a very large body of automated functional tests and, in their own words, those tests are horrible: complicated, convoluted, lengthy. It takes even the person that wrote a test thirty to sixty minutes to understand what the test is doing. So they've got all of these tests, but they don't know what the coverage is, they don't know what they're asserting, and they don't even know whether a given test is good or bad. The ability to understand very clearly and very precisely what a test is asserting is important. Thinking of these things as specifications of the behaviour of the system provides us with the right level of abstraction, and we can start designing languages that allow us to express those ideas. Here's an example; I hope you can read this. This is a real example from the system I was talking about, which is in the sphere of financial trading. In this case we're selecting a deal ticket for an instrument, where an instrument represents a market in which you can trade; we're placing an order of a particular type; we're checking the feedback from that order; then we're placing another order and looking at the feedback from that. Here's another test case; this one goes through the FIX API that I mentioned earlier.
We start by placing a master order to get the marketplace into a known state. That one instruction creates the marketplace, defines some users to trade in it, and puts a bunch of prices into the marketplace that the subsequent tests can trade against. Remember what I said about the ease of writing these tests and getting the system under test into the state that we'd like it to be in very quickly; then we do some other interactions. As I said before, each test case starts by getting the system into the condition we would like it to be in, so we're creating an instrument and some users that are going to be able to trade, and those sorts of things. This is the next level down: the test infrastructure that the test cases share and talk to. This is a place-order example, and one of its properties is that we're using lots of optional parameters. If I don't care about the detail of the order that I'm placing, I can call place order with, I think, no parameters at all; I can just say "place order" and it will make up an order for me that will work. If I want to be precise, I can specify every attribute of the order. For other APIs this can be more complicated. Actually, what I've just shown you was an iteration of where we're at, and we realised over time that even at that level we weren't divorced enough from how the system was working. We had stuff in there that was too specific: we were talking about the trading user interface and the FIX API, and we realised there was no need for that, we could abstract further. We ended up with test cases like this.
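A place-order helper with optional parameters, every one defaulted so that a caller only states what it cares about, might be sketched like this; the names and default values are illustrative, not from the real system:

```python
import itertools

_order_ids = itertools.count(1)  # each made-up order gets a fresh id


def place_order(instrument="Gold/USD", type="limit", side="buy",
                quantity=10, price=1200):
    """Hypothetical DSL helper: every parameter is optional.

    A test that doesn't care about the details can call place_order() with
    no arguments and get a valid, made-up order; a test that does care can
    pin down every attribute explicitly.
    """
    return {"id": next(_order_ids), "instrument": instrument, "type": type,
            "side": side, "quantity": quantity, "price": price}
```

So `place_order()` gives you something that will work, while `place_order(side="sell", quantity=500)` lets a test be precise about exactly the attributes it is asserting on.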
This single test case expresses a real desired behaviour of the system: we'd like to be able to place an order in a particular market, of a particular type, and we'd like to see an execution report in response to that order. This one test case works against the FIX API, the public website, and the public API that people could use to write their own clients. One test case expresses the requirement, and it can be fulfilled in three different places. The same client in the Netherlands that I was talking about before is using this technique: they're rewriting a legacy system, migrating towards a microservices approach, and they are rewriting their acceptance tests in this kind of form. They've got two different versions, the older version of the system and the new one, and they can run the same specification against both and assert that they get the same behaviour out of both systems. That's quite powerful, and it demonstrates really loose coupling between the test case and the system under test. You can now imagine the system under test changing quite dramatically without impacting the nature of this test case. It doesn't matter whether placing an order means filling in some details in a form or clicking on a graph; the test still makes sense. We'd like to be able to test any change as part of our development process. We don't want to just test the easy things, or just the happy paths; we want to test all of the behaviours of our system. So I'm just pulling out a few different cases. At LMAX, where I worked, we tested every attribute of the system, we tested the performance characteristics, and we would selectively kill bits of the system and check that the system didn't lose any data, all of those kinds of things. But I'm going to focus on just a couple of things here. The obvious one that tends to trip us up is time; time is one of those dependencies that tends to get in our way quite a lot.
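One way to sketch "one specification, several channels" is to write the assertion once against a channel interface and run it over every driver; the drivers here are hypothetical stand-ins for the real FIX, web and public-API adapters:

```python
class FixDriver:
    """Hypothetical channel driver: places an order via a FIX-style API."""

    def place_order(self, order):
        # A real driver would speak FIX here; this one just fakes the reply.
        return {"channel": "fix", "execution_report": "FILLED"}


class WebDriver:
    """Hypothetical channel driver for the public website."""

    def place_order(self, order):
        # A real driver would click through the UI; this one fakes the reply.
        return {"channel": "web", "execution_report": "FILLED"}


def check_order_yields_execution_report(channel):
    # One specification, runnable against any channel that can place orders.
    report = channel.place_order({"side": "buy", "quantity": 10})
    assert report["execution_report"] == "FILLED"


for channel in (FixDriver(), WebDriver()):
    check_order_yields_execution_report(channel)
```

The specification never changes; only the driver behind it does, which is what keeps the test loosely coupled to the system under test.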
I think there are two strategies for dealing with time in automated tests. The first: if your system doesn't care very much about time, ignore it. Blank out any comparisons of time fields and just skip those comparisons. That works if time doesn't matter much to your system. The other approach is to take control of time. So what do I mean by that? Here's the first approach, ignoring time. The nice advantage of this approach is that it's really simple: whenever you're comparing data between two different invocations of the system, you ignore all of the time fields. The trouble is that it can miss errors, and it prevents you from testing some interesting and complex scenarios. If your system does care about time, that's where the second approach comes in: testing by taking control of time. It's very flexible and extremely powerful. It allows you to run long-running scenarios in short periods of time, and you can simulate all kinds of different interactions with time. The downside is that it's slightly more complex in terms of the infrastructure and the setup. So let's dig into that in a little more detail; again, I apologise for the slightly cheesy example. Let's say that somebody wants to borrow my book from the library. They're going to assert that the book is not overdue, and then, and here's the bit that we're interested in, they're going to time-travel forwards one week and assert that the book is still not yet overdue. Then they time-travel forwards four weeks and assert that the book is now overdue. Clearly it's the time-travel bits that are the interesting bits in this scenario. Here's our system under test, and here's our test infrastructure surrounding it. Typically in this kind of scenario, if we've got some notion of time in our system, somewhere there's going to be something that looks vaguely like this in the code: we're going to
have a call to System.getTime, or its equivalent in whatever language or infrastructure you're using. Again, we solve this kind of problem by introducing a level of indirection. Instead of calling directly into the system, we put our own clock in the way: instead of asking the system for the time, the code asks a clock for the time, and every time the system wants the time it asks that clock. Now we can start messing with it. We could introduce a clock like this: by default it talks to the system clock and just goes and gets the time from there, but we can also cheat and set the time from outside, after which it always returns that time, whatever it is. We can plug that into our test infrastructure so that we can implement the time-travel instructions: when a test says it would like to time-travel, we interpret what that means in terms of the time and then set the time in the system under test. As I said, this allows us to evaluate long-running scenarios. In the finance industry, where some of these examples are taken from, we often have three-day cycles, sometimes longer, and you don't want to run a test for three days to find out whether it works. One of the slightly amusing things at LMAX was that we talked to either fifteen or twenty third parties as part of our whole enterprise system, other trading organisations, clearing houses, all of those sorts of places, and every single time there was a daylight-savings change in the clocks, one or other of those third parties would break. We never did, because we tested it: we tested the daylight-savings change scenario in our code with this time-travel approach. If we want to do these different sorts of tests, they probably need some different characteristics; remember what I said at the start.
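A controllable clock of that shape might look like this in Python; Clock, set_time and travel are invented names, and is_overdue stands in for the library example's domain logic:

```python
import time


class Clock:
    """A clock the production code asks for the time, instead of calling
    time.time() directly.

    By default it delegates to the system clock, but the test
    infrastructure can set the time explicitly and then advance it to
    implement time-travel instructions.
    """

    def __init__(self):
        self._fixed = None  # None means "use the real system clock"

    def now(self):
        return self._fixed if self._fixed is not None else time.time()

    def set_time(self, t):
        self._fixed = t

    def travel(self, seconds):
        self._fixed = self.now() + seconds


WEEK = 7 * 24 * 3600


def is_overdue(clock, due_at):
    # Stand-in for the library example's domain code.
    return clock.now() > due_at


clock = Clock()
clock.set_time(1_000_000.0)
due_at = clock.now() + 3 * WEEK       # the book is due in three weeks

clock.travel(1 * WEEK)
assert not is_overdue(clock, due_at)  # one week in: not yet overdue
clock.travel(4 * WEEK)
assert is_overdue(clock, due_at)      # five weeks in: now overdue
```

The production code only ever sees `clock.now()`, so the same binary runs against the real system clock in production and against the test-controlled clock in the time-travel tests.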
For the majority of tests we'd like to start the system once and then run a whole bunch of tests against it, to share the cost of starting it up. But that's not going to work for some of these tests: if you're time-travelling, you can't have one test time-travelling in one direction, another test time-travelling in a different direction, and a third test not wanting to time-travel at all, all at the same time; that's going to mess things up. So in those scenarios you probably want a dedicated instance of the system for the time-travel tests. What we did was tag the tests: we had time-travel tests, and the test allocator would look at those tags and allocate those tests to different hosts. We had destructive tests, tests that destroyed bits of our infrastructure or bits of our code to see how our system stood up, and again you probably don't want to share those environments with regular everyday tests or performance tests or anything like that. And then we had tests that depended on specific bits of hardware, so we could go and look for the hosts that had that hardware and allocate those tests there. Here's a nice little animation. The guy that wrote the test allocator did this visualisation while he was developing the system, to understand what was going on; this is quite a crude version, and modern versions are much bigger and more complex. Over here are the parallel tests: we've got one version of the system and a whole bunch of test hosts running different test cases against it. Over here we've got some of those destructive tests: each test has its own version of the system and can kill different bits of it and evaluate what's going on. And over here we have a bunch of time-travel tests: each again has its own version of the system, and it's running those tests against that version, taking control of time in that context.
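The tag-based allocation can be sketched as a toy function; the tag names mirror the talk, but everything else here is invented:

```python
def allocate(tests, isolated_hosts):
    """Toy sketch of tag-based test allocation.

    tests is a list of (name, tags) pairs. Tests tagged 'time-travel' or
    'destructive' each get a dedicated host with their own instance of the
    system; everything else shares the pool running one common instance.
    """
    allocation = {"shared": [], "isolated": {}}
    spare = list(isolated_hosts)
    for name, tags in tests:
        if {"time-travel", "destructive"} & set(tags):
            if not spare:
                raise RuntimeError("no isolated host free for " + name)
            allocation["isolated"][name] = spare.pop(0)
        else:
            allocation["shared"].append(name)
    return allocation
```

A real allocator would also match hardware-specific tags against host capabilities, but the principle is the same: the tags on a test decide which kind of environment it may share.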
We'd like our tests to be efficient; we want to be able to run tens of thousands of them. Yesterday I was talking to a colleague of mine, we worked at LMAX at different times, and he said that when I left LMAX they had about 15,000 of these acceptance tests, giving results in about 40 minutes. When he left, which was a year or so ago, I think they had thirty thousand of these tests giving results in about 30 minutes, because they'd increased the hardware and parallelised a bit more. So you can run tens of thousands of these tests, and that's what we want to get to: fast feedback cycles. Here's one way of looking at efficiency. If your production environment looks like this, and the typical interaction through the environment looks like this, then you want your test environment to be production-like. It doesn't have to be a production clone, but it has to represent the key attributes of the system. So maybe your test environment looks like this, or maybe it's a bit more complicated than this to get you faster feedback, but it's going to look something like this. If some of your interactions involve a particular unusual bit of hardware, and some interactions look like this, maybe your test environment looks like this. What you're trying to establish is the ability to run your software in lifelike scenarios in production-like environments, so that you can evaluate the deployment, the configuration and the interaction of all of the different pieces, as well as just the behaviour of the bits of source code that we wrote. When we're writing these high-level abstract test cases in our domain-specific language, we don't want to be worrying about the hard computery stuff. We certainly don't want to be worrying about how the system delivers the behaviours; we just want to express the ideas of the behaviours. But beyond that, we don't want to
be worrying about really complicated things like concurrency either. From the perspective of each step in our domain-specific language, we want that step to be synchronous; we want it to be a single step. I completely agree with what Mary said in the keynote about the importance of asynchronous software design, but this is one place where we want synchronicity. We want to be able to make a step in our test case that is complete, valid and repeatable, and then make the next step. So if you are working on an asynchronous system and you want to fake this within your domain-specific language, again you've got this infrastructure layer that gives you a place to do it. Here's an example. This is the DSL layer, the layer below the test cases, and this is the shared code for placing orders. Imagine we're going to send an asynchronous place-order message: we pass the parameters, and then we wait for a confirmation, or fail on a timeout. So it looks synchronous from the perspective of the test cases, but it's actually asynchronous under the hood. We can hide quite complex interactions and behaviours this way within the layer of abstraction that a DSL presents us with. And as you do this, with those little green blobs in the infrastructure, you tend to build your own little domain model; it helps you build and grow some of these ideas and share context between different steps in the test case. I think in most systems, even very asynchronous systems, it's true to say that there's nearly always a natural concluding event to any interaction. If there isn't, maybe there should be; maybe it's worth looking at the design of the messaging in your system. If you really, really can't find a concluding event, then a much worse approach, but still just about acceptable, is a poll-and-timeout mechanism.
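A DSL-layer step that is asynchronous underneath but synchronous to the test case might be sketched like this; OrderGateway simulates the asynchronous system, and all the names are illustrative:

```python
import queue
import threading
import time


class OrderGateway:
    """Toy asynchronous system: confirmations arrive later, on another thread."""

    def __init__(self):
        self.confirmations = queue.Queue()

    def send_place_order(self, **order):
        def confirm():
            time.sleep(0.05)  # simulate network and processing latency
            self.confirmations.put({"order": order, "status": "CONFIRMED"})
        threading.Thread(target=confirm).start()


def place_order(gateway, timeout=2.0, **order):
    """DSL-layer step: asynchronous underneath, synchronous to the test case.

    Sends the message, then blocks until the confirmation (the natural
    concluding event) arrives, or fails cleanly when the timeout expires.
    """
    gateway.send_place_order(**order)
    try:
        return gateway.confirmations.get(timeout=timeout)
    except queue.Empty:
        raise AssertionError("no order confirmation within %.1fs" % timeout)
```

From the test case's point of view `place_order(...)` is one complete, repeatable step; the waiting on the concluding event is hidden inside the infrastructure layer.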
You go and look: has it arrived in the database? Not yet? Come back in a few seconds and look again. Is it in the database now? Yes? Pass, and move on. That kind of thing. What you should never do is this. I see it all the time: I go into clients to talk about helping with their testing strategy, and their code is littered with wait statements in their test cases. This is like crack cocaine for testing; it is a complete anti-pattern. It's going to make your tests slower: in the best case your tests are slow because every single time you wait the full period, even when the thing you were waiting for arrived in a microsecond and you could have saved all of that wait time. Worse, what mostly happens is that you end up introducing a whole different raft of race conditions. You've just moved the problem, and you start playing silly games of tuning all of the different waits to avoid the race conditions; you are never going to get to the real problem that way. Get rid of this, and start doing this kind of stuff instead. If you do all of those things, the next step in continuous delivery becomes easy: the points at which your feedback cycle starts to slow, and you need to scale up to get faster feedback again, become simple. Here's our artifact repository. A release candidate ends up in there from our commit stage, as a deployable artifact, whatever that might be. Our acceptance test environment, when it comes free, picks the newest one of those, deploys it to a shared acceptance test environment, spawns off a whole bunch of test hosts to run all these different test cases, evaluates that release candidate, feeds back the results, and tags the release candidate in the artifact repository with its state. That becomes simple when you've done these things. If you've got shared state, lots of wait conditions, and conflation of concerns, all of those things militate against your ability to do that scale-up.
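The poll-and-timeout mechanism described above, as opposed to fixed wait statements, reduces to a small helper like this sketch:

```python
import time


def wait_until(condition, timeout=5.0, poll_interval=0.05):
    """Poll-and-timeout helper: returns as soon as condition() is true,
    fails only after the timeout.

    Unlike a fixed sleep, a fast system passes almost immediately and a
    slow one still gets its full allowance, so there is no wait time to
    tune and no race to paper over.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll_interval)
    raise AssertionError("condition not met within %.1fs" % timeout)
```

A DSL step would use it as, say, `wait_until(lambda: order_is_in_database(order_id))`, where `order_is_in_database` is whatever check your infrastructure provides.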
So I'm in wrap-up mode now; let's run quite quickly through the summary. Don't use record-and-playback systems; they are poison. Don't just use production data: at best it's a sample of past behaviour, and the things that are going to trip up your software probably haven't appeared in your production run yet. You want to simulate those things, think about all the awkward cases, and simulate failures. Don't just dump production data into your test system either: it's like climbing a mountain with a rucksack full of rocks, you're just walking around with that burden of data all of the time. Don't assume that buying testing products off the shelf, and following the advice that comes with them, is going to lead to a good testing strategy; my experience is that mostly it doesn't. You need to be very specific and think very clearly about what you want. I would argue, from my perspective as a proponent of continuous delivery, that the prime directive in this kind of approach to automated testing is fast feedback. That is more important than anything: I would rather drop a test that took too long to run than slow down the feedback cycle. Don't have a separate testing and QA team. I think testing and QA specialists are extremely valuable on a team, but they should be working collaboratively and intimately with the development team, with fine-grained collaboration. Don't build mini-waterfalls where the development team works to the end and then hands things over to the QA team, even if they're sitting together; you want fine-grained evaluation of the software all the way through. Don't let every test start up the application, unless your application is blindingly fast to start; share out the cost. Don't include systems outside the scope
of your system: be very specific about what the boundaries of your system are, and test to those boundaries. And don't put bloody wait statements in your tests hoping that they will sort you out; they won't, they will just make your tests flakier, harder to figure out, and slower. I don't want to finish on a downer, so, although people are trying to take pictures and I'm over-running, I'm going to go ahead with the do's. Do ensure that developers own the tests, so that they complete the feedback cycle. Test what, not how: use the DSL, and think of your tests as executable specifications. Make acceptance testing part of the definition of done. Keep tests isolated from one another, using the tricks I talked about here. Keep your tests repeatable: you want to be able to run the same test over and over again and get the same result every single time. Use the language of the problem domain; really do try the DSL, it's easier than it sounds to evolve these things very quickly. Do stub external systems to define the boundaries of your system. Test in production-like environments, and rehearse the deployment and configuration as well as the behaviour of your system. Make interactions appear synchronous at the level of each test-case statement. Test for any change: test every change in your system. And finally, do worry about the performance of your tests; worry about the feedback cycle. If your test run is taking hours, that's too slow; if it's taking minutes, can you make it take seconds instead? Really do treat test performance seriously: can you learn the same thing more efficiently? With that, I'm done. Thank you for your time; I've slightly over-run. [Applause]
Info
Channel: GOTO Conferences
Views: 10,753
Rating: 4.8579884 out of 5
Keywords: GOTO, GOTOcon, GOTO Conference, GOTO (Software Conference), Videos for Developers, Computer Science, GOTOber, GOTO Berlin, Dave Farley, Continuous Delivery, Software Industry, Software Engineering, Software Development, Delivering Systems, Acceptance Testing, Programming, Coding
Id: SBhgteA2szg
Length: 53min 43sec (3223 seconds)
Published: Fri Dec 02 2016