John Hughes - Building on developers' intuitions (...) | Lambda Days 19

Video Statistics and Information

Reddit Comments

Great talk! I reproduced the examples at https://github.com/jmitchell/developers-intuition.hs

9 points · u/mekaj · Mar 14 2019 · replies

3 points · u/bhrgunatha · Mar 15 2019 · replies

In a previous talk I saw him do integration tests with QuickCheck, where it ran a random mix of operations on, say, a queue. It would push and pull with random data and see if it breaks. I never found any more documentation or examples of this. It was in Erlang, though. Anyone know if that is/was possible?

3 points · u/Blackstab1337 · Mar 15 2019 · replies

His talks are always interesting and well presented.

2 points · u/julesjacobs · Mar 15 2019 · replies
Captions
So I'm going to talk about property-based testing today, and I'm going to use the Haskell implementation of QuickCheck for my examples. Just before I start, let me ask: who would consider themselves a regular Haskell user? OK, many, but a minority, I think. So I hope I won't use any very strange notation, but if there's something I ought to explain, do just ask me.

Property-based testing and QuickCheck need a little introduction, so here's the "hello world" example. It's a testing tool in which, instead of writing individual test cases, we write general properties of code, such as this one. This is a property of the reverse function on lists, and we read it as: for all lists xs, if you take xs and reverse it, and then reverse that again, you end up with the list you started with. So the property says: do the double reversal, and then make sure the result is equal to xs. The triple equals sign is like an equality test, but it's for use in tests, so it generates an error message if the two sides are not equal. The property itself is just a function that you can pass any list to.

To run a test, in the Haskell REPL or in a main program, it doesn't matter which, we call quickCheck and pass it this function as an argument. QuickCheck then generates, by default, a hundred random test cases and runs them. That shows you what happens when the property is true and the tests pass.

To show you what happens when a property fails, I have another property, prop_Wrong, which says that if you reverse a list once you get the list you started with. Of course that's not true, so when we pass this function to quickCheck it reports a test failure and shows us the counterexample, the failing test. The output contains [1,0]: that's the value of xs for which the test failed. The second line is the error message generated by the triple equals sign, and it shows what reverse returned, which is [0,1], and what it was expected to be, which is [1,0]. They're not equal, and that's why the test fails.

Notice that the failing example, [1,0], is very simple; in fact we'll almost always get either [1,0] or [0,1] here. Of course that's not the first failing test case that QuickCheck found: QuickCheck generated some random list, probably much longer, that wasn't its own reversal. So why do we get this very simple test case? Because every time a test fails, QuickCheck tries to shrink it. It tries to simplify the failing test as much as it can, just as you would simplify a failing test if you'd found one some other way, until it ends up with a minimal test that can't be simplified further but still fails. This shrinking process produces simple failing test cases in which everything is relevant to the failure, and a large part of what makes property-based testing useful is that when tests fail you get these very simple counterexamples to debug. Shrinking is going to be very important for today's talk.
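For readers who want to try the hello-world example, here is a minimal sketch of the two properties just described, assuming the standard Test.QuickCheck API; the property names are illustrative, not taken from the slides.

```haskell
import Test.QuickCheck

-- For every list xs, reversing twice gives back the list we started with.
prop_Reverse :: [Int] -> Property
prop_Reverse xs = reverse (reverse xs) === xs

-- Deliberately wrong: reversing once is not the identity in general.
prop_Wrong :: [Int] -> Property
prop_Wrong xs = reverse xs === xs

main :: IO ()
main = do
  quickCheck prop_Reverse   -- passes 100 random tests
  quickCheck prop_Wrong     -- fails and prints a shrunk counterexample such as [1,0]
```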
But I'm not going to talk about that kind of vanilla example. I want to tell you a little story now. It's based on a true story, but I have changed some of the details, and I will change the names to protect the innocent.

I want you to imagine that you're implementing a cryptocurrency, and you're doing it in Haskell — there are people who are really doing this. What are some of the things you would have to do? One of the things you would probably want to do is define a type of coin amounts. Here's a Haskell type definition, and the values are going to be things like Coin 2, Coin 3, Coin 4. A coin amount contains an integer, but it's different from an integer. Why have a separate type for amounts of coins? Because you can't use just any integer as a number of coins: generally speaking there's a maximum number that you're allowed to have. In reality this maximum coin bound would be billions or trillions, but just for the sake of illustration let's suppose that the maximum coin bound is one million. Now some coin values will be valid and some won't, so let's suppose we have a validation function that just checks that the number of coins is between zero and the maximum.

Let's think about what operations we're going to want on these coin amounts. One thing we'll want to do is take two coin amounts and put them together, so we need an add function, and here is its definition. What's the difference between this and just adding integers? The difference is that you might overflow, so addition might fail, and I've represented that in this code by returning Haskell's Maybe type. I check: does the sum of the coins overflow? If not, I return Just — that represents success — and a coin containing the sum; otherwise I return Nothing to represent failure.

This is very, very simple code, but it still has to be tested, and I want to test it with QuickCheck. How would you test this code? Maybe it's not obvious what properties to write, but I'll bet you can think of some unit tests — in fact I'll bet you can think of two unit tests in particular that you would like to write for this code. Here's what I think you're thinking of. The first unit test just tests the normal case, when you add two coin quantities together: I've written down that if you take Coin 2 and add it to Coin 2 — that's an infix application of the add function — the result must be equal to Just (Coin 4). That tests the successful case. Of course I must also test the overflow case and check that add really returns Nothing when the sum is out of range, and that's what the second test case does: I take a coin with the maximum coin value, add one to it, and that must return Nothing. Once I've defined these two test cases I can pass them to quickCheck and it will run them — there's only one test in each case, and of course they pass. They also, by the way, give 100% coverage of the code for add, so if coverage is your goal, hey, we're done. However, we would like to try properties instead.

The very tempting thing to do here is to say: well, I've got two unit tests, why don't I just generalize each unit test? We can do that. Here I've taken the two tests and written two properties. The first property tests the normal case of addition, and it says that when you add the coins, the result should succeed: you get Just a coin containing the sum of a and b. But of course this property only holds if the sum of a and b is within the correct range, so it does not overflow, and that's what the line in red is capturing: it's a precondition of the test. The way QuickCheck implements this is that it generates test cases without taking the precondition into account, but if it finds that the precondition is not satisfied, it discards the test rather than running it, so we only run test cases that satisfy the precondition. The second property does the same kind of thing: it says that if we expect an overflow, then when we add the two coins together we must get Nothing.
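A sketch of this stage of the example, reconstructed from the talk: the names Coin, maxCoinVal, validCoin and add are illustrative rather than the original source, and the strict comparison in add mirrors the off-by-one bug the talk uncovers later.

```haskell
-- A coin amount is an integer, but not every integer is a valid amount.
newtype Coin = Coin Integer deriving (Eq, Show)

maxCoinVal :: Integer
maxCoinVal = 1000000          -- billions or trillions in reality

validCoin :: Coin -> Bool
validCoin (Coin n) = 0 <= n && n <= maxCoinVal

-- Addition can overflow, so it returns a Maybe.  Note the strict '<':
-- this is the inconsistency with validCoin that surfaces later.
add :: Coin -> Coin -> Maybe Coin
add (Coin a) (Coin b)
  | a + b < maxCoinVal = Just (Coin (a + b))
  | otherwise          = Nothing

-- The two obvious unit tests, expressed with QuickCheck's (===).
unit_Add, unit_Overflow :: Property
unit_Add      = Coin 2 `add` Coin 2 === Just (Coin 4)
unit_Overflow = Coin maxCoinVal `add` Coin 1 === Nothing

-- The generalized versions, each guarded by a precondition (==>).
prop_AddNormal, prop_AddOverflow :: Coin -> Coin -> Property
prop_AddNormal a@(Coin m) b@(Coin n) =
  m + n <= maxCoinVal ==> a `add` b === Just (Coin (m + n))
prop_AddOverflow a@(Coin m) b@(Coin n) =
  m + n > maxCoinVal ==> a `add` b === Nothing
```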
I can't quite run the tests yet: I have to write a generator for coin values. This is how we write a test-case generator in Haskell QuickCheck: we give an instance of the Arbitrary class, and the arbitrary definition there defines my generator. The valid coin values are any integer in the range zero to a million, so I've just chosen a value uniformly from that range and tagged it as a Coin. By the way, there is of course a risk that I might screw up when I write a generator, so whenever I write a generator of this sort I also write a property that just checks that every generated test case actually is valid. The property at the bottom is going to be given coins generated by this generator, and it's going to check that they really are valid; that's just a way of making sure we stay sane.

Given this, I can now run the tests, and instead of running two unit tests I'm now running 200 tests: a hundred normal cases and a hundred overflow cases. So far, this is what happened in real life. By the way, I still have a hundred percent coverage — isn't that great? But you can see that lots of tests are being discarded: in each case QuickCheck found a hundred tests that satisfied my precondition and discarded 96 others. That means the testing is not as efficient as it might be, and the developers who wrote this code thought: we don't want to be discarding so many tests.

Why are we discarding them? Because of the precondition written there. So let's instead define a new type of normal test cases. The type definition just says that a Normal test case contains two coins, but we'll define this type so that it has an invariant: the two coins always make up a normal test case. That means the precondition will always be true, so no tests will be discarded any more. How do we do that? We have to write a custom generator. There's a bit more code here — what's it doing? First of all it chooses coin a arbitrarily, using the previous generator, but then it chooses b very carefully, to make sure that the sum of a and b will not overflow, and then it packages them up into a value of the Normal type. Once I've defined this generator for Normal values and rewritten the property to expect a Normal value, then when I test the property, all the test cases will pass the precondition and testing will be more efficient. The developers did this for the normal case and for the overflow case.

Let's just think about the code we had to write to do this. We had to define a new type, the Normal type, for normal cases. We had to write the custom generator — there it is, where we choose b very carefully. We had to do the same thing for the overflow cases, with a different custom generator where b is chosen differently to ensure that we get an overflow. By the way, I'm now generating test cases without using the first generator I wrote to generate the second coin, which means the second coin might not be valid, so I also have to write properties to check that both coins in each case really are valid coins. You might think this code is so simple that those properties aren't necessary. Well, I'm glad I wrote them, because when I tested the overflow one, after nine hundred thousand tests it generated a test case in which the first coin is zero and the second is a million and one.
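A sketch of the machinery described here, under the same assumed names as before: a uniform generator for Coin, a property checking the generator, and a Normal wrapper whose generator never triggers the precondition. The Overflow wrapper would be analogous (and, as the talk shows, just as easy to get subtly wrong), so it is omitted here.

```haskell
-- A generator for coins, plus a property that checks the generator itself.
instance Arbitrary Coin where
  arbitrary = Coin <$> choose (0, maxCoinVal)

prop_ValidCoin :: Coin -> Bool
prop_ValidCoin = validCoin

-- A wrapper type whose generator only produces non-overflowing pairs,
-- so the precondition never discards tests.
data Normal = Normal Coin Coin deriving Show

instance Arbitrary Normal where
  arbitrary = do
    a@(Coin m) <- arbitrary
    b <- Coin <$> choose (0, maxCoinVal - m)   -- chosen so a + b cannot overflow
    return (Normal a b)

-- Both coins produced by the custom generator should still be valid.
prop_ValidNormal :: Normal -> Bool
prop_ValidNormal (Normal a b) = validCoin a && validCoin b

prop_AddNormal' :: Normal -> Property
prop_AddNormal' (Normal a@(Coin m) b@(Coin n)) =
  a `add` b === Just (Coin (m + n))
```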
How come I generated an overflow case containing a million and one, which is not a valid coin? Let me tell you: if you're going to generate a pair of coins which, when added together, must overflow, you had better not choose the first one to be zero, because if you do, you've painted yourself into a corner — there's no valid coin you can construct that, when added to zero, will give you an overflow — and of course my generator didn't take that into account. So it's important to have these tests on the generators.

If you think about it: I started off with a unit test; I generalized it into a property, which is quite easy to do; but to make it efficient I had to define a custom data type, write a custom generator for that data type, and write a custom validator for that data type to make sure I'm still generating valid test data. That's quite a lot of work, and I had to do it not just for one unit test but for both — and if I were starting from more unit tests, which my intuition tells me I could write, I'd have to do it again and again and again. It's a huge amount of work. What's more, I always face these questions: are the generators I write correct? They're kind of tricky code. I've now got many copies of my property — are they consistent? If somebody changes the code and changes one property, will they remember to change the others? I've got a lot of different generators for different cases — do they cover all the cases between them, or are there some cases covered by no generator? To figure that out I have to reason about a bunch of different generators, and I don't like reasoning; I like to let QuickCheck do my thinking for me. But here I have to do an awful lot of thinking. I've written a lot of code whose correctness is not obvious, and, worst of all, none of it is reusable: next time I have an idea for a property, I start again from the beginning. This really makes me want to tear my hair out; it just seems the wrong thing to do.

So what should we do instead? One property to rule them all. I want to take all of our unit tests and combine them into a single property. Here's what I suggest for testing addition: let's write one property, with no preconditions, that adds together any two coins and then checks: if I were to just add the numbers, would I get a valid coin as a result? If so, then I should get Just that coin; if not, then I should get Nothing. This code can clearly test any possible case — it covers all of the cases, and you get 100% coverage from it too — but it's much, much simpler.

You may be wondering, though: have I really captured the intuition behind those two unit tests that it was so obvious we should write? Well, sort of. But here's a question: this property clearly can test both normal cases and overflow, but does it? Because there's only one property now, it's possible that all the generated tests are normal cases, or that all of them are overflow cases — we don't know any more. That doesn't quite correspond to the intuition. So what should we do — go back to multiple properties? No; what we should do instead is label the tests. This is something QuickCheck has supported for a long time. I've added the red line of code to the property, saying I want to label each test case with a string computed by this summarize function. What does summarize do? It just looks at the sum of a and b, decides whether we're in the normal case or the overflow case, and returns the corresponding string.
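A sketch of the unified, labelled property, continuing the illustrative names used above (the function names prop_Add and summarize are assumptions, not the slide code):

```haskell
-- One property, no preconditions, that covers every case, labelled with
-- the unit-test idea each generated case corresponds to.
prop_Add :: Coin -> Coin -> Property
prop_Add a@(Coin m) b@(Coin n) =
  label (summarize a b) $
    a `add` b === (if validCoin (Coin (m + n))
                     then Just (Coin (m + n))
                     else Nothing)

summarize :: Coin -> Coin -> String
summarize (Coin m) (Coin n)
  | m + n <= maxCoinVal = "normal"
  | otherwise           = "overflow"
```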
When you label test cases like this, QuickCheck displays the distribution of the tests it actually ran when testing finishes. In this case more or less 50% of the tests are normal and the rest are overflow cases, and that's probably all right, so I'm happy. This is the way to use your unit-test intuition: label the test cases and see how often each unit-test idea is actually being exercised.

Do we have any more intuition about the tests we would like to run? How about this: the add function essentially enforces a boundary, doesn't it — a boundary between the normal cases and the overflow cases. Is that boundary in exactly the right place? If you were writing tests by hand, you would probably write some tests that fall on either side of the boundary, just to make sure it's exactly where it should be — is the comparison inclusive or exclusive? How do I use that intuition to improve my testing? It's a unit-test idea, so I won't change the property; I'll just change my labelling function. Let me label all the test cases that are within two of the boundary as boundary cases, and let's see how often they get generated. Well, I ran ten thousand tests, and there was not a single boundary case. Oh dear. Intuitively it's important to test this boundary, and my property, despite running a huge number of tests, has entirely failed to do so. My testing is nowhere near as good as I thought it was.

So what can I do? Let's think about how we're generating coins. I put this generator up and said: the range is zero to a million, we'll just choose something in that range — and nobody objected. But if you were writing tests by hand for code whose input was in the range zero to a million, is that how you would work — any number is as good as any other — or are there some values that you would be careful to make sure you test? Common testing practice says: if the input ranges over a set of values, make sure you test the first and the last one, and make sure you test values at or close to the boundaries — the extreme cases. This generator doesn't do that; it assumes one number is as good as another. So maybe I should generate like this instead. What this code does is, first of all, choose a non-negative number n — and by default QuickCheck's generators choose quite small numbers, so n will be a small non-negative number — and then generate my coin by choosing uniformly between three alternatives. The first alternative just returns this small number, which gives me values close to the lower end of the range. The second alternative returns the maximum coin value minus that number, which gives me test cases close to the upper end of the range. And in the third case I do what I was doing before — it's also interesting to try any number in the whole range, so let me include that as a possibility. This is probably a better way to generate coin values anyway, but what's more, when I add two values together I expect to get more boundary cases: when I add a coin chosen by the first alternative, so it's small, to a coin chosen by the second, which is near the top of the range, there's a good chance the sum will fall near the boundary. So this is not only probably a better generator for coins considered in isolation, it should also give me better results from the addition property.
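A sketch of the refined labelling and the boundary-biased generator just described. The new Arbitrary instance is meant to replace the earlier uniform one (the two instance declarations would not coexist in a single module), and summarizeBoundary would be swapped in for summarize in prop_Add's label call; all names remain illustrative.

```haskell
-- Label near-boundary cases, and bias the generator towards the ends of
-- the range so that such cases actually occur.
summarizeBoundary :: Coin -> Coin -> String
summarizeBoundary (Coin m) (Coin n)
  | abs (m + n - maxCoinVal) <= 2 = "boundary"
  | m + n <= maxCoinVal           = "normal"
  | otherwise                     = "overflow"

instance Arbitrary Coin where
  arbitrary = do
    n <- getNonNegative <$> arbitrary   -- QuickCheck generates fairly small n by default
    Coin <$> oneof
      [ return n                        -- near the bottom of the range
      , return (maxCoinVal - n)         -- near the top of the range
      , choose (0, maxCoinVal)          -- anywhere in the range
      ]
```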
And sure enough, when I run tests for the addition property, wouldn't you know it, the very first one fails. The code is wrong. The counterexample comes when you add a coin with value one million to a coin with value zero, and the third line shows us what happens: the left-hand side is the actual result from add — add adds these two coins together and fails — and the right-hand side is what we expect, which is success, with a coin containing one million.

Why did that happen? Here's the code I showed you earlier; look at the bits in red. In the validCoin function I assumed that a valid coin can have any value less than or equal to the maximum coin bound. In the addition function I compare to see whether a plus b is less than the maximum coin bound. So there's an off-by-one error here. It doesn't matter which choice you make, but you have to make one choice and be consistent about it. I can fix the add function by changing that comparison to less-than-or-equal, and if I do and run the tests again, they pass, and now I'm getting a little over 4% boundary cases. Is 4% enough? I think it probably is: in any large number of tests there will be a lot of boundary cases, and they will test pretty thoroughly that the boundary is in the right place.

So here's the idea: take your intuitions about what unit tests you would like to write — don't throw those intuitions away just because you're doing property-based testing — but don't write a single property for each test either. Generalize them all into one grand unified property that tests everything, and for every unit-test idea, turn it into a label that you can attach to test cases. Once you've done that, you can look at the distribution of tests and see whether each of those ideas really is being tested reasonably often, and if not, you can tune the generation, in a way similar to what I've just shown you, so that each thing you want tested is tested reasonably often. This is a method for thinking about how to develop an effective property: the first property I showed you was not effective, because of the generator; the new one is.

Of course, if it's a method it should be applicable to more than one example, so let me show you another. This one is also quite simple: I have written a little library for finite maps. Here's an example of using it: I can convert a list of key–value pairs — key 1 has value 'a', and so on, key 3 has value 'c' — into a map, and I'm going to display maps in a kind of set-like notation to make my examples easy to read. There are of course a lot of operations on these maps; I'm going to test insertion. In this case, if I take this map t and insert the key 2 with value 'b', I should get a map containing keys 1, 2 and 3. The implementation, by the way, is ordered binary trees. Here's the tree representing the first map, t. If I insert 2 and 'b' into this, I take the new key–value pair, compare its key to the key at the root of the tree and go right in this case, compare it to the key in the next node and go left, and now I've reached the end of the tree, so I create a new node. So we know what we're doing.

Think about it: what unit tests would you write for the insert function? Would anybody like to make a suggestion — what cases should they cover? Yes, thank you: the ordering is important, and I'm going to focus on that to begin with. I'm going to want to test inserting a key smaller than all the keys in the tree, larger than all of them, and somewhere in the middle; I feel I should have a test for each of those. That's my first intuition, so let me start working on a property.
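A sketch of the kind of finite-map library under test, assuming names like Map, insert, toList and fromList; the transcript does not reproduce the real library, so treat this as one plausible ordered-binary-tree implementation.

```haskell
-- An ordered-binary-tree finite map (constructor names are illustrative).
data Map k v = Leaf | Node (Map k v) k v (Map k v)
  deriving Show

insert :: Ord k => k -> v -> Map k v -> Map k v
insert k v Leaf = Node Leaf k v Leaf
insert k v (Node l k' v' r)
  | k < k'    = Node (insert k v l) k' v' r
  | k > k'    = Node l k' v' (insert k v r)
  | otherwise = Node l k' v r          -- key already present: update its value

-- Flatten to an ordered association list -- the "model" view of a map.
toList :: Map k v -> [(k, v)]
toList Leaf           = []
toList (Node l k v r) = toList l ++ [(k, v)] ++ toList r

fromList :: Ord k => [(k, v)] -> Map k v
fromList = foldr (\(k, v) m -> insert k v m) Leaf
```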
One awkward thing is that when I write a unit test for insert, I can predict exactly what the result should be, so I can write the conventional kind of unit test with an expected value. But when I write a property, I'm going to be inserting things into random trees which I don't know in advance. Here we see the tree from the test I showed you and the resulting tree — but how can I tell whether the resulting tree is correct or not? I don't want to have to predict that value: to do so I'd have to implement insertion into a tree, and then my test would be the same as the code under test. I don't want that.

So I'm going to use a very powerful technique that is very useful for testing this kind of code: I'm going to take the two trees, before and after, and convert them into a simpler data structure — in this case, an ordered list of pairs. If I take the tree beforehand, I can convert it with a toList function into the list containing (1,'a') and (3,'c'); if I take the tree I get afterwards, I can convert that into a list as well. Now, even though it may be hard to predict exactly what tree I should get, it's easy to predict what list I should get at the end: I can just use list insertion. We call this using lists as a model for the more complex data structure, and you can think of the list operations as a reference implementation — but it's just there to judge the correctness of the tree implementation.

I can convert this into a property quite easily; here it is. This property of insert says: if I insert a key–value pair into a tree, I should get the same list whether I do the insertion first and then convert to a list, or first convert to a list and then use list insertion — except for one little wrinkle, which is that list insertion can result in duplicate keys, whereas insertion into a finite map can't. That's why I've thrown in the deleteKey function there: it just makes sure I don't get duplicates.

So here's my property. Now let me add some labels corresponding to those unit-test ideas I started off with. I'll do the same thing I did before: add a line to the property that says label each test case with the result of this summarize function, and I define the summarization like this: if all the keys in the tree are greater than or equal to the key I'm inserting, then I'm inserting at the start; if they're all less than or equal, I'm inserting at the end; otherwise I'm inserting in the middle. This is what I get after 100 tests, and we can see that 80% or so of the generated tests insert somewhere in the middle, about 10% at the end and 10% at the start. That's probably OK, so I could be happy.

Well, there is a problem, and the problem is: how do we know that the code I wrote is right? I went pretty quickly over it, and it's kind of tricky — if you're not a regular Haskeller you might not be completely certain that I've done the right thing. And what happens if I get this code wrong? Will I ever find out? All that happens is that I'm putting labels on tests that don't really represent what the test is testing; I look at my statistics, they look fine, but that doesn't really tell me anything, because the statistics might be using the wrong labels. So really I would like to test this labelling, to make sure that when I label a test "middle", for example, it really is in the middle. I'd like to see some examples of tests with each label, and this is a new feature that Haskell QuickCheck has not previously had: you can now take such a property and just say, give me some labelled examples.
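A sketch of the model-based property and its labelling, reusing the illustrative Map type above and assuming QuickCheck 2.12 or later for labelledExamples. The property shape, the deleteKey helper and the Arbitrary instance for Map are reconstructions, not the slide code.

```haskell
import qualified Data.List as L

-- Model-based property: flattening the tree after insertion must match
-- ordered-list insertion into the flattened tree (with any old binding
-- for the key deleted first, since a map holds at most one per key).
prop_Insert :: Int -> Int -> Map Int Int -> Property
prop_Insert k v t =
  label (summarizeInsert k t) $
    toList (insert k v t) === L.insert (k, v) (deleteKey k (toList t))
  where
    deleteKey key = filter ((/= key) . fst)

-- One label per unit-test idea.  (Note that the empty map satisfies the
-- first guard -- exactly the surprise discussed next.)
summarizeInsert :: Int -> Map Int Int -> String
summarizeInsert k t
  | all (>= k) keys = "at start"
  | all (<= k) keys = "at end"
  | otherwise       = "in middle"
  where keys = map fst (toList t)

-- Random maps, generated and shrunk via their list model.
instance (Ord k, Arbitrary k, Arbitrary v) => Arbitrary (Map k v) where
  arbitrary = fromList <$> arbitrary
  shrink    = map fromList . shrink . toList

-- Print one shrunk example per label.
showExamples :: IO ()
showExamples = labelledExamples prop_Insert
```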
If I do that, I get an example of inserting at the start, an example of inserting at the end, and an example of inserting in the middle, and these are just tests, reported in the same way QuickCheck always reports them. If we look at the "at end" example, it says we're inserting the key 0 and the value 0 into a map that just contains the key -1, and the final line — the one produced by the triple equals sign — just says that when we convert to lists, we get the same result on each side. All the tests pass, so the last line always shows two equal lists. And sure enough, in the "at end" example, (0,0) is being inserted at the end of the map. Look at "middle": (0,0) is being inserted in the middle of the map; that looks good. Now look at "at start". What's happening here is that it's inserting (0,0) at the start — of the empty map. That is not what I had in mind when I said I wanted to test inserting at the start. Is inserting into an empty map really "at the start"? I suppose in principle it is at the start — it's also at the end — but it's not what I wanted. How come this is the example I'm given? Because my labelling function labels this test as an "at start" test.

And how come this gets reported? Because QuickCheck "reads the Bible like Satan", as we say in Swedish — it's a saying. What it means is that Satan may read the Bible, may follow the rules to the letter, but if there's any way of perverting the spirit while still following the letter, Satan will find it. QuickCheck's shrinking does the same kind of thing: shrinking asked, do we need any other elements in the collection for the label still to be "at start"? No — so throw them away. That means that when I see "at start" being reported, I don't know that I'm really running an interesting at-start test; I might just be inserting into the empty collection.

How can I fix this? I have to make my rules clearer. Let me add another case to my summarization function that says: if the collection is empty, label this as an "empty" test, not an "at start" test. If I do that and ask for the labelled examples again, I get the same test I had before, but now it's labelled "empty" — that's fine — and I get a real "at start" test. Wait: there still aren't any other elements... well, no, I didn't say the keys have to be different, did I? And so, what do you know, Satan has figured out that it works to insert exactly the key and value that are already in the tree. Sure enough, my labelling function says this is an insertion at the start, but it's not what I meant.

Actually, looking at this example is quite illuminating: it reminds me that there are really two different kinds of insertion — insertion that adds a new key, and insertion that updates the value associated with an existing key. Maybe I want to test both of those situations at the start, in the middle and at the end; maybe I shouldn't have started off with three unit-test ideas but six. So let me change my labelling function and just add some more information to the label — that's the red code. I keep the label I had before, and then, if the map is non-empty, I look to see whether the key I'm inserting is already among the keys in the map: if so, I say this is an "update" kind of insertion, otherwise an "insert" kind. When I change my labelling this way, that "at start" test splits into two: the first one on this slide tests the update case at the start, and the second is the test I wanted all along — inserting a new key at the start.
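A sketch of the refined labelling just described, again with illustrative names; it distinguishes the empty map and separates updates from genuine insertions.

```haskell
-- Refined labelling: single out the empty map, and distinguish inserting
-- a new key from updating an existing one.
summarizeInsert' :: Int -> Map Int Int -> String
summarizeInsert' k t
  | null keys = "empty"
  | otherwise = position ++ ", " ++ kind
  where
    keys = map fst (toList t)
    position
      | all (>= k) keys = "at start"
      | all (<= k) keys = "at end"
      | otherwise       = "in middle"
    kind
      | k `elem` keys = "update"
      | otherwise     = "insert"
```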
And sure enough, there's the update case, and the insertion case now really does insert at the start of the collection. So now, finally, I think my labels mean what I intend them to mean, and I can look at the distribution of tests. What do we see? Most tests are inserting in the middle; 60% of them are inserting new keys — if I were to tune this I might try to reduce that a bit and do more updates. A very small number of tests are updating elements at the start or end, but still, at half a percent, if I run a few hundred tests I'm going to hit that case. This is all right.

Now, here's something you might be tempted to think. You might think: I've done a lot of work to classify my tests, I've got seven interesting kinds of test and I've got seven labelled examples — why don't I just save those examples and run them in my test suite instead of running QuickCheck? It would be much faster. But will it work just as well? To answer that, let me show you the code of insert. There's the data type at the top; the rest of the code is the function definition, and if you read Haskell you can see it's the usual binary-tree insertion. This is the correct version of the code; now let me show you a buggy version. Did you spot the bug? Correct version, buggy version, correct... OK, where's the bug? It's in the last line, the case where we're inserting a key that is already in the map and we've found the node containing it. The bug is here: this is the value we put back into the resulting map, and I've written b' there, which is the value taken from the previous version of the map; it should of course be the value I'm inserting. This is an easy typo to make — just an extra prime — but the effect is that when you try to use insert to update an existing key, it simply doesn't work.

This code is obviously buggy, but it will pass all seven of those saved unit tests. Why? Let's look at one of them: here's the example that was saved for doing an update in the middle of the sequence. Look: I'm inserting the key 0, and the key 0 is already present, but the values in both cases are 0 — so I'm testing update by replacing a zero with a zero. Of course that can't detect this fairly plausible bug. And why do we have zero in both those places? That's Satan. If I run QuickCheck instead, it of course generates different values, and very quickly — after 14 tests — it reveals the bug in the update case. What's the moral of the story? The moral of the story is: don't let Satan write your unit tests.
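For concreteness, here is a sketch of the buggy variant of the final clause described above, written against the illustrative Map type from the earlier sketch; the real slide code is not reproduced in the transcript.

```haskell
-- The buggy insert: an extra prime keeps the *old* value instead of the
-- new one, so "update" insertions silently do nothing.
insertBuggy :: Ord k => k -> v -> Map k v -> Map k v
insertBuggy k v Leaf = Node Leaf k v Leaf
insertBuggy k v (Node l k' v' r)
  | k < k'    = Node (insertBuggy k v l) k' v' r
  | k > k'    = Node l k' v' (insertBuggy k v r)
  | otherwise = Node l k' v' r   -- BUG: should store v, not v'
```

A frozen example that updates key 0 from value 0 to value 0 cannot tell v from v', which is why the saved labelled examples all pass here while quickCheck finds the bug within a handful of random tests.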
I've talked quite a lot now about labelled examples: they let us see what our labels mean, they let us debug the labelling, and that's really important. I haven't talked so much about tuning, but let me talk about one very common way in which tuning can break. Go back to the coins example, and remember that I worked really hard on my generator to ensure that boundary cases appear sufficiently frequently. Here's what might happen next. Next month somebody else works on this code. They add a new operation — maybe multiplication of a coin by an integer — and then they write their own property, and they start optimizing their distribution by changing the same generator. Are they looking at my distributions while they're doing that? Of course not. So if all that ever happens with these distributions is that I eyeball them and decide they're OK, then somebody else can screw up my testing completely later on, and nobody will know.

What can we do about that? Here is the code I used for collecting those labels — I've got this summarization function up here — and I'm going to show you a different way of collecting the same information. To get a little more space I'll take away the labelling code I had before and add the labels in a different way. What classify does is label a test case with a string if a given condition is satisfied, and if you look at the conditions in those three calls to classify, they are exactly the same as the conditions in the definition of my summarization function; likewise the strings are the same. So this code is doing almost exactly the same thing as the summarization code I had before. If it's doing the same thing, what's the point of changing? Well, this code tells QuickCheck about each label separately, and that means I can do something else: I can replace the classification function, which just adds a label, with a coverage requirement. These coverage requirements classify in the same way, but in addition they say the boundary case must occur 5% of the time, the normal case must occur 40% of the time, and similarly for the overflow case. If I do that, I've put my coverage requirements into the property, where they will be checked every time I run the tests.

When I test this property now, if the coverage requirements are all satisfied it just behaves as usual, but if there aren't enough tests of a particular kind I get a warning. Here, for example, I got only 4% boundary cases when I expected 5%. But if you look at it, you'll see QuickCheck still says "OK, passed 100 tests": this is not a test failure. You can put this test into a test suite and it's not going to cause other people's tests to fail, but you can see when your requirement is not met. Now, you're probably thinking: what good is that? There will be a message in the test log somewhere, but nobody will ever read it. Sure. So why isn't it a test failure? Because this can happen just through bad luck, quite often: if you keep running these tests, sooner or later you'll run 100 tests that don't happen to include five boundary cases, and we don't want that to cause somebody else's check-in to fail in continuous integration.

If you want to be sure whether the coverage criteria are met, you have to do this: you ask QuickCheck to check the coverage. If you do that, QuickCheck won't just run 100 tests; it will run enough tests to be certain whether or not those coverage criteria are met. And this is a failure: if somebody screws up your generator, then when this coverage-checking test runs it will cause the test suite to fail. As you can see, in this case it had to run around fifty thousand tests to be able to say that we are not getting 5% boundary cases, and that this is a statistically significant result. What do you do when this happens? Well, you might change the generator so that boundary cases occur more often — perhaps that's the obvious thing to do — or you might say: 5% was a bit arbitrary anyway, maybe 4% is OK. If I change the requirement to 4%, then QuickCheck runs even more tests — a hundred thousand — and then it says: yes, we are getting 4% boundary cases, we're confident of that, and it's statistically significant. The reason it had to run so many tests is that I chose a requirement that was very close to the actual number.
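A sketch of the coverage requirements described here, assuming QuickCheck 2.12 or later, where cover takes a required percentage and checkCoverage turns missed coverage into a statistically checked failure. The 5% and 40% figures are from the talk; the percentage for the overflow case is not audible in the transcript, so the value below is an assumption.

```haskell
-- The same classification written as explicit coverage requirements.
-- Plain quickCheck only warns when a requirement is missed;
-- checkCoverage keeps testing until it can accept or reject the
-- requirement with statistical confidence.
prop_AddCover :: Coin -> Coin -> Property
prop_AddCover a@(Coin m) b@(Coin n) =
  cover 5  (abs (m + n - maxCoinVal) <= 2) "boundary" $
  cover 40 (m + n <= maxCoinVal)           "normal"   $
  cover 40 (m + n > maxCoinVal)            "overflow" $   -- percentage assumed
    a `add` b === (if validCoin (Coin (m + n))
                     then Just (Coin (m + n))
                     else Nothing)

checkAddCoverage :: IO ()
checkAddCoverage = quickCheck (checkCoverage prop_AddCover)
```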
We can make life easier for QuickCheck as well: if it's enough to have, say, 1% boundary cases, then, since I actually get far more, QuickCheck will very quickly be satisfied that we're getting enough of them. And if I say I want 10% boundary tests, which I'm obviously not getting, QuickCheck will once again be able to determine very quickly that we're not reaching that number, and fail the testing.

This coverage checking is something very new in Haskell QuickCheck, and I think it's really important. How can you figure out how many tests you need to run? You have to read this paper from 1943 — a paper which, when it came out, was immediately classified because it was so important for the war effort. This is military-grade statistics. But luckily you don't have to read it, because we have, and QuickCheck knows how to do this.

There's still one question we have to ask ourselves. Whenever you draw a statistical conclusion, you need a certain confidence level. Psychologists, if they can draw a conclusion with 99% confidence, think that's great — but what does 99% confidence mean? It means you'll be wrong 1% of the time. These coverage checks may be wrong sometimes, so we need to think about how often it is acceptable for a test in a test suite to fail when there is no bug. While you're thinking about that, I just want to show you a quote from Agile Borat, my favourite tweeter — I can't do the accent — who says: "My friend is very good developer. He always have all unit tests green. If unit test is fail, it is removed. Is best practice." You know, Agile Borat is right. There's a name for tests that fail occasionally: they're called flaky tests. If one of these coverage tests fails the first time, maybe somebody will spend a few days investigating and conclude there was actually nothing wrong. What's going to happen the second time that test fails? It's just going to be deleted. But these tests are important: they prevent properties from losing their effectiveness.

So how often is it acceptable for a test to fail when there is no bug? My claim: never, in the lifetime of the project. So what confidence level do we need — one in a million? Suppose you've got 100 developers, and they each run the test suite 10 times a day; that's a thousand runs a day. Suppose the test suite contains a hundred coverage properties; that's a hundred thousand checks a day. If we use one in a million as our confidence level, this bad thing is going to happen every ten days. That's not acceptable. So QuickCheck uses ten to the minus nine — we're more like particle physicists here. Maybe that's enough; maybe ten to the minus twelve would be better; I'm a little open-minded about that, and it is something you can change, but ten to the minus nine is a good default if the project is not too large or too long-lived.

So I'm teaching a method. When you want to test something, think about the unit tests you would write, and generalize them into one property that rules them all — but label the tests, so that for every unit-test idea you have a label that tells you how often that idea occurs. Use labelled examples to debug your labelling; you will get a shock the first time you see the tests that Satan can produce, I guarantee it. Once your labelling is correct, gather statistics and tune the distribution, and once you've got the distribution you want, write those coverage requirements into your properties so that nobody can break them without finding out. If you do this, you will use your intuitions to build truly effective property-based tests. Thank you.
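As a footnote to the confidence discussion above, QuickCheck exposes that knob through checkCoverageWith and its Confidence record (certainty defaults to roughly one in a billion); a minimal sketch, reusing the illustrative prop_AddCover:

```haskell
-- Tighten the false-failure probability from 10^-9 to 10^-12, as the
-- talk suggests some projects might prefer.
prop_AddCoverStricter :: Coin -> Coin -> Property
prop_AddCoverStricter a b =
  checkCoverageWith stdConfidence{ certainty = 10 ^ (12 :: Integer) } $
    prop_AddCover a b
```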
Info
Channel: Code Sync
Views: 5,740
Rating: 4.97 out of 5
Keywords: lambdadays, john hughes
Id: NcJOiQlzlXQ
Length: 48min 16sec (2896 seconds)
Published: Mon Mar 04 2019