The Science of Unit Tests - Dave Steffen - CppCon 2020

All right, thank you for coming, and welcome to scientific unit testing. I'm Dave Steffen, tech lead at SciTec; we're a small defense and aerospace contractor. I work out of the Boulder, Colorado office, which is about an hour's drive north and west of where the conference was last year, and where we hope it will be again next year. The PhD is in physics, which is relevant, because what I'd like to talk to you about is the close relationship between unit testing and experimental science. That's kind of an odd topic, so we'll start with some ideas that will lead me down this rabbit hole. I'll try to take questions as we go, but we probably won't have a lot of time, so I'll take questions at the end, and I think we put up in the chat where we'll be afterwards, so people can come find me and ask questions.

First of all, I'm a physicist, so I have to quote Newton. I'm not sure I've been standing on the shoulders of giants for this; sitting on their shoulders, maybe, or lying in a hammock that they're carrying around. My job is much easier because you've already heard a whole lot about unit tests this week, and you've gotten a lot of good advice. And if you look through previous CppCon talks and other conference talks, there's a whole lot of good advice out there. Earlier this week you got test-driven development from Phil Nash; Clare talked about acceptance testing for GUIs; Ben Saks did a kind of getting-started, Back to Basics talk yesterday. Also let me call out, because I'll refer back to it, one of my favorite talks: Titus Winters and Hyrum Wright, "All Your Tests Are Terrible", from CppCon five years ago. It's one of my favorites; I make all my team members watch it. I also liked Fedor Pikus's Back to Basics talk last year. It's very information-dense; if you're new to unit testing, bite it off in small pieces. And Kevlin Henney gave two talks which were my introduction to behavior-driven testing; I was late to that party. All of these talks are up on YouTube, except the ones from this CppCon, which will be up in a month or so. And there are all kinds of other places: talk after talk on YouTube about unit testing, blog posts, books. There's a lot of great stuff out there.

So, to get started: there are a lot of desirable properties of unit tests. Different people have different lists; this list was more or less proposed by Titus and Hyrum five years ago, and Ben talked about it quite a bit yesterday. What you'd like in your unit tests: correctness and completeness. Readability: unit tests typically don't get tested themselves, so how do you know they're right? You declare them right by inspection, which means they have to be inspectable. Demonstrability: you want them to demonstrate the proper use of your code. Resilience: you don't want them to break under maintenance. All these good things. But if you look anywhere else, there's all kinds of other good advice too. You want your tests to be easy to run. You want them to be fast to run, so you don't slow down your teammates in your development cycle. You should use test-driven development; go see Phil Nash's talk from earlier this week, and he's given a bunch of other talks on the subject. You want your tests to be deterministic. You may have code coverage or regulatory requirements, depending on your industry. There are a couple of these I want to drill into, to see if we can find what's sitting behind a lot of this good advice.

But before we do anything else, the most important thing is rule zero: existence. Your tests have to exist. All of this advice you've been hearing can be very intimidating, especially if you haven't done unit testing before, and you can never get all of it.
You can probably mostly meet most of it, but even if you can't meet very much of it, write tests anyway. Bad tests are almost always a thousand percent better than no tests at all. I've seen negative-value tests exactly once, in an extremely dysfunctional environment. So write your tests.

Let's jump right in. The most common advice for testing object-oriented code, for just testing a class, is to do so-called black box testing, where you use only the public interface. "Black box" because you're not looking inside to see how the gears and wires are laid out; you're not looking at the implementation, only at the external behavior of the thing. There are all kinds of reasons for this, and all kinds of talks and discussion about why you should do it: it forces better design, because you wouldn't have designed the class that way if you'd known you had to test it; you avoid tight coupling to your implementation, so you can refactor your code without having to change your unit tests. Fine, this is the good advice. Let's actually try to do it and see how much trouble we can get into.

Now, this example: I didn't so much borrow it from Kevlin's talk as commit wholesale theft. Kevlin, I think I owe you lunch. We have a binary Cup class: it's either empty or full. The default constructor makes an empty cup. There's an is_empty member function, so you can tell whether it's empty or full. Filling an empty cup makes it full; drinking from a full cup empties it. You've already written the implementation for this in your head; everyone knows exactly what's inside this class. It's trivial. You couldn't think of a simpler class to test. So let's actually try to do black box testing on it, like everyone tells us to. The old approach, which I used before seeing Kevlin's talks, was that for every member function I would have a test case, or
maybe a couple, if you want to separate out the error cases or something like that. Generally: you've got a default constructor, so you have a test case for the default constructor; you've got an is_empty member function, so you have a test case for is_empty. This is actually not a good idea, and we'll find out why, but it's the way I used to do things. So let's actually try it. Here's our test case for the default constructor: we make a cup with the default constructor, and then we use the is_empty member function to check that it's empty. Great, it passes, we're done. Now we go on to test the is_empty member function: we make a cup, which is supposed to be empty, and then we call is_empty. And we say, wait a minute, something's wrong here: we just wrote the same code twice. Why did we write the same code twice? The answer is that you need each member function to test the other one. You can't test either of these member functions in the absence of the other, because all you can do is use the public interface.

I call this the black box conundrum. Fundamentally, if you test via the public interface, you have a circular logic problem, or at least a circular trust problem. In order to test this part of the interface, you have to use that part of the interface; you're basically assuming it already works. At some point you'd like to test that part of the interface, but you probably use this part to test it. So at no point do you ever have an independent verification that any given part of the thing works. For all you know, there's just a bunch of bugs in it that happen to hide one another. And if you've been around long enough, you've seen that kind of thing. Now, there are many common solutions; this isn't a new problem, and people have known about it for a long time. One of the most common is that people just ignore it.
And the joke is that that's actually not such a bad idea: you're supposed to test this way, people go test this way, they get on with their lives. Fine. There's another way people try to handle this, where you declare one member function correct by inspection, start from there, test everything else in terms of it, and build up a non-cyclic trust graph. For example: that is_empty member function is just a one-liner, a simple getter that can't possibly go wrong, so you look at it and declare it fine. At least philosophically there's a problem here, because you've now introduced a manual step into what's supposed to be an automated system. But okay.

And a lot of people do this, honestly. They look at this situation and say, well, this is dumb. Look, I know perfectly well what I'm supposed to be testing. I should just be able to reach in and check the internal state, make sure it's what it's supposed to be, and then all these problems go away. This is basically white box testing. We somehow open up the class (it should really be called clear box testing, because you can see inside) and look at the implementation, at the gears and wheels and belts and wires, to see what it's doing and make sure it's correct. Now, you're not supposed to do this; you're supposed to do black box testing. But nevertheless a lot of people like to do it. What we've got on screen here, of course, won't compile, because the member state is private. But that's no problem, right? Everyone just writes #define private public before including the header, and it compiles, and you can write all your tests. Please don't. Like, really, please don't do this. It's formally undefined behavior; I think there's actually a sentence in the standard that says so. And if your testing scheme involves invoking undefined behavior on its very first line, I think we have problems. But I do have to say, and I hate to say it, that it works, at least on a lot of compilers: GCC, I'm pretty sure, and I think Clang, for obscure reasons. It actually does work reliably, and it's cheap: you can get right in and write your tests, you don't have to change the original code, and you can get on with your life. But don't do this.

If you have to do white box testing, and there are times when it's your only good option (remember rule zero: you have to write tests), you break encapsulation using a friend class. Over here we have a friend declaration, and then you define the friend, the CupTester or whatever it is, in your test suite. Because it's a friend, it can get in and fiddle around with the guts of your Cup class, and now you can do white box testing. This is better than the other way because there's no undefined behavior. You do have to change your source code, because you had to add that friend class or friend struct, but I don't think that changes things in any way that matters; people debate this. However you do it, this is the better way to do white box testing. We'd still prefer not to do it at all, but realistically, there are times when it's your only good option. Example: you've got legacy code, written in such a way that you can't write unit tests for it. You'd like to refactor it so you can write unit tests, but you can't refactor it until you've got unit tests. How do you cut that Gordian knot? You start with white box testing, get some tests around the code, and then you can refactor it; hopefully you can fix it later. But you don't want to do this if you don't have to, and it seems odd that we are incapable of actually following everyone's good advice not to do this.
So what are we doing wrong? Why didn't this work when we tried it before? The answer is this thing called behavior-driven development. The idea is that we don't have one test per member function; instead, we test the class as a whole, and we have a unit test for each behavior. The behaviors of the whole Cup are: a new cup is empty; an empty cup can be filled; filling an empty cup makes it full; drinking from a full cup makes it empty; et cetera. These are what you're trying to test. And if nothing else, you'll notice that the names are now much, much better, because your unit test names read like the design spec; I've seen that be very, very useful in real situations. So let's actually go do that and see how much trouble we can get into. Here's our test case for "a new cup is empty": we make a cup with the default constructor and we require that it's empty. That's what we're supposed to do. Great. Then we go on to "an empty cup can be filled", and you say, wait a minute, back up: didn't we just do this? I think we had this exact code two or three slides back. We wrote it twice, and we didn't like it either time. Why is it okay now? All we've done is change the name. Why does behavior-driven testing solve the black box conundrum? Don't we still have that circular trust problem? The answer is that we are testing the behavior, not the implementation. We're testing the consistency of the interface, not the correctness of the implementation. I had to repeat this to myself for a day or two before it really sank in. We're testing only the behavior that is visible from the outside.
Look: suppose the constructor is wrong because it gets the sense of the internal boolean backwards and makes a full cup, but is_empty is also wrong and flips the logic, and fill flips the logic, and drink flips the logic, and every other member function flips the logic, in such a way that from the outside it actually looks like it does what it's supposed to do. Then it's correct, even if internally it's an incomprehensible, flaming pile of bugs. You have to declare it correct; on what basis would you say it's wrong? Black box testing isn't asking whether you like the implementation; that's what your code review process is for. Testing is just to see whether it's correct, and if everything it does is correct, I think you have to declare it correct. A bug that cannot, under any conditions, be observed isn't really a bug.

And at this point the physicist gets interested, and this is where the hole gets deeper and you've got red-pill-or-blue-pill things going on. If a bug that cannot be observed under any conditions is not a bug, a physicist will remember that about a hundred years ago, a little more, we had a problem in physics. There was this fellow named Maxwell, who came up with the equations that unify electricity and magnetism, and out of those comes the fact that electromagnetic radiation is a wave, and that light is such a wave. Hey, this is fantastic, we now understand all these things. But there's a problem. Waves in water have a speed with respect to the water; sound in air has a speed with respect to the air; but light goes through vacuum, so what does it have a speed with respect to? Nothing? That doesn't make any sense. So at the time, this is the late 1800s and very early 1900s, they came up with a new idea: there was some stuff out there that we'd just never noticed before. They called it the luminiferous ether. This was some stuff that pervaded all of space, the stuff through which light propagated. And they said, okay, we've got a theory that says there should be this stuff; let's go measure its properties. And every time they tried, they failed and got zero.
The famous experiment tried to measure our velocity through the luminiferous ether: the Michelson-Morley experiment, and I think they got the Nobel Prize for it. Every time they measured, they got a velocity of exactly zero, despite the fact that we know our planet is moving. What's up with that? Well, the answer was that the luminiferous ether, if you're moving through it, does things: it shrinks your meter sticks and changes the angles in your experiment in exactly the right way, such that you can never measure it. This stumped everybody, until this fellow named Einstein comes along and says: wait a minute, a physical phenomenon that cannot under any circumstances be measured doesn't exist. Unit tests: bugs that cannot be measured don't exist. It's the same argument. If you can't measure it, whether it's there or not shouldn't matter. So Einstein throws the whole idea out, comes up with the special theory of relativity, stands physics on its head, and gets a Nobel Prize for something else entirely, which was getting quantum mechanics started, which turned the rest of physics on its head, because he was that kind of guy.

Science has been here before. Aside from the revolution in physics, everybody had to go back to the drawing board and revisit the underpinnings of science: what is the philosophical basis by which we decide that we know what's true? Out of that comes a stronger foundation for modern science, and a lot of it rests on something called Popper's falsifiability criterion. What science would like to do is make statements that can be proven true or proven false, then go prove them true or false, and be done. But in general you actually can't do that. Think of Newton's theory of gravitation, which says every object in the universe attracts every other object in the universe with a force proportional to a bunch of math, and we're going to claim this is true of every object in the universe.
You can't go check every object in the universe. There's some object 44 billion light years away at the edge of the observable universe; we can't go look at it to see whether it obeys these rules. It just can't be done. So what science realized is that we have to do the next best thing. We can't prove things true, except in very rare cases, but we can make statements that can be proven false, try to prove them false, and fail. And the harder you try to prove them false and fail, the more you think they're true. In other words, your confidence that the statement is true tracks the thoroughness of the tests that didn't prove it false.

Now apply this to unit tests and go back to where we started a little while ago. We were saying that if a class exhibits correct behavior in every circumstance, you have to declare its implementation correct, even if the innards are a flaming pile of self-contradictory bugs. What we're doing is making a falsifiable hypothesis, namely: this code has no bug. Then we write tests to observe the bug, and we fail to observe the bug. There's no bug. And the confidence in the correctness of your code tracks the completeness of the testing of your code.

What we just did was take something out of the philosophy of science and apply it to unit testing, and we learned something. If these really are similar, do we learn something by turning it around and seeing whether the metaphor works the other way? Actually, we do. Think about what empirical science has been doing: we are trying to reverse-engineer the source code of the universe by writing unit tests against its observable behaviors. That's what an experiment is. You observe something in the universe, and then you try to figure out what's going on. And scientists would love to white box test reality. We'd love to know what's going on. Quantum mechanics is so bizarre that the more you know about it, the less you think you understand any of it.
We'd love to white box test reality, but we can't, because nobody has figured out how to #define private public before we include reality.h. So what we do instead, and have really been doing for the past 350 years or so, is develop experience with the logic, procedures, and epistemology (the philosophy of what we know, what we don't know, and how we know it) of black box testing. That's what empirical science really is. So the first result is that behavior-driven black box testing rests on sound philosophical foundations. Thank goodness; I'm sure many of you out there were losing sleep at night worrying about this, and now we can actually get some sleep. Nobody else worries about this. But look, can we get any practical results out of all this?

For example, if you start to think about it, your software is a system. It's a physical system; there are electrons getting pushed around at some level. It's a physical system that you want to poke, prod, and study for the existence of bugs, and your unit tests are the experimental apparatus for detecting the bugs. That's really what we're trying to do. And if that's the case, maybe experimental science can give us a few pointers.

Here's the simplest example I could think of: how do you measure something on a scale? We're fans of Alton Brown; his formula for coffee is 28.5 grams of coffee to 400 grams of water. My wife and I do 30 grams, because we live on the edge. Do you think this is 30 grams of coffee? Are you sure? Do you think maybe this is a trick question, because I'm asking it at CppCon? You should at least be thinking about whether or not the mass of the measuring cup is included in that 30 grams. In fact, this measurement is completely meaningless. I don't know if you can see it on your screen, but that button right there is labeled "zero". All good scales
have a way to zero themselves at some condition, and then they measure relative to that. If you don't know where the scale was zeroed, you have no idea what the reading means. And hint: this is not 30 grams of coffee. So what do you do? You zero the scale, but you do it under exactly the same conditions. There's a sample, and you want to measure the mass of the sample, so you set everything up exactly the way it's going to be, but without the sample, and you zero the scale there. Then, when you add the sample, you have reason to think that the 30 grams really means you've got 30 grams of stuff. And by the way, that really is 30 grams of coffee. I hope you've gotten enough coffee today; I can't give you real coffee, because it's a virtual conference, but you've got a picture of it.

And I've got news for you: this is test-driven development. What do you do in test-driven development? You write a failing unit test to demonstrate the bug, you fix the bug, you rerun the unit tests and watch them turn green. The sample you're measuring is your bug fix. First you measure the system without the sample, that is, without the bug fix, so the test fails. Then you introduce your bug fix, which is introducing the sample, and you measure again: in the presence of the bug fix, the bug is not detected. Test-driven development is just good lab technique. Actually, there's no "just" there: it's good lab technique, exclamation point. And honestly, test-driven development is a whole lot of other things besides; go see Phil Nash's talk from earlier this week, or his talks at previous conferences. My favorite statement about test-driven development came from Fedor Pikus in his Back to Basics class last year: it's not so much about what it does to your code, it's about what it does to your mind. Now, this is a talk about science, and I think Fedor is talking religion, but he's not wrong. I love it; it's absolutely true.
But this aspect of test-driven development is basically just the discipline of making sure you're doing good lab technique. By the way, something else you do if you need to measure things is calibrate: you put a known weight on the scale (these are typically expensive, because you have to trust them) and you make sure your scale reads what it's supposed to read. Is there an equivalent in unit testing? Yes, and Ben talked about it quite a bit yesterday. Once you've got everything working, you should go break your code and make sure your unit tests fail the right way. Introduce known bugs and watch your unit tests tell you about them. Most people don't do this much, but I've found it useful every now and then, particularly for beginners. It can be very confusing to write unit tests for failure modes, because the test succeeds if the code fails, but only if it fails the right way, and the test fails if the code succeeds. What? Well, sometimes, if you've got a bunch of different exceptions that could be thrown, it's really useful to go in and change the type of one exception and watch your unit testing framework catch the fact that you just threw the wrong exception. That kind of thing is basically calibrating your scale.

Let's go on to another example; this one is a little more pernicious. What is the definition of "right"? We've got some floating point code. It's not interesting floating point code, and it's a stupid example, because it's on a slide and I don't want to have to explain complicated floating point. We're trying to compute some number, which happens to be pi, and we compute it by taking the arc cosine of negative one. Notice that there's an f up there: acosf, single-precision floating point. So the answer is only correct to about seven decimal places, which is about what you'd expect: 3.141592 is correct, and the digit after that is wrong.
But if it's some real computation where you don't know what the answer is, this is probably good enough. Well, maybe it's good enough. If you were trying to compute the equatorial circumference of the Earth based on this, you'd be off by at most a meter or so, and I've got news for you: the Earth's radius is only known to about seven decimal places. It doesn't get any more accurate than that, because we've got solid tides. Never mind the tides in the water: the continents flex, and at the equator the rocks go up and down by about half a meter. So this is fine. Your floating point computation works, it's good enough for your answers, your system is fine, and everybody's happy. Until someone comes along and takes that f off. They've improved your floating point computation: the result is now good all the way out to 16 decimal places, or whatever your machine supports. But your unit test just broke, because the expected value has extra digits that are now wrong. You didn't need those digits; they were garbage, and they didn't matter, because the rest of your computation was only good to seven decimal places anyway. The point is that an incorrect "right answer" creates a brittle test that breaks under maintenance, because you've unnecessarily pinned your answer to a specific algorithm rather than to the general thing you're trying to compute.

This is just bad error propagation, which is the bane of all freshman physics labs; I've taught many a physics lab, and nobody ever gets it right. You've got your measurements, the input, and there are only so many significant figures in there that actually mean anything. Then, as you go through your floating point computation, you're going to lose some of those significant figures to round-off, and at the end you're going to get something, and we hope it's right. But it's either right based on what you put in, so you can say "this is what we should get", or it's right enough for what you're trying to do.
By the way, what should be done here is to use something in your unit test framework to do correct floating point comparisons; all good unit test frameworks have one. Developing mathematical algorithms is hard. Characterizing the numerical stability and the round-off error is even harder, but it's actually part of the job. If you haven't done that work, you don't really understand the algorithm, you shouldn't rely on it, you probably can't code it right, and you might not be able to test it right. You have to figure out what the error bars on the result are, and then put those into your unit tests.

And this isn't just floating point. Five years ago, in that talk "All Your Tests Are Terrible", Titus and Hyrum make a lot of fun of someone testing JPEG compression. They take an image, they JPEG-compress it, they get a result, they put that result into the unit test, they run the unit test, and they get the right answer. Hurrah, it works! No, it doesn't, because someone goes and improves the JPEG compression algorithm, and the same thing happens: the test fails even though something actually got better. So even if you're not doing fancy floating point computation, you still have this problem: what is the right answer? I don't know how you decide whether a compression algorithm does a good job on a picture. Is it good enough that the human eye can't detect the difference? How sensitive is the human eye to green light? I don't know how to do that; it's hard. But if you're writing JPEG compression algorithms, you probably need to think about it.

Let me pause for a second and check questions. I'll just mention: I keep saying "unit tests", but everything we're talking about applies to testing in general. Unit testing was just the hook, the easy example to get started with, because that's what everyone worries about. Okay, let's get more general.
Now, we can't go into the full theory of experimental setup; that's a big topic, and by the way, my PhD was in theoretical physics, so no one wanted me in their lab. I walk in and things blow up. I stay away from the experiments. But roughly speaking, for our purposes, the three things we want out of a good experiment are that it's precise, it's reproducible, and it's accurate. So let's talk about these.

What precision means in this context is that any given measurement gives you a lot of data: you get a lot of information from doing whatever it is you're doing. The most obvious application is to use a unit test framework that gives you good messages, and to make sure you're actually using it to give you good messages. If something fails, that is, if you detect a bug, it should tell you: here's the file, here's the test case, here's the line number, here's the assertion that failed, we expected 49 and we got 42, whatever. You want a lot of information that points you directly at the problem, so you don't spend time deciphering the failure and can get straight to understanding the real bug. That's precision; that's a precise experimental setup.

But here's another, maybe more pernicious case; you'll see this every now and then. You've got a test case for a widget (I don't know what a widget is; it's anything), and as part of it, we're going to put some widgets in a vector. So we make a std::vector of widgets, and we require that it's empty, and we test that, and then we push back a widget and make sure it's got size one. Really? You'll actually see this out in the big wide world, for all kinds of reasons. The reason it's bad isn't that it's silly to test std::vector; someone's got to do it, and you hope it was the people who wrote it. But change the example a little, so that instead of std::vector it's some other dependency, maybe one developed by the other team down the hall, who are a bunch of knuckleheads.
they never write good unit tests, right? So you want to make sure their stuff works, so that you can make sure that your stuff works, and you're going to write a test for it. Fine, but don't put it here. The reason you don't want to test their code in your widget test case is that if they break something, the widget test case turns red, which means that at two in the morning, when there's a production break or something goes wrong, they're going to call you, because the test report says something broke in the widget tests. Test this if you want, but put it somewhere else, where it says "we're testing std::vector," so that if it breaks, they go bug the right people. This gives you an imprecise result, because it's pointing you at the wrong place. Don't put red herrings in your path.

Okay. Reproducibility. This is a little bit more complicated. Most scientific experiments out in the real world have measurement noise: you do the same experiment multiple times, and you don't get exactly the same answer. Most of the time that's just interference from the environment. It's noise; there's static on the line, or someone bumps the table, or whatever it is. But sometimes it's because you're studying a complex system and you can't control for everything. Think of testing a medical device, or a new drug, or a new therapy: you can't just do it on one person, because biology is really complex; you need to do it across a bunch of people to see whether the thing works or not. Or think about any psychological experiment, where you're trying to tease out underlying things across human brains; we can't even hope to characterize that right now. So it may be that you've got so many variables you can't eliminate them, and you have to put up with getting some randomness. Or maybe it's built into the system: quantum mechanics is going to give you random answers; that's why it's so weird.
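Going back to the precision point for a moment, here is a sketch of the kind of diagnostic a good framework emits on failure: file, line, failed expression, expected versus actual. This is a hypothetical helper, not any real framework's API; GoogleTest, Catch2, and doctest all produce equivalent reports automatically.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical: build the precise failure message the talk asks for.
template <typename T>
std::string describe_failure(const char* file, int line,
                             const char* expr,
                             const T& expected, const T& actual) {
    std::ostringstream os;
    os << file << ":" << line << ": check failed: " << expr
       << "\n  expected: " << expected
       << "\n  actual:   " << actual;
    return os.str();
}

// A macro can capture file/line/expression automatically at the call site.
#define CHECK_EQ(expected, actual)                                     \
    do {                                                               \
        if (!((expected) == (actual)))                                 \
            assert(!describe_failure(__FILE__, __LINE__,               \
                       #expected " == " #actual,                       \
                       (expected), (actual)).empty());                 \
    } while (0)
```

The point is that every piece of this message removes a step of detective work at two in the morning.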
Okay, so science has developed ways to handle this, and the four obvious ones are: you do a better job of isolating yourself from the real world; you measure the signal you don't want and subtract it (think noise-cancelling headsets: there's a signal you want, you have a separate thing to measure the signal you don't want, and you subtract it); if you can't do those, maybe you can at least detect that there's a problem and eliminate the erroneous results, marking them as "that doesn't make any sense, ignore this"; or, the last case, everyone's last-ditch effort, is statistics.

Now, this is about handling interactions between your test and the environment. I'm not talking about code that's non-deterministic for good reasons. If your code depends on a random number generator, you're going to get different answers every time you run it. So what do you do about that? Well, you seed the random number generator with the same seed every time before you run your unit tests, or maybe you mock your random number generator so that it's deterministic and you don't have this problem. But sometimes you can't get rid of it. What if your code is a random number generator? Or what if it's a driver for some piece of hardware that's an entropy source for generating random numbers? Well, in that case (and this gets back to what we just talked about), the correct definition of your answer isn't a given sequence of numbers, because you can't rely on that; your unit test has to do the statistics to make sure your random numbers are right. But that's kind of a separate case.

What I want to talk about here is interference from the environment. Tests should fail because the code under test fails, and for no other reason. That's the ideal, but we all have experience with unreliable or flaky tests: they depend on timing, or on some external state; there's a test server that's down, the file system is full, something like that.
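The seeding advice above can be sketched like this. `roll_dice` is a hypothetical function under test; the key design point is that it takes its engine as a parameter, so the test controls the seed and the output is reproducible.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical code under test: takes the engine by reference instead
// of creating one internally, so callers (and tests) choose the seed.
template <typename Engine>
std::vector<int> roll_dice(Engine& eng, int n) {
    std::uniform_int_distribution<int> d6(1, 6);
    std::vector<int> rolls;
    for (int i = 0; i < n; ++i) rolls.push_back(d6(eng));
    return rolls;
}
```

A test can then construct two `std::mt19937` engines with the same seed and assert the sequences match, or substitute a stub engine entirely.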
That leads to erroneous test failures, and the first thing to do, if you're tackling this, is to make damn sure that the non-determinism really is external, that it isn't your code that's broken. We're assuming that as we go forward; you have to absolutely prove that these are false alarms from the external world.

So the first thing you try is isolation. That's easy, unless your code is explicitly supposed to go out and touch the real world: connecting to a test server, connecting to a database, writing to a file system. What do you do if the code has to interact with the external world? Well, you can mock the external thing, so you're not really touching it; you're touching an internal thing that looks the same. Maybe you fork your process and create the external thing in the forked process; if you've got enough control over it, that could work: you fork a process that fires up the server, and you communicate with it. More likely, you've got dedicated test servers, or dedicated test databases, or dedicated test file systems that exist just for the purposes of your testing and that you can rely on. If you're running on hardware, hopefully you're running your tests on your hardware; you might have to have a bunch of dedicated hardware instances for your CI pipeline to hit as it runs your unit tests. This can be expensive. It's expensive in science too: people build physics labs 600 meters down in old disused salt mines to shield them from cosmic rays. It's pricey, but the alternative is that you've got tests you can't rely on. So that's one option.

Now, if you can't do that, maybe you can subtract out the problem. I don't know that this is generally useful, but at least in principle: suppose you're supposed to connect to a server, and your unit test depends on this happening within a certain period of time. You build some external sensor and verify that independently.
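The mocking option above can be sketched as follows, with hypothetical names: hide the external resource behind an interface, and let the test substitute an in-process fake that never touches the network.

```cpp
#include <cassert>
#include <string>

// Hypothetical seam: the code under test depends only on this interface.
struct Connection {
    virtual ~Connection() = default;
    virtual bool send(const std::string& msg) = 0;
};

// Hypothetical code under test.
bool notify(Connection& conn, const std::string& event) {
    return conn.send("EVENT " + event);
}

// Test double: records what was sent instead of touching a real server.
struct FakeConnection : Connection {
    std::string last_sent;
    bool should_fail = false;
    bool send(const std::string& msg) override {
        if (should_fail) return false;
        last_sent = msg;
        return true;
    }
};
```

The fake also makes failure injection trivial: flip `should_fail` and you can test the error path without taking a real server down.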
It's there just to measure the latency to the server, and whatever it measures, it cues your test framework to say, "hey, the server's slow today, add two seconds onto everything," so that your tests can adapt to the external situation and still give you answers that mean something. Might be useful, might not be, but it's an option.

Now, the other option, if you can't do any of those, is that at least you can detect the problem. You've got your external sensor that detects "the server's down," "the database isn't there," "the file system is full," whatever, and you use that to cue your unit tests that a test can't be run, or that the result you just got is invalid. You can do this, but you also have to have the tools and the processes in your team to deal with what happens when it says "I can't run the unit test." If the environment invalidates the test (the test server is down, so all these tests are going to fail, but it's not the code's fault), what do you do? Maybe you mark them as pending and run them later, hoping the test server comes back up. Or you mark them as not run, and then you have to decide in your team what to do about it: can we merge this pull request if it's got something marked as "we couldn't run the test, we'll do it tomorrow"? If you can't do that, you're going to have a whole bunch of pull requests piling up, waiting for test resources. At which point, it's always good to go to management with real numbers: you can say, "hey, we need more resources for our test environment, and I can show you exactly how much time it's costing us."

Finally, if all else fails and your back's really up against the wall, you can do statistics. You can collect some samples, and you might know that this test fails one out of five times, so you rig your test framework to say, "well, it just failed; let's run it four more times and see what happens."
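That rerun policy can be automated; here is a minimal sketch, with a hypothetical function name and retry count. The point is that the framework, not the engineer, absorbs the known flakiness, so a consistent failure still stands out.

```cpp
#include <cassert>
#include <functional>

// Hypothetical retry wrapper: rerun a known-flaky test a few times and
// report failure only if every attempt fails. A single pass counts as a
// pass; a failure on every attempt is a real signal worth investigating.
bool run_with_retries(const std::function<bool()>& test_body,
                      int max_attempts = 5) {
    for (int attempt = 0; attempt < max_attempts; ++attempt)
        if (test_body()) return true;
    return false;
}
```

Use this only for failures you have already proven are environmental; wrapping a genuinely buggy test in retries just hides the signal.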
I'm speaking roughly; more than four, probably. Or you rig your test to know that it fails every Thursday night, so if it runs on a Thursday night, you rerun it later, or something. It's kind of a last-ditch effort, but it does give you one very useful thing: you're automating the "ignore that flaky test failure" response, so that your engineers don't get desensitized to it. If you've got a test that just fails every now and then, everyone gets used to glancing over and saying, "oh, it's that test again, I don't care." At some point it's going to be some other test, but they're going to ignore it because it looks the same. Don't desensitize your engineers to flaky tests; if possible, rig your testing system to handle them.

Okay, now this is the last thing we're going to talk about: accuracy, which is a bit more complex. What accuracy means is that the results of your experiment match reality. So we have a truth table; everyone likes truth tables. We've got two binary options: your code is correct or it isn't, and your test results pass or they fail. The first thing is, we're thinking of our lab as test equipment to detect something, and the thing we're trying to detect is bugs. We're making a falsifiable hypothesis, which is that our code is correct, and we're trying to detect a signal that says otherwise. So "positive" in this context means you found a bug, and "negative" means there are no bugs. Now, that's the emotional opposite of what you want: no bugs is positive, makes you happy, you can go home; you find a bug, you're unhappy, you have to stick around, it's late, but you've got to fix the bug. So this isn't about the emotional response; this is about whether there's a signal. Positive means there's a bug; negative means there isn't. In this table I've labeled two things in green and two things in red. The green ones are the high-accuracy results: your code is correct and your test results pass, or your code is
incorrect and your test results fail. Those are the cases where your tests and reality match, and those are green; the other two cases are red, because that's what we don't want. These aren't the same colors that your CI pipeline spits out when it finds a failing test; those are the ones over on the left. So this is a little bit more complex. Let's go through each one of these in turn.

Upper left-hand corner: high accuracy, the position we all want to be in. Your code is correct and your tests pass. Hurrah, success! Why have you gotten here? Because your tests are complete, they're correct, you've got the correct definition of the correct result; all these good things that everyone's telling you to do, you've done, and your code is correct, and your test results tell you so. This is a success. Ship it. Fantastic.

Now let's go down to the false positive case. False positive means your code is correct but your tests fail. How does this happen? Brittle tests: someone fixed something and the test broke; this is the earlier example about the correct definition of the correct answer. Or you're depending on non-guaranteed behavior, like the order in which you iterate through a hash structure. Or the test got broken under maintenance, or you had insufficient test reviews, or something. The point is, the code is correct and the test is wrong. This is a false alarm, and its cost is wasted time, because you have to go fix something even though your code is correct and you could ship it. So that's a false alarm.

Now, to contrast that, let's go to the upper right. This is the other poor-accuracy case: your code isn't correct, but your tests pass. What's this? Well, your testing was incomplete: some bugs were hiding someplace, but you didn't shine the flashlight over there. Or you're testing in an unrealistic situation; there's a whole area of the room you didn't look in. Or you've got an overly generous
definition: you opened up your error bars too much, and you're looking right at a bug but you can't detect it; you can't recognize it. This is an undetected bug, and the risk here is that you ship something that isn't going to behave properly.

Now, just briefly, let me point out that people are very rarely in that upper left-hand corner where everyone wants to be. Maybe you're there, but at some point or another you're probably going to be in one of these other cases, with either a false alarm or an undetected bug. Fixing that is hard, but at least you really ought to know which one matters for your industry. If you're building a video game, shipping on time is important, because you've had months of advertising and it costs you a lot of money if you don't ship; if there's a bug on level 57, no one's going to get there for weeks anyway, and you can patch it. No big deal; you'd much rather ship with bugs than miss a ship date. On the other hand, if you're building flight avionics or a medical device, you're happy to miss a ship date. Well, I don't know about happy, but you're happier to miss a ship date than to ship something with a bug in it. And if you start reading about this, the reading gets very grim. Go back and look at the Patriot missile system in the second Gulf War: partly due to software errors, we had two friendly fire incidents. Or if you really want something grim, go look up the Therac-25, a computer-controlled radiation therapy machine, a cancer treatment machine; in several cases, because of software issues, it delivered massive overdoses of radiation to cancer patients, killing three people. This is grim, but those of us who work on things that absolutely have to work, or someone dies, do not want to be in this situation.

Now, you may not be able to avoid having accuracy problems; in fact, everyone's going to have them at some point. But knowing which position
you'd rather be in tells you where you can spend those precious and scarce engineering resources to fix things. It's very important to know which one you'd rather be in, because you can spend money and time making sure you're in the one you can live with.

All right, finally, let's go to that one down in the lower corner. Now, this is in green, although you may not be happy about it, but this is a success: your code is incorrect, and your tests fail. That's a success. Your tests just did what you wrote them to do; they told you that you have a bug. I'm not saying you're happy about it; you may be very unhappy, because you wanted to go home or you wanted to close the ticket, but this is a success. It needs to be celebrated as a success by the team, and it needs to be celebrated as a success by your management. And if it isn't treated as a success by your management, you might want to look elsewhere; we're hiring. I've been at places where this was a problem, and it's not going to lead anywhere good if you are in some way punished when you find the bug, which is to say, when you find the very signal your experimental lab setup was built to find.

Now, this is really where science gets interesting. My favorite quote about science actually comes from Randall Munroe, the guy who writes the xkcd comic; if you don't know about that, go there, because he's the only person who makes funny jokes about computer science. This is what he said: "You don't use science to show that you're right, you use science to become right." This is a very profound statement, and I can't tell you how true it is. This is how science works. And if we swap out "science" for "unit tests," we still get a very important statement. You don't use unit tests to prove that you're right. Well, actually, we do. Okay, yeah, we do, but that's not really the point. We use our unit
tests to become correct. No one gets to that upper left-hand corner, where we can ship it, where everything's correct and your tests prove it, without having been down here a lot, finding all the bugs we didn't think about, because we wrote good unit tests. So if Fedor can get religious about test-driven development, I can get religious about this and say that this result is enlightenment. You just learned something you needed to learn. You're not happy about it, but you learned it, and that is absolutely crucial if you want to move from this form of accuracy to the other form, which is where everybody wants to be.

Okay, so that's accuracy. Let's try to wrap this up a little. Unit testing is science. Again, your code is a physical system, and what you're doing is building a lab setup to measure a specific thing about that system, which is: does it have bugs? And this is not just unit testing; it's all of your tests. What you're doing is making a falsifiable hypothesis, call it C: this code is correct. You then write unit tests and attempt to show that it's wrong; go find the bugs. Your confidence that C is true tracks the thoroughness of the tests you've used to try to prove it false. This is the scientific method as understood by modern philosophy.

Now, all of that advice you've been getting from all the other talks, the slide I put up at the beginning with all the stuff on it, all the blog posts: is there someplace underneath all that we can look to, to see where it's coming from? I think there are two, or maybe three, places. A lot of that advice comes from necessary process. Test-driven development is a necessary process for a lot of reasons, beyond the fact that it makes you zero your scale. Making your tests run fast means you don't slow down your team's development cycle. Maybe your industry has regulatory requirements for unit test code coverage, and so your company says, look, you must do it
this way. Or it might be maintenance issues; this is what the folks from Google and the folks from Bloomberg are usually talking about: how do we write these tests so that they're still true and useful in five years, and so that they don't represent an enormous maintenance burden on the development of our code? That's really important to do, and most of black box testing is really about that.

But everything else, and I think this is an unproven hypothesis, so tell me if I'm wrong, everything else is about making good tests. Because we want to prove that our code is true, and we know we can't, really. I mean, proving that your code is correct, isn't that the halting problem? Which is NP-hard, I think; someone with a PhD in computer science can correct me on that. I don't know that stuff; I'm a physicist. You can't prove it true, but you can prove it false. So what we are doing really is science, in every way, shape, and form. And given that we've got centuries of experience doing that, whereas we've only got what, twenty or thirty years of unit testing, something like that, it isn't surprising that we might want to go look at what science has been up to since the late 1600s to see how they handle these problems. A lot of this stuff has already been worked out; we just have to figure out what it means for unit testing.

So, to summarize, what I'd like you to do with all of this is remember that you must test. Even if you can't do all this, go write your tests; you're lost without good tests. Go forth, write good tests, and do good science. Thank you.

Now I'm looking over here and I've got question after question after question, which isn't really surprising. Give me just a second here. Let's see. "Do we have to use the scientific method in testing highly complex systems, like an AI that has been learning for ten years?" Okay, so you've got an AI that's been learning for ten years. Now how do you characterize
what's true? I think it gets back, in part, to how you know it's giving you good answers. Well, you can argue that a sufficiently advanced AI is maybe more like asking a person than a computer program. I don't know much about AI, but my understanding is that a lot of the time you've trained it to do something, but you don't really know how it does what it does; you just know it's trained and it gives you good answers. Well, look: you've got developers you work with, and they've been trained for years. How do you know they give you good answers? You might at some point be reduced to doing psychology on your AI to make sure it actually knows what it's talking about. That scares me, and there are people at my company who do AI research. I don't know; that's a tough question. That's a good one.

All right, hang on, let's see. "Is introducing bugs to check units the basis of mutation testing?" Yes, mutation testing is kind of that calibration I was talking about; I had the picture of the weight on the scale, and it's that writ large. I know very little about it, but if it's something where you go and make sort of random changes to your code and watch it break, then yes, that's exactly an extremely thorough calibration exercise, to make sure that your tests can catch all the stuff they're supposed to catch. That's exactly what that is. You're exactly right.

Let's see. "How does white box testing work with external dependencies, like underlying private classes or external devices?" Oof. Yes, that kind of testing is hard, and you may have to mock things; it depends. I mean, it's complicated, and there are a lot of other good thoughts about it. Start with the talks from earlier this week, and go to some of the talks I put on my first slide; by the way, I don't have them yet, but when these slides go up, I'll have more extensive links on the slide after the one that's up now. Yes, so there are times when you're
up against the wall, none of the good advice works, and you have to go do the thing you're not supposed to do: white box testing. I've seen math code with a huge chunk of math, where we'd love to break it apart and unit test it, to make sure this coordinate transform works and so on, but we can't, because breaking it apart makes the code worse and slows it down, because you can't hang on to your intermediate results. There, white box testing is your best option. My project right now is an extremely legacy thing, half a million lines; there's 477 in there. You know what? White box testing is great. We have unit tested 0.1 of it, and it has totally saved our butts. They're ugly tests, but it's what you have to do.

"Testing drivers for hardware?" I don't know about that one; that's a good one. Can you mock it? Can you come up with a piece of hardware that isn't the real hardware, but acts enough like it, and gives you known results? That's tough; that's a tough one.

Let's see, black box testing. Yes, black box testing has that problem worse; white box testing still has it, but at least you can go in and rewire something. For black box testing, mocks are your first guess. I don't know how much time we've got left. There are some things you can do in your initial design to make it easier to mock, or easier to test, and some people will argue, "I've got this beautiful design, I love my design, and it's hard to test, but I don't care; I shouldn't have to change my design for testing." I used to be that way. Way back, we had this beautiful software that was very carefully designed; it was beautiful and we liked it, but you couldn't test it. Then, back when I first started doing unit testing, we resisted; we started doing the #define private public trick, because it was the only way we could get in to look at what we wanted to look at. And it took us a while to realize that every time we just
bit the bullet and changed our design to let us test it, the design got better, in ways that totally surprised us. Design for testability; that's a big deal. All the other things you're supposed to design for, do those too, but design for testability isn't an afterthought. If you can design for it up front, all of this gets much easier to deal with. Even then, there are times when you've got to pull out the white box, or do something crazy. It happens when your back's up against the wall; you've got to go do it.

All right, let me see. Okay, that's all the questions I see right now, and I don't know how much time we've got left, so we'll call it good here. I'll meet you wherever we said we're going to meet, in the virtual room, and if you're watching this a month from now on YouTube, leave a comment down below. Tell me I'm wrong, tell me I'm right, tell me whether the halting problem is NP-hard, whatever you want to tell me. Thank you very much.
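That "design for testability" point from the Q&A can be illustrated with a small sketch, using hypothetical names: instead of reaching into private state with `#define private public`, pull the pure logic out into a free function the test can call directly, and have the class delegate to it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical: the pure computation lives in a free function with no
// hidden state, so a unit test can exercise it directly.
double running_mean(const std::vector<double>& samples) {
    if (samples.empty()) return 0.0;
    double sum = 0.0;
    for (double s : samples) sum += s;
    return sum / static_cast<double>(samples.size());
}

// The class keeps its private state private and simply delegates.
class SensorFilter {
public:
    void add_sample(double s) { samples_.push_back(s); }
    double mean() const { return running_mean(samples_); }
private:
    std::vector<double> samples_;
};
```

The refactoring that makes this testable (separating computation from state) is exactly the kind of design improvement the talk says tends to surprise you.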
Info
Channel: CppCon
Views: 19,579
Id: FjwayiHNI1w
Length: 55min 10sec (3310 seconds)
Published: Sun Oct 04 2020