"The Clean Code Talks -- Unit Testing"

Captions

>> HEVERY: So today, we're going to talk about how to write untestable code, because we're all so good at it. How do you write hard-to-test code? >> [INDISTINCT] analytic. >> HEVERY: Some analytic, okay. >> Non-deterministic. >> HEVERY: Non-deterministic, exactly. Okay. That's good.

So, it is something interesting, right? When I ask this question in an interview, most people have a really hard time telling me how exactly they would go about writing hard-to-test code, even though the code they write is really hard to test. So, we are intrinsically good at this, even though we ourselves don't know how exactly we're good at it. It's kind of like a spider weaving a web. It just knows how to weave a web, and you can ask it, "How did you do it?" And the spider says, "I don't know. It just kind of works."

So, this is what people normally say: make things private, use the final keyword, have long methods, as you pointed out, do stuff monolithically, which kind of goes with long methods, et cetera. Non-determinism, that's a good one too that I don't have. But here are the real issues of unit testing. The first is mixing the new operator with your business logic. I'll get to why exactly that's a problem in a second. Looking for things, and we do this in our code all the time. Doing work in a constructor, which makes it really hard to instantiate things inside of your test. Having global state, which is essentially where all of the uncertainty comes from. Singletons, which is just another name for global state. And static methods, which is essentially procedural programming. And one thing that you can think about is, suppose somebody gives you purely procedural code, how would you test it? It turns out that I have no idea how to test purely procedural code. Because in order to test something, I need to isolate something.
And in order to isolate something, I need to have some kind of a seam. And the seam in the object-oriented world is polymorphism coming into play, something that I don't have in procedural code. Yes? >> So, maybe I don't understand. Why exactly are static methods hard to test? I guess, I'm [INDISTINCT] >> HEVERY: Because you don't have a seam. >> [INDISTINCT] two to three years [INDISTINCT]

>> HEVERY: So, here's the problem. I don't want to get into it too much because there are a couple of slides later that cover this. But the basic issue is this: if you have a leaf method such as Math.abs, absolute value, it's a piece of cake to test, right? Because it's a leaf, it doesn't call anybody else. But if you have a method that is way up in a call hierarchy, and you're trying to invoke that method, and you want to prevent that method from calling, I don't know, a database or something like that, there is no way for me to prevent that call from happening, because all the methods are static and there's nothing for me to override. So, yes, in the simplest case, when you have a simple leaf method like an absolute value or some such thing, piece of cake. But when it's a more complicated program, the answer is no. So, the worst thing is trying to test a main method. You're trying to test your application from the main method? Good luck. Chances are, you cannot do it. So, we don't really know how to do it.

The other thing is deep inheritance hierarchies, because it's essentially the same problem. I cannot divorce myself from the inheritance at runtime, right? At runtime, I would like to instantiate a small portion of my application. And if the class I want to test is X and X inherits from A, B, C, D, E, F, G, then whether I like it or not, I'm testing all the other classes as well. And so deep inheritance hierarchies are something that makes it really hard. So, notice this whole list.
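A minimal Java sketch of the point about static methods and seams made above. All names here (Database, StaticGreeter, UserStore, Greeter) are hypothetical and do not come from the talk; the static counter exists only so the sketch can show that the "real" call happened.

```java
// Hypothetical stand-in for an expensive external system.
final class Database {
    static int calls = 0; // demonstration only: counts hits on the "real" database

    static String loadUser(int id) {
        calls++;
        return "user-" + id;
    }
}

// No seam: the call is bound statically, so a test has nothing to override.
final class StaticGreeter {
    static String greet(int id) {
        return "Hello, " + Database.loadUser(id);
    }
}

// The seam: an interface whose implementation is chosen by the caller.
interface UserStore {
    String loadUser(int id);
}

// Same logic, but the collaborator is passed in, so polymorphism can divert the call.
final class Greeter {
    private final UserStore store;

    Greeter(UserStore store) {
        this.store = store;
    }

    String greet(int id) {
        return "Hello, " + store.loadUser(id);
    }
}

class SeamDemo {
    public static void main(String[] args) {
        // The static version always reaches Database.
        System.out.println(StaticGreeter.greet(1));
        // The instance version can be handed a friendly replacement.
        Greeter greeter = new Greeter(id -> "fake");
        System.out.println(greeter.greet(1));
        System.out.println("database calls: " + Database.calls);
    }
}
```

In this sketch a test of Greeter never touches Database, whereas any test of StaticGreeter does, which is the "nothing for me to override" problem in miniature.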
Most people actually cannot answer the question of what makes code hard to test, even though we do this all the time. So, we want to talk about this a little further. Yeah, and the last line, which I forgot, is our good old favorite, too many conditionals, which is the if statement. But anyway, let's get into this further. This is a really long list.

So, here's the thing: what can I tell you about writing tests? It turns out, nothing. I cannot teach you anything. There's no magic to writing tests, absolutely not. There are a couple of tools, like the EasyMock framework and stuff like that. But for the most part, there is no secret knowledge I have about testing, none whatsoever. What I do know a lot about is how to write testable code, and that is the core of the problem. Most people assume, "I'll write code and I'll throw it over the wall, and here comes my test engineer and he'll write some tests," except at that point it's too late, because the code is already written in such a way that the test engineer cannot write a good test. And it's kind of worse because the person who makes the mistake, which is writing the code, and the person who feels the pain of hard-to-test code are not the same person. As a result, it's really hard to effect change in an organization like that. And so, we're really kind of guilty of this.

>> Is it done? >> HEVERY: Yes. >> But for unit tests, it should ideally be the same people, right? So, the people who will do that... >> HEVERY: Yes. Exactly. So, you'll do the test... >> So, are you talking about other tests as well or...? >> HEVERY: No, this is kind of an introduction. So, absolutely, we want to get everybody to the unit testing level. And in unit testing, it has to be the same person. I'm just trying to point out how the other kinds of tests don't really work. So, we're going to talk about unit tests in a second. So, just hold on one second.
So, glad to know that you are ahead of the curve, though. It's good. So, what can I tell you about writing testable code? Well, we can talk about what good OO is and how it helps testing, and we're going to delve into this a little bit, and we can also talk about something we call dependency injection. Sometimes I feel like with dependency injection I'm selling snake oil, because it fixes so many things. But it does actually work. Of course, there's test-driven development, which, as you point out, goes together with unit testing. You want to make sure that the person who writes the test and the person who writes the code are the same person, and you definitely want to go the unit testing route, which we'll get to in a second.

So, here's the thing. There is absolutely no secret to writing tests, none whatsoever. The only secrets there are concern writing testable code, and that's what we want to talk about. And it sounds like you're already ahead of the game and you already know that the answer is unit testing; but most people aren't actually that far along, and most people are still stuck on the premise that there is some secret sauce to testing, which there isn't.

So, here's how I like to think about unit testing. Imagine you want to test a car and somebody says to you, "Please test the car for me and make sure the car works." Everybody who's new to testing will immediately say, "I know. I'm going to build a framework in the context of a car. I'm going to build something that the car can sit on top of, and I'm going to build some machine that will pretend to be a driver and will turn the steering wheel and push the brake and the gas and play with the knobs, and that's how I'm going to test that the car works." This is basically what a scenario test is. And a test like that is actually pretty cool because it does actually prove that the car works.
The problem with it is the execution, and that is, these tests are horribly slow. Let's say you want to prove that the car's all-wheel-drive system works correctly. Well now, you have to get the car to where there is ice. So now, what are you going to do, drive it into a refrigerator? And then, you want to make sure that the car doesn't overheat at really hot temperatures in a desert. Now, what are you going to do, drive it into an oven, right? Even if you can do that, even if you can build all these frameworks, these things are going to be really, really slow and they're going to be flaky. There are so many things that can possibly go wrong. You're testing the whole system end to end. Maybe the oven's broken, and it's not the fault of the car. So, the problem with scenario or large-scale tests is that they're slow and they're flaky. And it's not uncommon to have to take several hours to execute all of your scenario-based tests, which is not very useful.

So then, you say, "Well, maybe we can do something better." As we mentioned, you discover that your tests are slow and you discover that your tests are flaky, right? We kind of covered this. So, this is the first stage of unit testing: the people who have discovered, "Hey, tests, good. We can automate this thing." Here's the good thing. When things work, you have really high confidence that the thing actually worked, that the car actually worked, right? Whereas, if something goes wrong, you're not really sure if the problem is with the car or if it's because the designer has moved a knob an inch to the left, and all of a sudden, the framework can't grab anything. You're just really not sure what exactly went wrong. And a lot of these things are just flaky, and you don't really know why, because there are so many variables coming into play. So, it's really hard to reproduce failures.
So, suppose it's flaky and it fails and you're like, "Okay, let's do it again." And all of a sudden it works this time, and you really have no idea how and why and so on. So, I'm just pointing out how troublesome that is.

So then, you think about it and you say, "Well, maybe instead of testing the whole car, I can break the car down into parts and test them individually. So, maybe instead of pretending to be a driver who turns the radio on and off, I can take the radio out in isolation and hook it up to maybe an oscilloscope for the output of the radio, and remove the knobs. And instead of the knobs, put in some kind of analog-to-digital converter that directly controls the knobs. And now, I can test the radio in isolation, independent of the car, and I can bake it, cook it and do all kinds of things to make sure that the radio works just as we planned." We're going to do the exact same thing for the engine, for the transmission, and for any other large-scale component in the car.

And so, what you discover is things get a lot better. Again, when things are green, you're pretty sure that the thing works. When it's red, you're also pretty sure that something went wrong, because you took so many variables out and you only have these large-scale systems, so you're pretty sure that something's broken. The problem with that is, suppose the radio doesn't come on; good luck figuring out which part of the radio is broken. The engine doesn't start; good luck figuring out what exactly is wrong with the engine. It's much better than having the whole car and pretending to be testing that, but it's not quite what we want, right? So, we call this a medium-level test or functional test, because you take a single functionality and try to test that in isolation.
From a software point of view, this is kind of like taking your app and replacing the outside servers, like the authentication server, with an in-memory fake LDAP or something like that, which auto-authenticates and so on. So, you basically focus down on individual pieces and you test them in isolation.

So then, you say to yourself, "Wait a minute, if going from large-scale testing to medium-scale testing, we got better at this, maybe we can go further and go down to individual components." And in the world of software, that's individual classes. So, instead of testing that the engine works as a whole, maybe I can have individual tests that verify that the piston is [INDISTINCT] of the correct shape, that the oil is present, that the spark plug has the correct clearance and so on and so forth. And I'm just individually testing all these pieces. And I know that if all of those pieces are correct, then I am very, very confident that the engine will actually start. And if I discover a case where the engine doesn't start, I can always go back and figure out what the root cause was, and I can write a test for that root cause.

So, it turns out that these kinds of tests are great because they're super fast, right? This is our unit test. From a software point of view, this is where you're testing individual classes. They're really good because you have really high confidence. The tests are really fast. We went from several hours to run and verify that the product works down to seconds, literally seconds. And now, you can do crazy stuff. You can say to yourself, "Maybe I can hook up my save button. So, every time I save the code, it just runs all the tests, because they take a couple of seconds, what's the problem?" So, imagine writing code where you just code along and say, "Yeah, okay. I think I'm ready to save." Ctrl+S, and you know immediately if you broke something or not. It's a nice world to be in, right?
And then, the other nice thing about it is, if the test fails, it directly points to the cause, right? If the spark plug clearance is not good, you know exactly what needs to be replaced; there's no question about this, right? So, if a function that is supposed to be doing sorting fails and it doesn't sort properly, you know exactly where to go to look for the error. In most cases, you don't even need a debugger to figure this out. So, this is the promised land of unit testing.

So, as I've said, most people, when you first tell them, "Write me an automated framework for testing," will immediately think, "Oh, I've got to pretend to be a user and I've got to write some kind of a framework." And we call that scenario-based testing. And there are so many problems with that that I think your effort is better spent on unit testing. If you have unit testing and nothing else, you are way better off than if you have just scenario testing. Now, of course, you're better off if you have unit testing and functional testing and a little bit of scenario testing. But for the most part, you want to have unit testing.

Now, when they build a car in the factory, here's something fun that happens. They put the car together, they have individual tests that prove that the pieces work, and they have one final test. And the final test is they take the key, they shove it in the ignition, they turn it and they drive the car to the parking lot. If that works, that means a lot of things. That means that the battery got hooked up and it's charged, right? It means that the steering wheel's hooked up and there's gas in the engine and so on and so forth. There's a whole long list of things that it means. Now notice, we didn't prove that all these things work under all conditions. All we're proving to ourselves is that they got hooked up properly together. And that's the purpose of a scenario test.
You just want to make sure that things got hooked up properly together, and you have separate unit tests to prove that all the pieces work. And you have functional tests to prove that individual related pieces work: the radio works in isolation, the engine works in isolation, the transmission works in isolation. So therefore, I just want to make sure it's hooked up together, and I'm pretty confident the whole system is going to work as well.

So, all of these different levels are important because each has a different probability of finding a bug, and they find different kinds of bugs. As I said, unit testing is all about "it doesn't do the right thing," whereas the other extreme is "it isn't hooked up properly." And the medium tests in the middle test a little bit of both, but again, we want to have these extremes. We don't want to test everything at once, because it becomes hard to test. It turns out there's a way to code so that you separate the hooking-up problem from the functional problem. And that way of coding is called dependency injection. We'll look in a second at why that is important.

We already touched on this, but I'm going to cover it again. And that is, you really want to have a large number of unit tests. Typically, the number of lines of unit test code is going to be approximately equal to the number of lines of production code. Give or take, they're in the same ballpark, which also translates to roughly the same number of test cases as methods you have. But that does not imply that you actually want to have one test case per function. You just have approximately the same number of test cases.
You want to have a much smaller set of functional tests that test that the individual subcomponents work together properly. There, you're starting to get more into the business of, "Is it hooked up properly? When I pass this object to this object, does the other object expect it in the correct state?" That's what you're testing over there. And then the scenario test is purely a test in the form of, "Do the pieces talk to each other how we expect? Can a server come up in isolation?" kind of a thing. We really don't go into the details of replicating things. I'm going to skip ahead and come down to here.

So, unit testing. We decided here that unit testing is a good idea. So, you have a test driver, JUnit, and you have the class under test. You apply some stimulus to the class under test, you call some methods on it, and then you assert that something expected happened. Piece of cake? Easy? So, why are we having this discussion? Why is this so hard? What's the problem with this model right here? Yes? >> [INDISTINCT] has dependencies? >> HEVERY: Things often have dependencies, exactly. So, the reality of it is that the class under test usually has other classes that it depends on. And guess what, those things depend on other classes. So, I do something benign. I say, "new class X." And then its constructor goes off and starts constructing other classes. And those classes, in their constructors, go off and construct other classes, and so on and on and on. So, we have the same problem as we had with procedural programming, as you pointed out at the beginning. If it's a leaf class, yes, it's a piece of cake to test. Nobody really has to explain to me how exactly I test array.sort. Piece of cake. It's a leaf. But how do I test, I don't know, the login page? Totally different end of things.
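A small Java sketch of the "benign new" problem described above: one constructor call that quietly builds a chain of other objects. The class names (LoginPage, AuthClient, NetworkConnection, ConstructionLog) are hypothetical, and the static log is there only to make the hidden construction visible; it is not a recommended design.

```java
import java.util.ArrayList;
import java.util.List;

// Demonstration-only record of which objects got constructed, in order.
final class ConstructionLog {
    static final List<String> created = new ArrayList<>();
}

// Hypothetical lowest-level dependency (imagine a real socket here).
final class NetworkConnection {
    NetworkConnection() {
        ConstructionLog.created.add("NetworkConnection");
    }
}

// Builds its own connection, so it cannot exist without one.
final class AuthClient {
    private final NetworkConnection connection = new NetworkConnection();

    AuthClient() {
        ConstructionLog.created.add("AuthClient");
    }

    boolean check(String user, String password) {
        return false; // a real client would ask a remote server
    }
}

// The class we actually want to test; it builds its own AuthClient.
final class LoginPage {
    private final AuthClient auth = new AuthClient();

    LoginPage() {
        ConstructionLog.created.add("LoginPage");
    }

    boolean login(String user, String password) {
        return auth.check(user, password);
    }
}

class HiddenGraphDemo {
    public static void main(String[] args) {
        new LoginPage(); // looks like one object, builds three
        System.out.println(ConstructionLog.created);
    }
}
```

Because every `new` sits inside the class that uses the result, a test that only wants LoginPage still gets the whole chain, with no place to cut it.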
So, in order to test this thing, we really need something we call a seam. We basically need to be able to take a knife and cut all the dependencies. And the seam is important because it allows us to divert the execution of the code. This is why procedural programming is problematic, or rather why static methods are problematic, right? Because if you call another static method, there's nothing I can do in a test to prevent that call from happening. Now, I'm sure you can come up with a simple case, like, "But I'm just calling Math.abs, therefore it's okay." But usually, when you have some static [INDISTINCT] method, people keep adding stuff to it. And so, what started off as a benign method call, which was not intensive and non-interceptable, ends up being this complicated beast that is still non-interceptable, and all of a sudden that's not so good.

So, I take the extreme point of view and I simply say, in my code, I don't want to have any kind of static whatsoever. It turns out that in most cases, when I see static calls, I look at them and I say, "Yes, this actually belongs on this class over here, and there's something wrong with the whole decomposition of the project." For example, let's take the extreme example of Math.abs. I firmly believe that you should be able to say five dot absolute value. Why do I have to say absolute value and pass in the five? It should be just compiler sugar that does all the magic underneath. And I believe in languages like Ruby, you can do that. That doesn't imply that the five has to be an object. It just means that the compiler knows how to convert all these things.

So, we need to have a seam. So, great, we have a seam, and what the seam allows us to do is replace these dependencies with friendlies. Now, when I say a friendly, I don't necessarily mean a mock.
It could be the real class that I've already tested somewhere else, and I already know that it's going to do the right thing, therefore I'm perfectly happy to instantiate the real thing. But I trust it. I know that when the test fails, the problem isn't over there, because I have other tests to prove that that stuff works. So it could be the real thing. It could be a stub, that is, something that does nothing. For example, with a logging framework, I'll just throw in a stub so that it doesn't bother logging anything, because it's not relevant for the purpose of the test. It could be a mock that stands in for some collaboration, or it could be a simulator kind of a thing that simulates the real thing, which is kind of like a smarter mock. The point here is not what you put over there. The point is that from a testing point of view, I have a choice to place anything I want over there, and that requires a special way of writing code.

If your code simply calls the new operator on a class, well, there's nothing I can do, right? Even if you create an interface for these things, if you instantiate the implementation of the interface yourself, there is nothing I can do from a testing point of view. So, it is really, really important to have these seams. And how exactly we place seams inside of our code is something that most of us are not experts at. It's not something we learned in school. It's not something that we learned through hacking. It's not something you even needed because, unless you were writing tests already, why would you be placing seams everywhere? So, how exactly do we create the seams?

So, let's back up, keep the seam in the back of our minds, and talk about something else. In most of the classes that we have, we have Object Graph Construction and Lookup mixed with Business Logic.
Business Logic is the if statements and the loops and the stuff that actually does work, and Object Graph Construction and Lookup is basically your new operators, where you're constructing the object graph, and it's also your let-me-go-and-find-what-I-need code. Usually, that takes the form of, "Let me go talk to the context object so that I can find my property, so that I can use the property to open a file and read the parameters that I need," which makes it impossible for me to ever give you a fake parameter in a test, or at least makes it really, really miserably hard. So, that's what I mean by object construction and lookup, and then the really good stuff happens where most of our bugs are, which is in the if statements and the loops, et cetera.

In most code I have seen, those two pieces are together, and it's probably so in the code that you write as well. But it turns out that those two responsibilities need to be separated. You are either in the business of constructing things, building object graphs and constructing the application with all the instances of the classes, or you are in the business of being those things that got constructed and doing the actual work. If you separate these two things out, it turns out testing is trivial. Not trivial, but really, really easy.

So, let's look at how that works out. The little bubble on the arrow represents where the new operator is located. If that little bubble started off inside of the blue class under test, I could never, ever have controlled the construction of that class. But because I have migrated the new operator into my test, the test now has the responsibility of constructing the object graph, and then I take those objects and I pass them through the constructor of the class under test, and the class under test then collaborates with the things that I've passed in. Now, this gives me a choice in the testing world; because now, I am free to instantiate a subset of the application that I want to test.
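A hedged Java sketch of the separation just described: the class under test contains only business logic and asks for its collaborators in the constructor, while the test (or a factory) owns the `new` operators. The names LoginService, Authenticator and AuditLog are assumptions made for illustration, as is the three-failures lockout rule.

```java
// Collaborator interfaces: these are the seams.
interface Authenticator {
    boolean isValid(String user, String password);
}

interface AuditLog {
    void record(String message);
}

// Business logic only: if statements and state, no object construction or lookup.
final class LoginService {
    private final Authenticator authenticator;
    private final AuditLog log;
    private int failures = 0;

    // Everything the class needs is asked for here, not looked up or created.
    LoginService(Authenticator authenticator, AuditLog log) {
        this.authenticator = authenticator;
        this.log = log;
    }

    boolean login(String user, String password) {
        if (authenticator.isValid(user, password)) {
            failures = 0;
            log.record("login ok: " + user);
            return true;
        }
        failures++;
        log.record("login failed: " + user);
        return false;
    }

    // Hypothetical rule for the sketch: lock out after three consecutive failures.
    boolean isLockedOut() {
        return failures >= 3;
    }
}

class WiringDemo {
    public static void main(String[] args) {
        // The "test" builds the small object graph it cares about.
        Authenticator inMemory = (user, password) -> password.equals("secret");
        AuditLog silent = message -> { }; // stub: logging is irrelevant here
        LoginService service = new LoginService(inMemory, silent);

        System.out.println(service.login("ann", "wrong"));
        System.out.println(service.login("ann", "secret"));
    }
}
```

In production, a factory would pass in a real authenticator and a real log through the same constructor; LoginService itself would not change.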
I don't really have to instantiate the whole thing; I simply instantiate the stuff that I really, really care about. And I have a choice in terms of how I set things up. If I choose to instantiate the real thing, then maybe I can configure it in the right way. So, if I'm testing a cache, I'm going to instantiate a cache that has a cache size of one so that I'll get misses all the time, right? Whereas in production, I know I'm going to have a cache size of 10,000, but I don't want to do that for testing purposes because, gosh, it's going to take forever to cause misses to happen, right?

So, the point is, you want to make sure that your code is devoid of the new operator. Because the new operator is static and it causes direct binding. And you want to basically say, "I need these objects to collaborate." In your constructor, you say, "Hey, I need the file cache. Please provide it to me." It is not my responsibility to go and read some property file on the hard disk in order to figure out how to instantiate a file cache and then configure it in some specific way, nor is it my responsibility to do cache.getInstance and look into the global state variable, which is the instance variable of the cache, and get hold of the cache that way either. I simply say, "I need a cache in my constructor." And one will be provided for me. In the testing world, the test will provide you with some kind of small cache, which we can go and test. And in production, you'll be provided with the real thing.

So, the separation of the new operator is important because that allows us to do subclassing. When we can do subclassing, we can take advantage of polymorphism. And polymorphism is what the seam is. Does that make sense? Now, show of hands. How many of you guys actually do this in your code? You do this? Excellent. So, you know about dependency injection. So, let me ask the question again. How do you write hard-to-test code?
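The cache example above can be sketched in Java as follows. This is an illustration under assumptions, not code from the talk: LruCache and CachingLoader are hypothetical names, and the least-recently-used eviction policy is simply one convenient choice, built on the standard LinkedHashMap.removeEldestEntry hook.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// A small real cache whose size is chosen by whoever constructs it.
final class LruCache<K, V> {
    private final Map<K, V> entries;

    LruCache(final int capacity) {
        // Access-ordered LinkedHashMap that evicts the eldest entry when over capacity.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    V get(K key) {
        return entries.get(key);
    }

    void put(K key, V value) {
        entries.put(key, value);
    }
}

// Class under test: it asks for a cache instead of building or looking one up.
final class CachingLoader {
    private final LruCache<String, String> cache;
    private final Function<String, String> backend;
    int backendLoads = 0; // exposed so the sketch can observe cache misses

    CachingLoader(LruCache<String, String> cache, Function<String, String> backend) {
        this.cache = cache;
        this.backend = backend;
    }

    String load(String key) {
        String value = cache.get(key);
        if (value == null) { // miss: go to the slow backend and remember the result
            backendLoads++;
            value = backend.apply(key);
            cache.put(key, value);
        }
        return value;
    }
}

class CacheDemo {
    public static void main(String[] args) {
        // Test-style wiring: a cache of size one makes misses easy to provoke.
        CachingLoader loader =
            new CachingLoader(new LruCache<String, String>(1), key -> key.toUpperCase());
        loader.load("a");
        loader.load("b");
        loader.load("a");
        System.out.println("backend loads: " + loader.backendLoads);
    }
}
```

The same CachingLoader could be wired in production with a cache of 10,000 entries; only the object handed to the constructor differs.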
The real crux of the problem is, if you want to make code hard to test, you're going to mix your object creation code with business logic. The moment you mix those two pieces together, you cannot instantiate anything in isolation. And when you cannot instantiate anything in isolation, the only thing you can possibly instantiate is these humongous chunks of application, which pretty much cause you to instantiate the whole thing. And now, we're back to square one, which is scenario-based testing. We kind of already decided that it's not a good idea.

Wow, I went through this really fast today. So, the takeaway is this: we really, really want to be able to test in the form of unit tests, not scenario-based tests. And in order for us to be able to write unit tests, we need to separate the object instantiation responsibility from the actual responsibility of doing work. You are either a factory, which is responsible for creating some object graph, or you are one of the objects that are doing some work. You don't want to mix the two. All right. I kind of rushed these slides. I'm not sure why. So, we have covered them all in an amazing half an hour. So, I'm going to turn to you guys for questions. And if you'd be so kind as to grab the microphone, that would be awesome. And maybe even come closer so you're not so far away.

>> I was thinking I should mention something, which is that whether dependency injection is really needed depends on what language you program in. >> HEVERY: Okay. >> So, if you program in Perl or Ruby, you can often find a seam through the language being dynamic? >> HEVERY: Yes. You are probably referring to monkey patching, right? >> Yes. >> HEVERY: Okay. We haven't talked about the idea of global variables.
But basically, global variables make your code really, really, really hard to test because things are unpredictable, the order of the tests matters and so on and so forth. And it turns out that with monkey patching, the code is in global space, and monkey patching the code oftentimes results in you changing global state, which means if you go and monkey patch a class and then another test runs and you didn't properly clean up after yourself, then the new test will fail. So, even though many languages like Ruby, JavaScript, et cetera allow you to do monkey patching, if you do monkey patching on a class instance, you can probably get away with it. If you do it on a class level, it usually is a bad idea and it's no different than having global state. And there's a separate talk we do just on all the evils of global state. So, even if you have that, I think dependency injection is still a good idea. And I think--Paul--Dave, are you going to say something?

>> Yeah, I disagree. >> HEVERY: You disagree? You think monkey patching is a great idea? >> Yes. >> HEVERY: Okay. Do you want to give your point of view? >> Yes. What you said is exactly right. The problem is, if you don't clean up properly after yourself and the next test runs and then [INDISTINCT] >> HEVERY: Uh-hmm. >> So, if you have a good framework that will ensure that things are cleaned up, then you don't have that problem. >> HEVERY: Okay. >> And generally, it's a lot less [INDISTINCT] framework than a DI framework. >> HEVERY: Different opinions. Of course, Dave is the Ruby guy who's the highly dynamic [INDISTINCT] I'm the static type of guy. >> Yes. So, the RSpec mock framework does exactly that. It cleans up after itself if you stub out methods to return instances and so forth. >> HEVERY: How does it know which things you modified? >> Because you modify through the framework. >> HEVERY: Okay. >> And it keeps track of everything.
>> HEVERY: So, it keeps track of whatever comes [INDISTINCT] >> And restores everything when it's done. >> HEVERY: Other questions?

So, in Java, you have dependency injection and you have nice fancy tools like Guice, which are what we call automatic dependency injection frameworks. People are oftentimes confused and say, "I'm in C++. C++ doesn't have Guice, therefore dependency injection is not for me." It turns out that dependency injection is the practice of asking for things in the constructor. Dependency injection and the automatic frameworks like Guice are two independent things. And so, you can have one without the other. And you can perfectly well use dependency injection in C++ manually, where you have to write the factories yourself. Great. Well, thank you guys for coming. See you next week.
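As an illustration of the closing point that dependency injection does not require a framework, here is a minimal hand-written factory. It is sketched in Java to match the Java tooling mentioned in the talk, though the same shape applies in C++; Engine, GasEngine, Car and CarFactory are hypothetical names chosen to echo the car analogy.

```java
// The collaborator is an interface, so a test can substitute its own.
interface Engine {
    String start();
}

// Hypothetical production implementation.
final class GasEngine implements Engine {
    @Override
    public String start() {
        return "vroom";
    }
}

// Does the work; asks for what it needs in the constructor and never calls new.
final class Car {
    private final Engine engine;

    Car(Engine engine) {
        this.engine = engine;
    }

    String drive() {
        return "car: " + engine.start();
    }
}

// Hand-written factory: the one place where the production object graph is built.
final class CarFactory {
    Car createCar() {
        return new Car(new GasEngine());
    }
}

class ManualDiDemo {
    public static void main(String[] args) {
        // Production wiring goes through the factory.
        System.out.println(new CarFactory().createCar().drive());
        // A test skips the factory and passes in a friendly engine.
        System.out.println(new Car(() -> "click").drive());
    }
}
```

An automatic framework would replace CarFactory, not Car; the constructor that asks for an Engine stays the same either way.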

Info

Channel: Google TechTalks

Views: 270,073

Rating: 4.8827996 out of 5

Keywords: google, techtalks, techtalk, engedu, talk, talks, googletechtalks, education

Id: wEhu57pih5w

Length: 32min 7sec (1927 seconds)

Published: Sat Nov 01 2008