Testing Legacy Code by Elliotte Rusty Harold

Welcome. I'm Elliotte Harold, and I'm going to talk to you today about testing legacy code. If you have one of these things (who doesn't?), please turn it off.

So what do I mean when I talk about legacy code? I think probably all of us have experienced this at some point. Occasionally we get a greenfield application to work on, but most of the time the code that we're brought in on, that we're assigned to work with, already exists, and it's in better or worse shape depending on who and how many different people have worked on it over how many years. I'd say any code that predates the current team qualifies as legacy code. So does code that hasn't been touched in living memory. How many of you have had to deal with that sort of code this year? This week? I actually had something blow up last week when I was trying to get my notes ready: some system nobody understands, that hasn't been touched in literally ten years, suddenly needs to be fixed. This happens. If it's code you're afraid to touch because it's too ugly, whether you wrote it or somebody else wrote it, because you're afraid the code's fragile, you're going to break something, you don't know quite how it works, that's legacy code. And any code that does not have an automated test suite, I think, qualifies as legacy code.

Now, for any sort of development, not just development with legacy code, I hope I don't have to emphasize too much the benefits of test-driven development. I'm a devoted adherent; at Google we use it all the time. If you write tests, you develop faster. You get to market quicker with a product that actually works and does more of what it's supposed to do, with fewer bugs and fewer errors. It helps you avoid writing code you don't need. Often it's when you're looking at the tests that you realize that this provider of an interface provider, which provides a method, which is instantiated in an anonymous inner class and injected into the special visible-for-testing constructor: actually, you don't
need any of that. That's a real example from code I was working on this week; I found it in a code review. When you write the test, you can see: yeah, all I really need is a string here. And life gets simpler.

If you write tests, it is easier to find and fix bugs. Certainly when a bug gets reported in an existing product or your current product, the first thing to do is write a test for it. That makes sure that you actually understand what the bug is. Surprisingly often, I find that when I have a bug report against me and I write the test, the test actually passes, which tells me I didn't understand what the bug was in the first place. I missed something; I need to go back and look again.

If you need to refactor code, add things to code, optimize code or improve its performance, any time you need to change the code, tests that are already in place help you make sure that you're not going to break things. That's really important. It's a safety net; it allows us to develop with much higher confidence. If you've got really high test coverage, although you're probably not going to have this on legacy code, sometimes you can get away with saying, "I don't know how it works, but it works, and the tests prove it." I wouldn't try that with uncovered legacy code, though.

Now, how many of you have ever seen a comment like this one before: "I have no idea how this works"? I see a few hands going up. I should probably rephrase: how many of you have not seen a comment like this before? No hands are going up. Oh, one hand is going up. Yeah, that's a speaker trick: whether you phrase the question in the positive or the negative, most people aren't going to put their hands up no matter what, so you can get the answer you want out of the audience.

I have seen entire systems ruled off-limits because no one understands them, with a manager saying, "Please don't touch that code. Please don't refactor it. Please don't fix it. Please don't add a feature. We can't afford for it to break." Which is all well
and good until it breaks anyway, because the world changes out from under you. This is what happened to me last week, in a ten-year-old codebase that I'm not even supposed to be touching, but I'm the last remaining owner of that code.

So if there's old code, you're going to have to maintain it. Things are going to change. You're going to have to port to new hardware and operating systems: Windows 10 comes out and we need to update. On my current main day-job project, we have to update code for Eclipse Mars, for Eclipse Neon, for Eclipse... what's the next one? Oxygen, I think, coming up. So we have to change the old code that nobody wants to touch, code that, when I inherited it, didn't even build. That was the Google Plugin for Eclipse. It did have tests, though, so that was something.

The external environment can change even if your technical environment doesn't. Most of you here look old enough to remember Y2K and some of the old systems we had to fix back then, 16 years ago. Brexit is probably going to cause some problems for some people in this audience, I'm willing to suspect. If you're in the U.S.,
you may remember Sarbanes-Oxley or HIPAA, depending on whether you were working in healthcare, finance, or wherever. Those caused a lot of legacy code to get looked at that hadn't been touched in a very long time. Tomorrow I'll be talking about APIs. If a third-party API or vendor system that you depend on, that you're communicating with, disappears out from under you, or stops responding, or changes its formats, then you're going to have to change your code to match. So there are all sorts of reasons that code can change out from under us, and we're going to need to deal with it.

Now, can we use test-first techniques if we need to work with legacy code, even code that doesn't already have any tests or has very small test coverage? Well, the short answer (and I wouldn't be giving this talk if it weren't) is yes, we can. But if you do this, you're going to need to give up some of the practices you may be used to when doing test-driven development on greenfield code. We're not going to get everywhere we'd get with a new, clean application.

The first thing you need to say is: maybe I'm not going to have 100% test coverage. That's just not going to happen. But just because we can't have everything tested doesn't mean we can't have anything tested. Some tests are better than none. More tests are usually better than fewer, especially if they're testing different things and exercising different code paths. You don't want redundant tests, but honestly, in legacy code that's never really a problem. The real problem is the untested code, not the code that's tested too much. So get some tests into your system, no matter what.

Number two: unit tests. Usually the application already works. That's not always true; I have seen applications that didn't even build, or that suddenly needed to be pulled out of cold storage and rebuilt to support some customer. But mostly the application is running, it's in production, it works. So you're not trying to squeeze out every last bug in it. You assume those bugs were already
found and fixed a long time ago. Your main concern is what you're going to break by accident while you're fixing something else or adding a new feature. For that, a broad test, an integration test perhaps, that covers a large part of the application can be a lot more useful. Now, if it does break, it can be painful to debug compared to a unit test; that's certainly true. But you will save a lot of time, and you don't have the time to write tests for each and every if statement, each possible code branch, each method. So try to get some big tests in first. Also, aim at whatever function you're changing, whatever package you're working in if it's in Java. And if you're adding anything new, whatever it is, certainly write that test first, with really good test coverage, just like you would for any other new code: write your unit tests, make sure they fail, write just enough code to make them pass, and repeat until the feature is done.

Number three: frequent context switches. When I'm doing new development, I write a little bit of test, just enough to make it fail; write the code to make it pass; then a little bit more test, a little bit more code; and I iterate like that. With legacy code, since it mostly already works, it's not uncommon to write a lot of tests before switching back to model code, particularly if you're trying to cover the existing application. Then you're going to write tests, and hopefully they're going to pass. Or if they fail, it's because it's a really complicated integration test and you didn't write the test quite right, so you've got to make the test pass without actually changing the model code. Of course, again, for new features, traditional test-driven development is just fine.

That's what we give up. What do we keep when we're writing tests for legacy code? These are some really important things for any testing.

Number one: the tests ought to be completely automated. You should have one-button tests: you can just press a button in Eclipse, or select a menu item, or
maybe type mvn test, and your tests all run and they all pass. There's no reason not to have that with legacy tests, just like any other test code. If it takes any effort to run your tests, if they're sort of a manual test suite where you or the testers are running through a spreadsheet and checking a bunch of things, try to automate those, unless you have testers popping out of every doorway. I've never seen that; nobody ever has enough actual manual testers to do that sort of thing.

Number two: any test failure should be blindingly obvious. I love it in Eclipse or the very old JUnit 3 test runner: tests pass, green bar; tests fail, red bar. You know immediately. You don't have to think about it; you don't have to look at anything. The one that I think does this wrong is Maven, and sometimes similar tools like Ant or Gradle can do this as well. You run tests and they'll scroll screen after screen after screen of output, even if everything's OK. Generally speaking, if everything's OK, there should be about one line of output, "tests passed", just so you know something ran. You don't want to see all that output. You're never going to read all of it anyway. You only need it if the tests fail, and then you only need it for the failing tests. One thing I really hate about Maven is having to scroll through everything to find the one test that failed out of several thousand. I've learned that "<<< FAILURE" is a good thing to search for to find what actually failed.

Third thing: test suites should be reasonably quick. Certainly you don't want anything that's going to take over an hour to run, or a day, or anything like that. That's not good enough for development work. It might be enough for acceptance testing, for system testing before release, but it's not going to allow you to do test-driven development. You want your tests to run in a few minutes at most, so that it's not onerous to run all the tests before you do a
check-in. Because you're not just testing the new feature you added; you're testing to see what else you broke in some other part of the application, where you didn't even realize what you were writing had side effects. So try to make your tests quick.

There's one other thing I should add here: don't make them flaky. This happens a lot if you're writing GUI tests that exercise a UI, which can be true for legacy tests, which can be very broad. The problem is that those sometimes fail just because of subtle timing issues. You have to rerun them a few times; sometimes they pass, sometimes they don't. If you encounter flakiness in your tests, try to eliminate it ruthlessly. That's one reason we normally prefer unit tests: unit tests are a lot less likely to be flaky than big integration tests or functional tests. But with legacy code we don't always have that choice.

Technology: no changes. Anything you're using for new code you can use for legacy code: JUnit, NUnit, PyUnit, Maven, Gradle, Eclipse, IntelliJ. That's all good. I don't think there are any silver bullets here for working with legacy code. Maybe some of the bigger frameworks might help a little bit more. Oops, sorry, let me back up.

Now, where should you get started? Let's assume you've got a new legacy project that's been brought to you. If it has tests, great: make them pass. That would be step one. Multiple times I've been given an old project that no one has touched for two years, and guess what: the tests stopped working, for whatever reason. Sometimes it's not even compiling. So if it's not even building, number one, get the build working; number two, get the tests working. Just work through them. Don't eliminate the failing tests until you understand why they're failing, because there's probably a reason why they're failing, something you want to know. Sometimes it means you don't care about that anymore, but you can't just assume that. Very often it indicates a real and
active problem you need to deal with, because something's changed in the environment that you live in.

If you are given legacy code that has zero tests, not even one line of code covered, and I've seen this too, step one is: write one test. I don't care what it covers. Is it a clone method, an equals method, a toString method? All you're doing here is getting something in to make sure the code even builds, plus getting your build system set up to do that one-button build, so that you can type mvn test and have everything run through. So for your first test, I don't care how trivial it is or how broad it is. It doesn't matter. Just automate it so you've got something, so you can integrate it into your continuous build. (If you don't have a continuous build, that's another talk, but also useful.)

Then think about what to test next. One trick I sometimes use, if I can get away with it, in Java code at least: I write a test for the main method. You can't always do this. Sometimes it launches too many threads, it brings up GUIs, Lord knows what else it's doing; it's starting servers; maybe it never exits. But when you can, that's a good place to start, because every application is going to call it, and that gives you one really good, broad smoke test to start with. So that would be where I would look first. If not there, well, wherever is convenient. Pick somewhere. Again, don't agonize too much over what you must test. Test something. Get some tests into place.

Now, after your first test is written, whatever it is, whatever part of the project it's testing, take any initialization and cleanup you did and pull it out into your @Before and @After methods if you're using JUnit 4, or @BeforeClass and @AfterClass, or whatever the equivalent is in your test framework of choice in your language of choice. Then, once you've got the setup created, once you've got the fixtures, the fields and the objects created in the class that you're testing, how many other tests can you write really quickly in the same
test class, that just bang a bunch of things out? What else can you test with that setup? Often the setup is the hardest part of writing the test. It depends on the nature of the object, but if there are a lot of other things you can test quickly, if in ten minutes you can write ten tests, do it. Don't worry about testing the hard methods, the methods that need a lot of weird state setup, or that access the clock or the file system or the database without using dependency injection or anything like that. Test the easy stuff first, because again, we're trying to get as much general coverage as we can, as quickly and cheaply as we can. This is one way to do it.

Now, if you have any existing tests, maybe not unit tests but functional tests, acceptance tests, system integration tests, whatever they may be, a conformance test suite, even in some cases manual test scripts that the testers follow or run using some special tools, maybe you can set those up and get them more automated. That may also be a very cheap and effective way to get some decent test coverage fairly quickly.

One thing I've also seen on a lot of projects: we'll bring in some official testers on the project, and they'll write big spreadsheets full of steps to follow and things to check in each release. Then the testers get transferred to another project, and suddenly the engineers have to go through the spreadsheet. Well, engineers don't like doing that, so they sort of stop, and the thing doesn't get tested. The trick there is to take the spreadsheets and turn them into actual automated tests that run with a one-button click, using whatever framework you need to do that. Right now I'm using SWTBot because I'm writing Eclipse plugins, but if you're using Swing or web tests, there's Selenium for web tests, WebDriver, etc. Whatever it may be, try to automate your tests, any tests you already have.

Now, when you're converting existing tests,
you're probably going to have to mock out some parts. You could use Mockito, EasyMock, what have you, just like you would for any other tests. You probably won't get it all done. That's OK. Again, I sound like a broken record: something's better than nothing.

In this scenario you may sometimes be able to write one JUnit test that runs an entire existing suite. I've actually done that sometimes for conformance tests. I'll have a directory full of industry-standard conformance tests, for XSLT processing for example, and I'll have one test method that is nothing but a driver for all the tests. If it fails, well, then hopefully I at least print out which of the tests failed. It's not ideal; ideally I'd want one method for each of those tests. But it's still a lot quicker, and it gets me very good coverage over a large area fairly fast. I have occasionally found that doing things that way found bugs in processors that running the tests individually did not: bugs where the processor did not completely clean up after itself after each run. So there's sometimes an advantage to these really big, broad tests that cover a lot of different things. Later, as you have time, try breaking it into smaller units if you can, but that's not usually a high-priority item.

Now, one technique. This name's been around for a while; I may have gotten it from Michael Feathers, I'm not sure: characterization testing. One problem when we encounter legacy code is that we often aren't sure what the code is supposed to do. There's some weird math method that says something like "calculate the Bernoulli function". I don't remember what a Bernoulli function is; I sort of remember the name vaguely from grad school. How do I know what the right answer is? How do I know if the code's giving me the right answer? Well, at this point we should assume that the code is giving the right answer. I mean, it's been in production. So unless we have a known
bug that somebody's reported on this function, figure it's giving the right answer. Write a test and expect some random value; 27 might as well do. I don't even know if Bernoulli functions give integers back; they're probably real numbers or complex numbers or something. But assert that the answer to your function is 27, and then see what value you actually get back in the test failure message. Change your test so that the expected value is the value that was actually returned, and then proceed to do that for your other functions. It's a fairly simple technique, but it's reasonably powerful.

OK, functional division. You've got your test suite set up, you've got your test framework, you've got a one-button build, even if it's only running a couple of tests. Now what do you do? It depends. Are you adding new features? Are you fixing a bug? Are you just trying to make the code more robust so that you can actually work with it, doing the sort of axe sharpening you do before you get down to the real work?

Assume you're axe sharpening. You're trying to improve your product; you're saying, "I don't trust myself to work with this product yet." The fastest gains in code coverage are going to come if you start testing at the highest level you possibly can. Not looking down and saying, "this class, this method, this branch of this loop". There's no way you're going to get a lot of coverage by being that specific on an old legacy application. Instead, look at what the application is doing and try to write an integration test, a GUI test, some sort of big, broad test for a task. For instance, say it's a human resources application. What does a human resources application do? Can you generate a paycheck? Can you terminate an employee? Can you schedule a vacation? Can you cancel a vacation? Can you raise a salary, change an address, etc.? These are, what's the word I'm looking for, essentially functional tests. They're not looking at how the code is structured. They don't
care what the implementation is. They care what the application is designed to do. Make sure it can do those things, sometimes by driving the UI, if there is a UI. That can be a web UI, using WebDriver or equivalent. It can be a GUI, using SWTBot or equivalent. If you're really lucky, it can be some sort of API, whether a local client library API or some sort of web API that you communicate with over HTTP or SOAP or something else like that, because that's a lot easier to test than if you have to go through the front end. When you have to drive a test through the front end, you're in trouble. You can do it, but it's difficult. So if you don't have to go through the front end, just pat yourself on the back and say, "Yeah, good." But whichever way you do it, and it depends on the application you're working with, try to get the function tested.

Now you've got a function and you want to test it. Number one: focus on the main path through the application, not the edge conditions.

I'm going to tell you a secret now. I hope nobody here is going to come interview with me, because I'm about to give away one of my interview questions. This is being recorded; this could be a problem. I won't give too many details, but with almost any interview question I ask, often of new grads, I'll say, "OK, you've written this method that's going to calculate Bernoulli functions, whatever the hell a Bernoulli function is. Now, how do you test it?" And invariably, especially with new grads, what they say is, "Well, let's look at the edges. Let's consider zero. Let's consider Integer.MAX_VALUE," and all these weird numbers. And I'm thinking to myself: why aren't you testing an obvious number like eight, if that's obvious for a Bernoulli function? They come with this really optimistic idea that their bugs are all going to be out on the edges, that they don't have to worry about straight down the middle of the road. I'm not a new grad anymore, not for a
couple of decades at least, and I've long since learned that, yes, I'll have bugs on the edges, and I'm also going to have bugs right down the middle of the road, in the most obvious code path possible. Most of my interview candidates do as well. So start with the obvious, simple answer. Start with things where you know what the correct answer is before you start worrying about weird things like "what if we give somebody a negative raise, or a raise of $0?" Just try to give them a raise of one percent, or five percent, or five thousand euros, whatever it may be. Make sure you can raise 50,000 euros to 55,000 euros before you try to figure out what happens when you add five thousand dollars to a salary of fifty thousand euros and see if things blow up.

As the test suite expands, as you have things well covered in the middle, then maybe start looking at the edge cases, if the edge cases come up in practice. Certainly do so if you have a bug report that came out of the edge cases; that can happen sometimes. But otherwise I wouldn't worry about it too much. That said, even if you have a very specific bug, and suddenly it's "Oh my God, we have to get this application fixed and pushed again by Tuesday, this is a P0 emergency," I would still make sure you've got some coverage on the main part of the application, because all too often fixing one thing breaks other things in unexpected, unpredicted ways. That happens a lot. If you go back and look at my GitHub repo for the last week, you'll see a few examples of that, but I'm not going to say anything more about that at the moment.

Now then, remember: one of the main goals of test-driven development, and of just testing legacy code, is simply to avoid regressions, to avoid breaking things by accident and not noticing that you've broken them, because that's even worse. I mean, it's one thing to say: OK, we worked all weekend, we've got this fix deployed,
we've run it through the various tests, we're going to push it on Tuesday. And then you push on Tuesday and, oh yes, salary is working, but suddenly vacations are broken, and the CEO is planning on taking a vacation next week. So it's good if you can cover more.

Now, that's functional division. You can also look at your code structure. This is not as important as testing by function, in my opinion, but it is a different way of organizing your thinking about the application. Instead of asking, "Do I have tests for each function the application performs?", ask, "Do I have tests for each package, each module, each class?"

This citation I give here is a really interesting case, and we do something similar in code. This was in anatomy, though. Are there any MDs in the room? I don't see any hands. OK. If you go to medical school (I am told; I have not been in an anatomy class), you are taught how to cut up a body. It is something doctors learn, and there are certain very precise patterns of cuts that are used, and you always do it the same way every time. What these particular doctors discovered, when they tried (I don't know why) a completely different pattern of cuts for dissecting a particular part of the body, is that there was a new muscle there that nobody had ever seen before in hundreds of years of cutting up corpses, because they always cut right through it when doing the standard pattern of cuts. Similarly, if you cut your tests differently, if you organize so that you look at things to test in different ways, you will find different bugs.

So, top down: start with the package or module. org.jaxen.dom.html, that's a package. It doesn't exist anymore. This happened about ten years ago. Some of the history of that, I think, is probably dead now that Codehaus is defunct, because some of the details happened there. All the code still exists on GitHub, but that package doesn't, and the reason it doesn't is that I was looking at Jaxen and
saying, "This doesn't have very good test coverage." Then I would say, "OK, this package here doesn't have any tests; I'll write a test for this package, just one. And this next package doesn't have any tests; I'll write one test for that." Then I got to org.jaxen.dom.html and started trying to write a test for it. You know what I discovered when I tried to write a test for org.jaxen.dom.html? I couldn't. This code had been configured in such a way that it was completely unreachable. There was absolutely no way you could create an HTML document such that this code would trigger. It was essentially dead code. Somebody had written it; nobody had noticed it; nobody had actually tested it or tried to make it do what it was supposed to do. So I just deleted it. It is shocking how often that happens when you start trying to cut through some old code: you find that code is dead or, even worse, completely unreachable.

Once you've got a test per package (or module, maybe, in Java 9, or whatever that may be in other languages), can you write a test for each class? Maybe each significant class; maybe you don't have to write a test for each exception class. I would write tests for POJOs. I sometimes hear that people don't: "These getters and setters, all we're doing is this.x = x, this.y = y, return this.x, return this.y. It's too simple, it's too obvious, we don't need tests." Yeah, nine times out of ten you're right. The tenth time you're not; something weird is going on. It happens. The problem is I can never tell in advance, until I write the test, whether we're in the tenth time or the first nine. They're easy to write and they're easy to test. And even if it's obvious and easy today, what that's going to catch sometimes is when things get not so obvious in the future, when somebody adds some complicated null handling, or removes a field and calculates it from the value of another field, or something like that. The test will still be there. And so
even if your model code is too simple to test today, that doesn't mean you shouldn't write tests for it, because it may not be too simple to test tomorrow.

Anyway, if you get down to one test per method on legacy code that didn't have a test suite to begin with, you're doing really, really well. Be proud of yourself. One test per line? Unlikely. One test per branch? Probably not, not on legacy code. But if you've got some tests there, you're doing better than most teams are.

One thing that can help you is measuring your code coverage. It'll give you an idea of how confident you can be. Do we have 90% coverage? Do we have 9% coverage? Do we have 0.9% coverage? Those are very different places to be. More importantly, code coverage reports can give you a very clear idea of what you missed. I would measure the code coverage on legacy projects even if I know it's going to be 0.9%, even if I know it's going to be 9%, because, let's say I'm working on the payroll system, I want to make sure that the 9% code coverage touches the payroll and isn't all over in the vacation package. That's something I find a lot, actually. If I'm looking at some code that was written by some team five years ago, often one of the coders on the project was a real hotshot who really loved test-driven development and wrote tests for absolutely everything, and the other coders on the project, not so much. So whatever that first coder wrote has good tests, and the rest of it doesn't. It may just depend on who originally wrote your code. A code coverage report can show that.

Here's an example. This is Jaxen, and this is actually fairly high code coverage; you're not going to have this on most legacy projects. But notice that red bar on org.jaxen.dom.html. That's what I was telling you about. It doesn't exist anymore, I deleted it, but you can see nothing is testing it. So if I'm looking at this picture, that's right where I'm
going to go and write my first tests, because right now it's not covered at all. The rest of it looks pretty good. Obviously you can say that org.jaxen.pattern isn't as good as org.jaxen.expr, but they're all better than most legacy code would be. You can see the... I'm not sure. Anyway.

As far as tools for doing this, there are a lot of them. EMMA is nice. Clover is payware. Cobertura is open source. Jester is a very special-purpose tool; I probably wouldn't use it. Clover is probably my personal favorite in terms of UI and functionality and so forth, but it costs a lot of money, so it's not always feasible, depending on your situation. EMMA has a nice Eclipse plugin. There are several plugins for code coverage in Eclipse. I'm sure there are some for IntelliJ as well; I just don't happen to use IntelliJ very much myself.

As for how it works: most of these tools instrument the bytecode in various ways, adding extra code to measure which statements are and are not reached. Some these days will reach in using things like JVMTI, the Java Virtual Machine Tool Interface, or JVMPI, the Java Virtual Machine Profiler Interface. They run the test suite, they collect the data on which lines are hit and which lines are not, and then they generate a report and tell you. Knowing that a line is covered doesn't really mean it's tested, but it at least sort of means it's not throwing a really nasty unexpected exception. If nothing else, there's no NullPointerException there in the context of the test. And if the line isn't even run, well, then you know it isn't tested. The same thing if a class isn't run, etc.

There's the report; we saw that already. Here's the package-level coverage, the classes in the package. Line coverage: not so good. A few years ago I actually went through this particular package in this project, through a lot of the expressions, and almost everywhere I looked where there wasn't a test, there was a
bug. There was a lot of weird floating-point arithmetic and other details to be dealt with, and it really was an example of "untested equals buggy". It's mostly all cleaned up these days.

Now, if you have a lot of legacy code, you may get the idea of auto-generating your tests. If it works for you, great. I've never seen a tool that I would bother installing or purchasing myself to do this. I might use reflection to just run through and find all my public methods and convert them into test methods, and then I can insert either a comment to fill in the test code or a failure statement, depending on how serious I am about filling it in. But this is not hugely useful in my experience. It may get you started and avoid some boilerplate; that's about it. I wouldn't push this too far.

One thing I absolutely would do, and this is not technically unit testing, but it's really useful, it's cheap, and it's relatively easy: static analysis of your code, especially on an old legacy code base you don't understand. Run FindBugs over it, run PMD over it in Java. If you're in the C++ world, gcc -Wall. There are many other tools of this nature. The first time you run your static analysis tool, you're gonna find a lot of things, including some real surprises, some things like, "How did this ever work? How did anyone miss this?" Oftentimes the reason they missed it is because the code isn't actually executed; it's dead code. Other times it's a serious bug that's actually in the system. This is worth taking a look at. It's a third way to really dig into and try and understand the legacy code base, which is an advantage I haven't really pushed, but it's very true of working with legacy code: just to figure it out. If you're writing tests for it, you're figuring out your code, you're learning.

But back to static analysis. As I said, the first time you run this tool, you're gonna find a lot. The second time you run it, after fixing the things found the first time, not so much. This does
not give you... you know, if you run it ten times, you're not going to get the same amount of benefit from it each time. The first time is by far the most useful. It's where you're going to find the most serious, most obvious, most astonishing "how did anybody ever miss this" bugs. The time this is not gonna work is if some other programmer or some other developer before you already went through this and already ran their static analysis tool. FindBugs has gotten a little better over the years, as have PMD and other tools, so maybe they have new checks they didn't have five years ago when this was last done. Generally speaking, though, if it's never been done before on a particular codebase, it's definitely worth doing. It will find some real mistakes.

OK, now another question that comes up with legacy code: test or debug? Particularly if you've been assigned this P0 bug that popped up, that somebody noticed in production. There are going to be bugs; that's fine. What do you do first? Do you write a test, or do you start stepping through the debugger? I think you can guess where I stand on this. Before I try and debug the code, I may look through the code to try and understand what I should be writing tests for: where in the system is the bug likely to be hiding? But then I will start trying to write some tests that expose the bug. When I can write a test that fails because of the bug, I'm confident I understand the bug. Until that point, not so much.

Sometimes it depends on how the code is structured. Maybe I know that a certain method is supposed to return a name and instead it's returning a Social Security number, or whatever the European equivalent of that is, and I don't know why it's doing that, but I know which method is doing that. So I'll write a test for that method and I'll see if I can force it to push out the wrong data. If I can't do it, maybe then I'll start the test and step through that in the debugger to teach myself what's going on in the
code. That's often a very useful way to understand complicated legacy code. It's especially useful if the code is written with a lot of interfaces and abstract classes and abstract types everywhere, at least in Java. Say you've got x.doSomething() and you hit F3 in Eclipse to jump to the declaration of the doSomething method. If you find yourself in an interface that doesn't actually have any implementation, you don't know where the bug is. You may know doSomething's not doing what it's supposed to be doing, but where's the real code? If you step through it in the debugger, often starting from a test, because that's often a very useful place to start, then you can see the code that's actually being executed instead of abstract types and abstract classes. That's one of the things I like least about interfaces and abstract methods: they just make debugging really, really painful, in my experience.

OK, three questions when you're faced with a new bug in legacy code. One: is it simple, obvious, and local? Sometimes you look at it and almost by inspection you immediately know where the bug is; it's really obvious. Number two: do you understand the code around the bug? Once you've found where it is, how is the class being initialized? How are all the values being set up? What are the dependencies on the environment? Number three, if you answered yes to the first two: do you understand the fix? If the answer to all three questions is yes, OK, fine, fix the bug. Otherwise, if the answer is no, I would probably spend some time working on my test suite before I actually try and fix the bug, with two goals in mind again: one, to get more safety that I'm not gonna have unexpected side effects, and two, to help me better understand the code that I'm suddenly tasked with, because as I write tests for it, I will learn about it.

Refactoring: this is dangerous on legacy code bases, whatever kind of refactoring you're doing, even if it's just to make the code more comprehensible so you can understand it. Refactoring
can improve code so it's more testable, but if you're trying to do something like introduce dependency injection into a class that's got a lot of tight coupling with a database, with JDBC or Hibernate or whatever, you're taking a big risk of breaking things. Generally speaking, to refactor with confidence requires a good test suite first, so it's really hard to refactor your way into testability. It can be done, but you're probably gonna break things as you go. Make sure you've got a manager who's not going to go ballistic if things don't work out as planned.

Be careful. Most automated refactoring tools, at least in Java where I have the most experience, are fairly reliable. So if you want to do something like rename a method, rename a class, rename a local variable, OK, go ahead, do it. Pull an inner class out to an outer class. But be careful if you're touching the public API. For example, some legacy code I'm currently dealing with has lots of classes that appear to be unused and that appear not to have any dependencies on anything else, unless you happen to know that way off in this other directory, completely separate in the source code repository, there are Hibernate config files that reference them, and actually jBPM config files that reference them, and about six other kinds of XML files that the refactoring tool can't see. It doesn't know that they're being loaded up by reflection or by this other system. So watch out for that. I don't know if everybody could get away with this, but try and make sure you've got all the code potentially touching your repository checked out. And if you've got a class you think you can get rid of or rename in a system like this, you know, FooBarBaz, just do a grep through the whole thing to see where else that string pops up. You may be surprised on occasion that it's actually being used in unexpected ways. So watch out for that.

OK, summing up. Some tests are better than none. By all
means, get some tests written, please. If you don't have tests, the code's broken; that's the short version. Broader tests are going to work better for legacy code than narrow unit tests, because we just don't have time or energy to write all the possible unit tests. Even for understanding, it's easier to understand the broader tests, it's easier to understand the functional tests, and say, "We know that this method is supposed to generate a paycheck, so let's see if the PDF that comes out the other end looks like a paycheck." Whereas if the method is supposed to calculate a Bernoulli function, and you don't know why you're calculating Bernoulli functions or even whether the method is being called, that's a very different situation. So I'd go for the broader tests, I'd go for the integration tests, unless I know exactly what it is I'm trying to fix.

Most importantly of all, do not let the perfect be the enemy of the good. I often hear this: "We can't test that, it'll take us forever." Will it take you forever to write one test? Will it take you forever to write two tests? Write the tests that you can write. Write the tests that are easiest to write. Write the tests that give you the most bang for the buck. That will catch a lot. It won't catch all problems. It's not going to be as reliable as full 100% or 99.9% code coverage on a new project, but it will help a lot. It will avoid a lot of embarrassing mistakes later.

OK, if you would like to know more, this presentation's online; you can probably Google it already, I expect. There's an article on developerWorks I wrote a few years back about this. Michael Feathers wrote a book about this called Working Effectively with Legacy Code. This was one of the standard Noogler books at Google for years; I don't know if we still give it out or not. In many ways this is a book about how you write unit tests for C++ code, is another way of thinking of it, but at the time I think he was very much thinking that all C++ code is untested and legacy
code. OK, various tools are available there, and we have time for a few questions. I have books to give out for people who are close enough for me to reach. If you have questions, come on, you want a book.

Yes? Right, right. Mm-hmm. That's a very good idea. The suggestion was: could we automatically generate characterization tests? You create the test, invoke the method, and then get the output. I haven't tried that. I suspect maybe you could. It would be worth experimenting with, depending on the nature of the code under test. If it did not require a lot of complicated setup, you could probably save yourself some boilerplate by doing that. I would expect it would be easiest if you allowed yourself the flexibility to fix things up manually, and even allowed the generation of code that doesn't compile, under the assumption that you're going to edit it and check the edited code into your repo. But depending on the nature of the code, particularly depending on how easy it was to automatically figure out how to invoke a method, that would be an interesting thing to do.

OK, other questions? Yes, second row. I think I should have waited. Go ahead. OK, very good point. Yes, broad tests, GUI tests, integration tests often run slowly. So the question is, how slow is too slow? I can tell you, on my current project, which is Eclipse plugins, we have a number of broad GUI integration tests that bring up an instance of Eclipse, pop open windows, click things, move things around, create files. On the continuous integration server, on Travis, that usually runs in under 10 minutes, which is acceptable. It's not ideal, but it's OK. I would be a little more worried if it was more on the order of an hour.

Right, the question is: what if you run it overnight and see the day after whether you broke something in the build? I will sometimes run extremely big test suites overnight, or schedule them over the weekend. Some of the common code I work on, things like Guava, if
anybody has used Guava, those changes will be tested against essentially all Google code, the entire Google repo, every single test that's touched by a change to, say, common.base.whatever. That includes big tests, small tests, and so forth. But I still won't check it in until those tests pass. So I would say yes, if you have to, if you have really big tests, run them overnight. But try to avoid that, because you can't check in, and you shouldn't check into the repo, until the tests pass. You always want head in your repository, whatever that is, to be in a passing, green state. That's really, really important. As soon as you allow broken tests into your repo, you're in trouble, whether you know they're broken or not, because then other people are checking things in, and maybe they think, and this is sometimes me, "Oh, it's just a flaky test, it's not really... I'll rerun it again." And sometimes I have to rerun it multiple times before I realize, "Oh wait, I really did break something, it's not just flaky." So as soon as you let it get to red once, you're hosed. Don't do that.

OK, yes, front row. Test suites... I didn't mention Sonar or SonarQube? No, I'm sorry, I didn't. They're not tools I've used myself on my projects. There are many tools out there; I'm just not familiar with it. One disadvantage of working at Google is we work in a very sort of insular ecosystem where we're not necessarily using the same tools as everybody else. So I could tell you about other things, but they're not things you could use. That's changed a little for me in the last year, when I've been doing mostly open source stuff.

Mm-hmm. OK, so the comment is that there are lots of tools that can automate your tests. I'm not sure what you mean by automate, exactly. Oh, yeah, absolutely. OK, so what you're talking about is essentially a continuous integration server. I'm currently using Travis for our GitHub repo; there are many others. You want to go even a little bit further if you can get away with this. Google does this, I'm
not sure how many other places do, it's less common: you don't submit directly to the repo yourself. You send to the continuous integration server, and when it passes, it runs against head and submits for you. That's called a submit queue. If you can set such a thing up, I recommend it. It avoids lots of problems where your tests pass, but maybe somebody else checked something in and the pull requests conflicted in flight or something. We had that happen on our Eclipse tools repo a couple of weeks ago with, again, two pull requests. One deleted some code the other depended on, but they both went into the queue at the same time. We weren't using a submit queue, so it didn't get caught until the next pull request, where things broke unexpectedly for a reason that had nothing to do with that pull request, because Nabil and I had both submitted at about the same time.

Yes? Mutation testing for legacy projects? You mean like fuzz testing, right? Fuzz testing is another talk. I have tried it. It can be useful. It's more commonly used as a way of finding security holes and security bugs. I would not think of it as the ideal scenario for testing legacy code. It's more for if you've already got really good test coverage and you want to find what you missed. If you're running at 99% but you're not sure where your branch coverage is, your switch statements, what special values might cause you problems, then that's where that sort of testing would come in very handy. But probably not so much for legacy code, I would think. I've never tried it on this sort of project.

Yes? OK, so the concern is your manager says it takes more time to fix the test suites than it does to make the code change. Right, mm-hmm. I would make the argument... well, there are technical ways you can make your tests more robust in the face of unimportant changes, like changes in algorithms or changes in data structures, but where the functionality of the code is
essentially the same. Ideally that should be the case. You want to avoid writing your tests so that they're heavily dependent on the implementation. You want to write them only against the public API, the published API, if you can possibly help it. I've slipped a little bit on that in recent years, but I still believe it. And I would say the real argument to make to management is, "I can make a small change cheaply, but the bugs that are going to be caused by not having tests are going to be very expensive." You have to be a little more polite than that and word it out a bit, but that's ultimately what it's about. It's about avoiding regressions, avoiding unexpected bugs, avoiding unexpected rollbacks. Let me give you a book; that's a very good question, though. If somebody can pass it back.

OK, I think we have time for one more. Yes, over there. Right, right. Well, that's two issues. One is the brittleness of broader tests, and then the debugging of them. In a greenfield application, ideally I would prefer to have more unit tests, more unitary tests, certainly, and they are much easier to debug if they are smaller. That said, the speed of debugging is still not that bad compared to trying to write unit tests for everything. And one thing to do, once you know that a big test is failing, is to try and break it down and write unit tests for the pieces of it; maybe you'll find the problem a little more clearly that way.

But the other question is the brittleness of broader tests. Yeah, they can be brittle, and it sort of depends on the nature of the test and why it's brittle. For instance, if it's a GUI test, there can be timing issues. Sometimes, for almost fundamentally flaky things, you can rerun the test multiple times and say, if it passes one time out of three, you're OK. That's something I've done in the past. I feel a little dirty when I do that, but if I have to, I will.

OK, I think time's up, going by the sign out here, so thank
you very much for coming. I hope to see some of you tomorrow morning at 9:30 for Effective Web API Design. And I have, it looks like, eight more books, which I will be happy to sign for whoever gets to the desk first, if you don't hurt yourself on the way down.
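
The reflection idea mentioned in the talk (run through all the public methods, turn each one into a test method, and drop in a failure statement) might look roughly like the sketch below. The class name TestStubGenerator, the testXxx naming scheme, the numbering of overloads, and the choice of JUnit 4 annotations in the generated text are all assumptions made for illustration; none of them come from the talk, and this is a starting point rather than a finished tool.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not from the talk: builds the source text of a
// skeleton test class with one deliberately failing stub per public method.
class TestStubGenerator {

    static String generate(Class<?> target) {
        Method[] methods = target.getDeclaredMethods();
        // Reflection returns methods in no particular order; sort by name
        // so the generated source is the same from run to run.
        Arrays.sort(methods, Comparator.comparing(Method::getName));

        List<String> used = new ArrayList<>();
        StringBuilder out = new StringBuilder();
        out.append("public class ").append(target.getSimpleName()).append("Test {\n");
        for (Method m : methods) {
            // Skip non-public methods and compiler-generated (synthetic) ones.
            if (!Modifier.isPublic(m.getModifiers()) || m.isSynthetic()) {
                continue;
            }
            String base = "test" + Character.toUpperCase(m.getName().charAt(0))
                    + m.getName().substring(1);
            // Overloads share a name, so number the second and later ones.
            int earlier = Collections.frequency(used, base);
            used.add(base);
            String name = earlier == 0 ? base : base + (earlier + 1);

            // The stub text assumes JUnit 4 is on the test classpath.
            out.append("\n    @org.junit.Test\n");
            out.append("    public void ").append(name).append("() {\n");
            out.append("        org.junit.Assert.fail(\"TODO: test ")
               .append(m.getName()).append("\");\n");
            out.append("    }\n");
        }
        out.append("}\n");
        return out.toString();
    }

    public static void main(String[] args) {
        // java.util.BitSet stands in for a legacy class here.
        System.out.print(generate(java.util.BitSet.class));
    }
}
```

Running main prints a BitSetTest skeleton to standard output. Every stub fails until someone replaces it with a real test, which corresponds to the "failure statement" option described in the talk; swapping the fail call for a comment gives the gentler variant.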
Info
Channel: Devoxx
Views: 12,548
Rating: 4.9354839 out of 5
Keywords: DevoxxBE2016
Id: cjxXv0eifhY
Length: 60min 49sec (3649 seconds)
Published: Thu Nov 10 2016