If considered harmful: How to eradicate 95% of all your bugs in one simple step - Jules May

Okay, I think we should make a start, because we've got a lot to get through today. Can you hear me at the back? Is the sound reinforcement working? Okay, good.

This is a very big claim. I know that, and I know I have to prove to you that I'm as good as my word. So I'm going to make you a promise: in the first 10 to 15 minutes of this talk, I'm going to convince you that this is true. It's going to take the rest of the talk to show you how to do it, but first I'm going to show you why it's true. (That was a slide they made me put in.)

Now let me tell you the history behind this talk. A few years ago I was working on a code base that was absolutely riddled with bugs. It didn't matter what we did: every time we knocked one down, another one popped up. This was about two and a half million lines, so it's not a major program. To try and cope with it, we decided to delete our bug list and just see which bugs popped back up. Within six weeks we had a list of 4,000, so that gives you an idea of the scale of the problem.

Everybody was asking, "How do we get rid of these bugs?" I thought we had a more important question to ask: where are the bugs coming from? Not how do we get them out, but how are they getting in there in the first place? So this is what we found. We looked at all the bugs: we went back through the version control and looked at every fix and every bit of history on the project, going back five, six, seven years. It was a bit patchy at the beginning, but we had about five years of usable data. What we found was that the bugs fell into a small number of categories. Basically, there was one bug pattern that kept showing up over and over again; it was the same one every time. Now, there were some bugs
that were there just because we hadn't understood what we were trying to accomplish, or hadn't understood the framework we were working with. But fundamentally, 90% of our bugs were all of the same pattern, all of the same type, and they were all there because somebody was putting them in. I thought, oh, that's interesting: can we understand that pattern in more detail? It turned out that yes, we could.

Next we drilled further into the data. We asked: can we find the primary bugs, the ones that were put in when the code was first written? And can we find second-level bugs, the ones we introduced as a result of some other bug fix? This is what we found. Of the primary bugs, about 90% were this one pattern; of the secondary bugs, about 99%. Basically, our entire technical debt was being caused by one single anti-pattern.

I don't know how many of you have read The Mythical Man-Month by Fred Brooks, an absolutely classic book. One of the things he says about bug fixing is that every team has a roughly constant bug rate. It might be one bug per thousand lines, or one bug per 4,000 lines; if you're really unlucky it's one bug per 50 lines, which is what this code was running at. The point is that it doesn't matter how much fixing you do: the bug rate stays constant. One of the questions he asked was whether that's because all those bugs are already in there, or because you're introducing them. What our data showed is that we were introducing them: every time we fixed a bug, the nature of the fix introduced new ones. So pretty much 100% of our technical debt came from this one anti-pattern.

I bet you want to know what it is. I'm going to tell you, but not right now. First we need to talk about this chap. This is Edsger Dijkstra. Dijkstra was a very clever chap; he was a computer
scientist before there was even a word for computer science, and an awful lot of what we take for granted today, this guy invented. Back in 1968 he looked like this, only much, much younger. He was famous for writing letters; he sent them all over the place, and they're all collected and numbered now. Here's one: "A Case Against the GO TO Statement". He wrote it in 1968 for the Communications of the ACM, and in it he said something that was very revolutionary for the time. He said he had noticed that the number of bugs in a program correlates directly with the number of gotos in it, and that gotos cause bugs.

This is the paper. I'm going to explain more about what's in it, because it's short but it's fairly heavy computer science. What he did was show not just that this correlation exists, but why it exists and what to do about it. He sent it off to CACM, which was being edited by Niklaus Wirth, the guy behind Pascal, and Wirth said, "Great paper, lousy title," and retitled it "Go To Statement Considered Harmful". The rest, of course, is history, and you can look it up online.

So what does he actually say? The principle behind this paper is actually pretty important. He said that when we talk about a program, we talk about it as a thing. You can print it out on paper, put it on a stick, put it on a server, and we talk about it as if it's a thing. But what it actually is isn't a thing at all; it's something quite different. The program looks like a static map: it's fixed on the paper, it's fixed on the screen. But what it means is a dynamic process. It orchestrates all the processing that's going on inside
a computer somewhere. The program inside the machine is inherently a dynamic thing; there's no such thing as a static program in there. Yet when we talk about a program, the language we use is spatial and static. We say "right here, on this line, such-and-such has happened", or "what about this variable here, or that class over there?" Even when we're talking about class structures, which don't correspond to anything on the page, we gesture into space: "this class over here", with the data weaving around like this. All of our language is spatial and static, and yet the program has no concept of that at all. The program's concept is time. The building blocks of a program inside the machine, the real thing it represents, is a sequence of instants, one after another.

As human beings, we're really good at understanding spatial stuff. We're good at following maps and finding our way around cities; compared with other animals, we're very good at it. So it's natural that we use this language. But as humans we're really bad at understanding change. We all hate change, even if we don't admit it. When you put people into surroundings that are moving all the time, they get lost, they can't find their way around, they actually feel sick, because people don't cope well with change. So what we're trying to do is control something that is inherently dynamic and changing, which we don't understand, using something that is inherently static and locational, which we do understand. No wonder it's hard.

So what Dijkstra said was this. There's no point getting rid of the spatial language, because that's how we think. But can we find some language in which we can express both concepts at the same time? He called it an execution coordinate. He asked: can we find something that
we can speak about which not only has this static, spatial feel to it but, even while it's static, also expresses the dynamism? What he came up with was the execution coordinate, and the execution coordinate lies at the heart of what we now know as structured programming.

Let me explain. Let's take a really simple program. (Can you read this at the back? I normally have real problems with projectors.) This is the kind of thing you've seen time and time again; it's probably one of the simplest examples you'll see at this conference. When I point at it and say "here's the program, here's line 13", I'm using static language. But we can animate this program and see what happens when it starts to move. It executes each line in turn, and each line executes at some instant in time; along here is the time axis. What this picture shows is a correspondence between a location in the program and an instant in time. So in Dijkstra's terms this is a good execution coordinate: one thing, say "line 13", represents both the position in the program and the instant in time.

Now I want to explain his concept of an execution coordinate in a bit more detail. Let's animate this again and stop it at line 13. We've stopped our progress through the program, but we've also stopped our progress through time, and we have this sense of being at line 13: here is where we are in space, and here is where we are in time. I'm sorry, I know I'm labouring this, but it's important. Now, print is a function. It's not merely something we do atomically; it's
got structure inside it. As we drill into that function, we climb up the stack and find other coordinates elsewhere in the program. You understand this; you've seen your debuggers produce it all the time. You see stack traces; you see how you got to where you are. That was part of Dijkstra's idea of an execution coordinate: what you need is not just one number, but a tuple of numbers that says not just where you are, but how you got there. We don't care about the history along here, because the only way to get to line 13 is through the intermediate lines. But we do care how we got to this point, because we can get here by lots of different paths. So the execution coordinate involves that whole tuple. Does that make sense? As I say, this is nothing new; it's what all of your debuggers give you.

There's one more step I want to show you, which is possibly a little less familiar: what happens when you create a loop. For computer-science reasons, mathematical reasons which I'll tell you about if you want but which aren't worth going into now, Dijkstra said we should treat loops as if they were function calls. So we're not going round and round the loop; we're entering and exiting, entering and exiting, like this. Here's where we've entered the loop; then we drill into it, create a stack entry to represent it, and keep a tally of how many times we've been through the loop. Debuggers don't generally do that, and I wish they did, because when you put all three things together you get a complete sense of where you are and how you got there. That is an execution coordinate.

Now, what Dijkstra said was that you can understand a program when every point in the program, that's every
point in its textual description and every point in its execution, has a coherent execution coordinate. That's when you can understand it, and any program for which that is true is a structured program.

So let's see what he was fighting against. Remember, this is 1968, when Pascal was still a twinkle in Wirth's eye and most people wrote in Fortran, and this was typical of the code of the day. Can anybody tell me what that's doing? Anybody? This is about the simplest example of unstructured code I could come up with, but seriously, I've seen code worse than this. I've debugged code worse than this. I've inherited code worse than this, with people asking, "Can you restructure this into C#?" And no, I can't.

Would it help if we animated it? Let's have a look. At each point we can see, and keep track of, which line we're on. But when we look at time, line 14 could be this instant, or this one, or this one, and the history could be this history, or this one, or this one. We just don't know. So in this code there isn't an execution coordinate. We can measure the line, but knowing the line number doesn't give us an execution coordinate, because it doesn't localize the time. Does that make sense?

Now, this isn't hard to understand; I'm not saying anything that should challenge you. You would never write code like this. If you've even seen code like this, you know it's been consigned to the dustbin of history, and quite right too, because it's awful. If we were still writing code like that, I'm sure you'd agree it would be riddled with bugs and we'd never be able to control it. Well, what if I told you we still are?

Let's review where we've got to so far. The execution coordinate is a tuple: a sequence of numbers representing a stack, each of which represents a specific location in the code, optionally with an iteration index. Okay, so that's what an execution coordinate is.
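To make that review concrete, here is a minimal Python sketch, assuming an execution coordinate is modelled as a tuple of frames, each frame a `(line, iteration)` pair with `iteration` set to `None` outside a loop. The `Frame` type, the `run_structured` function, and the line numbers 10 to 12 are hypothetical, not taken from the talk's slides. A nested loop records its coordinate every time the innermost line runs: the line number alone repeats, but the full coordinate never does, so it pins down when we are as well as where.

```python
from typing import List, NamedTuple, Optional, Tuple


class Frame(NamedTuple):
    """One stack entry: a source line plus, for a loop, which pass we're on."""
    line: int
    iteration: Optional[int] = None


Coordinate = Tuple[Frame, ...]


def run_structured(n_outer: int, n_inner: int) -> List[Coordinate]:
    """Run a nested loop and record the coordinate each time 'line 12' executes.

    Line numbers are illustrative (hypothetical): 10 is the outer loop,
    11 the inner loop, and 12 the loop body.
    """
    trace: List[Coordinate] = []
    for i in range(n_outer):        # "line 10", entered once per pass
        for j in range(n_inner):    # "line 11", entered once per pass
            trace.append((Frame(10, i), Frame(11, j), Frame(12)))
    return trace


trace = run_structured(3, 2)

# Knowing only the line number does not localize time: it is always 12.
assert {coord[-1].line for coord in trace} == {12}

# The full coordinate does: all six are distinct, one per instant.
assert len(set(trace)) == len(trace) == 6

# It also encodes the history: pass 1 of the outer loop, pass 1 of the inner.
print(trace[3])
# (Frame(line=10, iteration=1), Frame(line=11, iteration=1), Frame(line=12, iteration=None))
```

In a real debugger the frames come from the call stack; the loop tallies are the part the talk notes most debuggers don't show.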
If the code is structured, that execution coordinate tells us where we are in the code, where we are in time, and the entire history leading up to this point, which means that if you've got that coordinate, you can roll back. Wouldn't it be great if our debuggers did that instantly? If the code isn't structured, we lose the last two: time and history.

Finally, just because you've got a goto in the code doesn't mean you're violating structure. The mathematical foundation here says that if you want the code to be structured, if you want an execution coordinate, you have to have this relationship between time and location, but that's all you need. We have gotos in our modern code; we call them things like return, continue, and break. What they do is precisely jump out of the flow of the code to somewhere else, but they do it in a way that doesn't violate the execution coordinate. Even assembly: take a chunk of code and compile it down to binary. There's no concept of a loop in binary; it's all done with jumps and branches. And yet that doesn't undermine its structuredness, because you can still run it through a debugger, connect it back to the original code, and see where you are in the original code. You haven't broken the structure just by putting gotos in it. What we have done is control our programs by producing limited versions of the goto. Rather than allowing gotos free rein, we've tamed them.

Okay, so that's the history. These days we write code more like this. Is anybody not sure what this is doing? It's typical of the kind of code we write. But there's a problem with it: it's not structured, not in the mathematical terms Dijkstra asked for. The reason is that when we execute this code, we step into this block and then we
execute some of the lines, and when we step into the block again we execute a different sequence of lines. That in itself isn't a problem: if we know we're here, and we know we're in this block, we know a lot about how we got there. The problem is that which branch we take here and which branch we take there are linked, because they share a condition. If we find ourselves on line 17, we know we must have passed through line 12; this picture shows that. And if we're on line 19, we must have passed through line 14. But when we look at this in execution-coordinate terms, we can't tell the history from here: it could have been either of these histories. We've lost the clear, unequivocal history, exactly as happened when I showed you that Fortran code.

I said I'd tell you what the anti-pattern is. This is it: the same condition repeated in more than one part of the code. The problem is that there's nothing obvious connecting those conditions together. When they're very close together you can spot them, and it doesn't really matter, because if they get out of step you can spot it right away and put it right. But if they don't look the same, if they're not even in the same class, or not even on the same machine, you can't see that relationship, and frequently what happens is that when these things get separated, they stop being synchronized.

Look at what a goto does. Normally, lines go one after another, and if they're adjacent in space they're adjacent in time. What a goto says is: these two lines are separated in space, but they're actually adjacent in time. It's like a wormhole in your code. They call it
spaghetti code because it joins these two things together. But what this is, is wormhole code: these two lines and these two lines are in contact; they gear together like that. These bits of code are in some kind of contact with each other, and yet they're separated in the code. They could even be in separate assemblies. Does that make sense?

Now suppose these were separated, and you come along to make a fix, to change some function of this. Suppose that instead of just "overdrawn" and "in credit", someone wants a special case for zero. You'd go and fix it here. But how do you know to go and fix it there as well? The answer is, you don't, and chances are you won't. That is where the bugs come from: the 90% of bugs I told you were down to one anti-pattern. This is that anti-pattern: one condition expressed twice, controlling two separate bits of code. Does that make sense? Any questions? Anybody?

Okay. Can we fix it like this? What we've done here is take that condition out and put it in one place, so the code separates into two separate paths. This is structured; this is fine, except that we've now got this repetition. Who thinks copy and paste is a good way to fix bugs? I didn't think so. So this isn't the answer either. We can refactor it into this, and in some ways it's better and in some ways it's worse, but it isn't actually the answer we're looking for.

Incidentally, there was a mistake in that code; this is what it should have been. I've had that code on the screen for, what, the last seven minutes, and you all missed that bug. I rest my case.
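Here is a minimal Python sketch of the anti-pattern, using the talk's overdrawn/in-credit example; the functions, the fee, and the invariant are hypothetical, not the slide's code. The same test on `balance` lives in two functions, which in a real code base might sit in different classes or assemblies. `monthly_fee_v2` stands for a later change to how zero is treated, made in one place only, and the invariant check shows the two decisions silently falling out of step.

```python
from typing import Callable


def status_message(balance: float) -> str:
    # First copy of the decision: what the customer is told.
    if balance < 0:
        return "overdrawn"
    return "in credit"


def monthly_fee(balance: float) -> float:
    # Second copy of the SAME decision, perhaps in another class or assembly.
    if balance < 0:
        return 25.0  # hypothetical overdraft fee
    return 0.0


def monthly_fee_v2(balance: float) -> float:
    # A later change: zero now gets the overdraft treatment too.
    # Nobody knew that status_message had to change as well.
    if balance <= 0:
        return 25.0
    return 0.0


def consistent(balance: float, fee_rule: Callable[[float], float]) -> bool:
    """Invariant: a fee is charged exactly when the customer is told they're overdrawn."""
    return (status_message(balance) == "overdrawn") == (fee_rule(balance) > 0)


for b in (-10.0, 0.0, 10.0):
    print(b, consistent(b, monthly_fee), consistent(b, monthly_fee_v2))
# -10.0 True True
# 0.0 True False
# 10.0 True True
```

Nothing fails until a balance of exactly zero turns up: the defect sits there as a malfunction in waiting.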
Okay, now let's go back over what we've just seen. These are gotos. (This isn't coming out on the screen very well; has something gone wrong with the display?) Gotos destroy temporal identity: we've seen that you know where you are, but you don't know when you are, and they also destroy the history. Over here, this should be ifs. What ifs do is conflate temporal identity. (I'm really sorry, I don't know why this is misbehaving.) What gotos do is create flow spaghetti; what ifs do is create flow wormholes. And it's not just any if: it's these synchronized ifs that create the wormholes. (I'm really sorry about this; I don't know why it's misbehaving so badly. New computer. It's an Apple; it should work. Why isn't it working? Okay.)

To create structured code with gotos, we had to stop making the goto explicit. We can't spell it out like this; we have to imply the gotos from the shape of the code. That was the solution to the goto problem. So how are we going to solve this problem? The answer is that we have to imply the decisions from the shape of the code, and the rest of this talk is about how we're going to do that.

Incidentally, there's something I want to point out here. This way up, you can see it says "goto"; this way up, it says "if". But look: a goto in and of itself doesn't destroy structure. A goto does one of two things. Either it permutes the order in which you execute lines, which might scramble the temporal identity a bit but doesn't necessarily destroy it, or it creates a loop, and we know how to handle loops. So goto in itself isn't the bad guy. The problem comes when you have a goto connected to an if, because then sometimes you branch and sometimes you don't. I reckon the wrong guy has had the blame all this time. I don't think goto is evil, and I don't think goto is harmful. I think it's this goto, the conditional one, that's harmful. And between "if goto" and
"if", do you see a common pattern here? It's the if. So that's the problem.

Okay. I made you a promise at the beginning: I said that 90% of bugs come from this pattern. Have I convinced you? Thinking about the code you work with and the bugs you encounter, is anybody not convinced that's true? That's pretty good: no hands went up.

So now I'm going to convince you of the other half of it. How many of you do debugging? In fact, how many of you never do debugging? So all of you know what's going to happen. What do you do? You work out the conditions that create the malfunction, you run the code, and you step through it until you see the malfunction happen. And then what do you do? You go, "Oh look, we've got a null pointer here," so you put in a little condition to test for that special case right there, and then you move on. What have you done? You've just put in an if. (Is that me? That would have been so embarrassing.) What you've got is some condition somewhere in your code that is creating this pathological state, and somewhere else you've now got a new condition to test for that state and patch it up. That is precisely the synchronized if, and it leads to bugs, because next time around this pathological state is affecting somewhere else, and now you've got to fix it over there too. And what happens when you come to maintain it? Now you've got these little conditions, these little tests, spreading out all over your code, all attached to bug fixes, all where you think you're making your program more reliable, and every one of them introduces new defects.
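A minimal Python sketch of that debugging loop, with hypothetical names and data: `find_customer` is the defect that can produce `None`, and each caller gets its own little guard after a crash. Each guard is another copy of the same condition (`customer is None`), so the third caller, which nobody has patched yet, is a malfunction in waiting. The last function sketches one alternative, deciding once at the source and handing back a default object; that choice is an illustration, not code from the talk.

```python
from typing import Dict, Optional

# Hypothetical customer table standing in for real data access.
CUSTOMERS: Dict[int, str] = {1: "Ada", 2: "Edsger"}


def find_customer(customer_id: int) -> Optional[str]:
    # The defect: an unknown id quietly produces None, a pathological state
    # that only causes a malfunction somewhere downstream.
    return CUSTOMERS.get(customer_id)


def greeting(customer_id: int) -> str:
    customer = find_customer(customer_id)
    if customer is None:  # patch #1, added after the first null-pointer crash
        return "Hello, guest"
    return "Hello, " + customer


def invoice_header(customer_id: int) -> str:
    customer = find_customer(customer_id)
    if customer is None:  # patch #2: the same condition, synchronized by hand
        return "INVOICE: GUEST"
    return "INVOICE: " + customer.upper()


def shipping_label(customer_id: int) -> str:
    # Nobody has patched this one yet: a malfunction in waiting.
    return find_customer(customer_id).title()  # AttributeError when None


# Hypothetical alternative: decide once, where the state is created, and
# hand back a default object that every caller can carry on with.
def find_customer_or_default(customer_id: int) -> str:
    return CUSTOMERS.get(customer_id, "guest")


print(greeting(99), "|", invoice_header(1))  # Hello, guest | INVOICE: ADA
```

Calling `shipping_label(99)` raises `AttributeError`, while `find_customer_or_default(99)` returns `"guest"` without any caller re-testing for `None`.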
In case you don't know the difference between a malfunction and a defect: say you're driving down the road in your car. Nothing's wrong; the car's working. Then suddenly something is wrong with the car: the torsion bar's broken, or a tyre's gone flat, or a linkage has broken. There's now a defect in your car, which is causing a malfunction. So you correct the defect: you pump up the tyre or mend whatever's broken, and now you can drive again. The point is that there are two things here. There's the malfunction, which is how you know something's wrong, and there's the something wrong that needs fixing, which is the defect.

Now, in software, things don't break. In software, things go wrong because you put things in them that are wrong. But how do you know there's anything wrong? The answer is that there's a malfunction: you try to do something in the program, you expect it to behave in a certain way, and it doesn't. So you trace back to find the defect that caused the malfunction, you fix the defect, and hopefully the malfunction goes away. (In fact, you often get two new ones.) You need a defect to create a malfunction: every malfunction is associated with a defect, and the defect plus the malfunction is what we call a bug. But not every defect creates a malfunction. Quite a lot of defects sit in the code for years where you can't see them. They don't do anything, because they're not triggered in just the right way to cause a malfunction; they're just sitting there, waiting for you. These multiple ifs are precisely defects. They're not malfunctions; they're malfunctions in waiting. They're trolls under the bridge.

So what we're trying to do here is reduce the number of defects. To go back to my promise: I said 99% of our secondary bugs were all caused by this
one anti-pattern. It was our bug-fixing efforts that were causing our code rot, that were causing our technical debt. Does anybody not believe that? Does anybody not believe that of their own code base? Have I delivered on my promise? Don't all say yes at once. Somebody? Thank you. It's an apathetic affirmation, but I want to be really clear: this is not just me saying "oh, good". Come on, beat me up. I know I'm claiming something big, and I want you to be sure I'm as good as my word. So have I made my case? Is there anything you want me to go over? Do you want to challenge me? Now's your opportunity. Okay, good.

We need to fix this, and we need to do more than go back and patch over our bugs. We even need to do more than go back and find the original defect that caused a malfunction, rather than introducing a new defect. What we need is a new concept of decision structure, in the same way that back in the 60s and 70s they were looking for a new concept of flow structure. That's what we're going to do with the rest of today.

We need to concentrate our conditions. We saw that the problem came about because we had two conditions testing fundamentally the same state in two different places. I'll talk more about exactly the nature of that problem, but the idea is to take the decision in just one location and then separate out its consequences. (Sorry, let me find the right way to press this.) At the moment, when you write if-then, you're saying "here is the decision, and here is the consequence", and they're both in the same place. So if we've got consequences in different places, we have to replicate the condition, replicate the test, and replicating the test is what causes the problem. So we need a way to separate the test from the consequences, concentrate the test in one place, and have all the decisions depend on that one test.
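One way to sketch "concentrate the test, separate the consequences" in Python, assuming polymorphic dispatch as the mechanism. That is an illustrative choice, not necessarily one of the constructs the talk goes on to introduce, and all class names and the fee value are hypothetical. The balance is classified exactly once, in `classify`, and every consequence hangs off the resulting state object instead of re-testing `balance`. Adding the special case for zero means editing `classify` and adding one class; no other code repeats the condition.

```python
class AccountState:
    """Base state: each subclass carries its own consequences."""
    label = "unknown"

    def status_message(self) -> str:
        return self.label

    def monthly_fee(self) -> float:
        return 0.0


class Overdrawn(AccountState):
    label = "overdrawn"

    def monthly_fee(self) -> float:
        return 25.0  # hypothetical overdraft fee


class InCredit(AccountState):
    label = "in credit"


class Empty(AccountState):
    # The special case for zero: one new class, and no caller changes.
    label = "empty"


def classify(balance: float) -> AccountState:
    # The ONLY place in the program that tests the balance.
    if balance < 0:
        return Overdrawn()
    if balance == 0:
        return Empty()
    return InCredit()


for b in (-10.0, 0.0, 10.0):
    state = classify(b)  # decide once...
    # ...and let the consequences follow from which object we got.
    print(b, state.status_message(), state.monthly_fee())
# -10.0 overdrawn 25.0
# 0.0 empty 0.0
# 10.0 in credit 0.0
```

The decision is still an if, but there is only one of it; the callers' behaviour is implied by the shape of the code (which class they were handed) rather than by repeated tests.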
Does that make sense? This is our new concept of flow control, and it's not as hard as it looks. We need to concentrate our conditions, and the decisions based on those conditions need to be implied by the shape of the code. We need new mechanisms to represent this. In the same way that we needed new mechanisms to represent goto, we need new mechanisms to represent if. Goto was replaceable, and again this was based on mathematical theory: it could be replaced by function calls, by loops, and by selection (switch, if-then-else), and that was all you needed. You could build the whole of structured programming out of those three constructs. I think that, similarly, there are three constructs that will replace if, and I'm going to explain what they are today.

Oh, and this is the other point. It's all very well to talk about this, but back in the 60s and 70s they were inventing new languages like C and Pascal and Algol. We haven't got that luxury, because we have to write code today, so we have to do this in existing, modern languages. We can't wait for new languages to come along. Then again, back in the 60s and 70s they didn't wait either: what they did was develop new kinds of discipline, new kinds of programming hygiene, and they learned to use these techniques before the techniques made it into mainstream languages. So we're going to be doing the same thing.

Right, let's talk about assert. (What is this stupid computer doing? Okay.) I've got a bit of code here. You can tell what it's supposed to be doing, can't you? There's a problem in it; in fact there are several. Anybody care to hazard a guess? So your complaint is that a might be negative. Okay. So what would you do about that? "I would assert on it." Yes, just like that. Now, I've expressed it this way because I want to say that I'm making
the statement: this is what I want, this is what I think is a valid call, and if it's not a valid call, this is the consequence. That's why I'm throwing the exception. I'm being explicit about the action I'm taking, and that's the point I want to make here.

Anything else wrong with this? Are you happy with it? "It might not be numeric at all." Fair enough. So what do we do about that? This is a slightly different kind of assertion. We can't tell at compile time what the value of this thing is, but we should be able to tell at compile time what its type is. If the type is wrong, we don't want to throw an exception once the program starts running; we want to send a message back out through the compiler and produce a compile error. So we're extending the concept of type right into the compiler itself.

The reason for showing this is that we actually have two different consequences, two different things to do when an assertion fails. Normally an assertion just produces an assertion-failure exception, but I'm saying assertions can be worth more than that. If we do this, it takes a whole lot of testing out of execution and hands it to the compiler. By the time we start running this code, we can be pretty sure the value is numeric, and if we can do the range testing at compile time too, we can be pretty sure it's within range as well. I don't know if any of you have used languages with contracts, like Eiffel. Eiffel tries to push as much of the evaluation of conditions into the compiler as it possibly can, so that when the program executes, it doesn't need to test them any more. So that's how you'd express it in this language: you'd raise a compile-time error if you possibly can.
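A minimal Python sketch of that assertion, assuming a hypothetical `square_root` function standing in for the slide's code, which isn't in the transcript. The range check raises an exception explicitly rather than letting the program carry on. Python has no compile-time type check in the language itself, so the sketch annotates the parameter (a separate static checker such as mypy can check calls against it before the program runs) and also checks the type at run time as a fallback. That split loosely mirrors "error if you can prove it, exception if you can't"; it is an approximation, not Eiffel-style contracts.

```python
import math
from numbers import Real


def square_root(a: float) -> float:
    """Hypothetical function with explicit, fail-loudly preconditions.

    Explicit raises are used instead of Python's `assert` statement,
    because `assert` statements are skipped when Python runs with -O.
    """
    # Type precondition. The annotation lets a static checker reject many bad
    # calls before the program runs; this is the run-time fallback for calls
    # it could not check. bool is excluded even though it is a subclass of int.
    if isinstance(a, bool) or not isinstance(a, Real):
        raise TypeError(f"square_root expects a real number, got {type(a).__name__}")
    # Range precondition: a negative argument is an invalid call, so we stop
    # the program's progress instead of steering it somewhere else.
    if a < 0:
        raise ValueError(f"square_root expects a >= 0, got {a}")
    return math.sqrt(a)


print(square_root(9.0))  # 3.0
```

Calling `square_root(-1)` raises `ValueError` and `square_root("9")` raises `TypeError`; neither quietly returns a value that the caller would then have to test for again.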
error if you possibly can. Now, you might not be able to prove that, in which case you're falling back on the exception, but if you can prove it, then that's how you'd express it. Assertions are really helpful because they give you a sort of super-type superimposed on the type system that you've already got. You can be explicit; it's part of the documentation as well as part of the reliability of your code. Here's an important point about this. Remember what if does: if changes the course of the program; it either goes that way or it goes that way. But we're not trying to change the course of the program, we're trying to stop it. An abort means an abort. So are we synchronizing these different conditions? There is no synchronization here, because once one of them has fired, none of the other ones can, because we have shut down the flow. That's the entire point of what we're trying to do here: we're trying to make the problem worse, and we're trying to make the consequences worse, not better. There are several different kinds of assertion consequence: we can return a null object or a default object, we can throw an exception, or we can call die, which means actually shutting down the program. Has anybody ever done any embedded programming? OK, right. What goes on in embedded programming is that you have your normal program running on the chip, but alongside it you have a process which is just checking. Sometimes it's just a timer which you have to kick regularly, sometimes it's a separate monitor process, but basically there's something there which is checking the health of the chip, and if that check fails and it goes "now I don't trust this," then it will shut everything down and try to restart it. You know where your IT department says switch it off and on again? You've now got a little program doing that. Some of them are more intrusive than others. Back in the early days of the InterCity
125, that was controlled by three chips sitting in sockets, each one running slightly different code produced by different programmers. Do you know about this? Yes? OK. What they did was, every half second or so, they would each produce their results, and there was a piece of hardware there comparing the results. If one of them disagreed with the other two, a little explosive charge blew that chip out of its socket, and the train slowed down and continued on two chips. Fortunately most of them are not quite that intrusive, but that's what die means. Die doesn't mean shut down the program, throw a crash, and the plane dives out of the air; it does mean something's gone seriously wrong, so try and restart if you can. And then error is entirely compile-time: sometimes you can produce that, sometimes you can't, but if you can, the program won't even compile. So that's assert. This is one of the things that we're going to use to control our conditions. It's still a condition, it's still fundamentally an if, but its purpose is now shutting the program down, or even stopping it from starting; it's not to change its flow. Any questions at that point? No? The point about returning null or default is that it's supposed to hand back something which the rest of the program can carry on with. In fact, we're going to be talking in a minute about what that actually means, but yes, that's something the program should be able to carry on with. Yes, absolutely. And so, understanding what that assertion means and what the consequences mean, yes, you're going to need some kind of downstream support. If you've triggered an assert, something's going to happen, and yes, you're right, you do need to log it. Of course you do. All right, doing this in existing languages is pretty easy: we can do pretty much all of them with exceptions. The nice thing about an exception is that it
doesn't return, and you can use the exception constructor to produce the log. Obviously the error needs compiler support. How many people like writing compilers? I do. If you do, then patching things in around the outside of the compiler is really easy, particularly now we've got Roslyn. If there are features in the code that you can look for, then you can just write them in and link them into the compiler, and now you've got some custom compilers running. You'd be surprised how easy it is; honestly, just give it a go. OK, let's talk about values. What is wrong with this program? Here is a factorial program written in a modern language. It's a fairly trivial thing, a recursive function. Anybody not understand how this works? Now, the reason for showing you this is that every modern language has a concept of polymorphism, which means that we don't necessarily need to take the factorial of a number; we can also take the factorial of a string. This is perfectly legal in pretty much every language that exists, and it causes you absolutely no difficulty whatever. What we're trying to do, remember, is to get rid of this condition. There are languages like Haskell and F#, mostly functional languages, or pattern-matching or transformation languages, that polymorph not merely on the type but also on the value. So this is what this would look like in rather odd Haskell. I'm not going to show you the worst of Haskell, but this is how Haskell would express the same thing. It would create a function which takes a 1 as an argument, another function which takes any number as an argument, and then another function which takes a string as an argument. So what you can see is that this is polymorphic on the values as well as on the types. Now, this is important because, you know what the value of polymorphism
is; you don't need me to tell you that, because you use it all the time anyway. But what this does is take it and leverage it to another level. This looks like conditions, but it has to be expressed at the very highest level, at the level of the function prototypes themselves. Now, the consequence of that is that if we're going to follow that pattern in a conventional language, we have to make it look like this. Basically, when the function starts, the very first thing you do is a test, and it should be a switch, with a default case at the bottom. So actually this should be expressed as a switch: case 1 and case default. Other than that, you can express the pattern using conventional languages, and because it's happening at the very highest level, and because it's a switch, there is no possibility of any kind of synchronization. Does that make sense? OK, again, these are very simple things to do. So, come on: this is what it would look like using the pattern that I just showed you. This is the code that we saw before, which had the problem in it, and this is what it looks like in this pattern. This is just a taster to get it started, but what we're doing here is folding these up together, like so, based on polymorphing on the value. Does that make sense? It means that we can just keep on adding cases, as many as we want, across as many values and as many types as we want, and again this would be expressed using switches. Any questions at that point? All right. So here's the recommendation: you do the condition before anything else, and you do it with switches rather than with ifs. Alternatively, you can use ordinary polymorphism: you can create specialist classes. Here we've got an example of PlusMoney; remember, we had that in the money examples before. So here we're saying, well, actually, create a special PlusMoney class which will only construct if it's a
positive number, or use a language which is designed for the job, like Haskell or F#, because they will actually help you quite a lot. F#, incidentally, does this naturally and couples to C# simply; you don't even have to think about it. Was anybody in my F# class yesterday? OK. Right, now we're going to come to something which is a little bit more involved. There is a bug in this code; well, not exactly, but it was a defect. This is an example of the kind of synchronized special-casing I was talking about before, in disguise. You can see what we've got here: a perfectly simple class with some functions inside it, containing within it something else, and we have to set that to null. If we leave it blank, the language will set it to null for us. This is behavior which you know about, but it means that every single time we use it, we have to test, because LittleClass is promising some interface, and we can assign anything into C which honors the LittleClass interface, except for null, which we can assign to C and which has no interface at all. This is a null pointer. The guy who invented the null pointer was Tony Hoare. He put it into ALGOL W because he thought that it was simple and trivial and wouldn't do very much harm. He now admits that it's responsible for more bugs than any other single cause; he calls it his billion-dollar mistake, and he is very ashamed of it. So the guy who invented this doesn't believe in it, and neither should you. The problem is that what this means, particularly if we leave it blank, is something very odd; there's no real reason why it should even make sense. If we write this down, then what it's going to do is try to cast null into a LittleClass in any way that it can. But that doesn't mean it's going to cast a null into a LittleClass; it's just going to assign the null to C, and it
shouldn't. If there were any logic in the world, it would mean that cast, and if it meant that, we'd never have to mess about with all of those tests, because whatever was in C would always have the LittleClass interface. So what I'm going to suggest here is that you should never, ever use, or allow anywhere near your program, a null pointer, because the null pointer is precisely a source of bugs. We know it's a source of bugs, and we even know what the pattern of that source of bugs is. So always, always create null objects instead of null pointers. The null object can be as trivial as you want it to be; in fact, I've got another talk where I explain the theory behind these null objects, but the fundamental thing is that they should do just enough to allow the rest of the program to work. /dev/null is an ideal example of a null object: it has all the interface that a file or any other stream does, it just eats everything, or it produces an end-of-stream right at the beginning. So you can treat it as a file, you can read it as a stream, but it's not really a stream, it's a null. OK, so null objects expose the interface; null pointers don't have any interface at all. Null objects can just be used naively like any other object, but null pointers need to be guarded on every single use, which is exactly the anti-pattern I'm telling you about. And null objects can be checked entirely at compile time, which is when we should be doing our checking. Null pointers cannot be checked at compile time; they have to be checked at runtime, and we shouldn't be doing checking then. So there's every reason to be using null objects, and there's no reason to be using null pointers. (I think I had too much food at lunchtime.) OK, so what I'm hoping is that if we interpret that assignment to mean the cast, then all these tests simply go away, and that is a much simpler program to understand. Any questions at
that point? Right, now, this is actually the thin end of the wedge; we can do a lot better than that. OK, we're all using object-based languages now. Is anybody not using object languages? OK. Every now and then somebody puts a hand up and says, "No, I'm so unfortunate." I actually had one guy still in COBOL once; I didn't know anybody was still writing COBOL. Yes, they just don't seem to come to these conferences. OK, right: every time you invoke a method on an object, there is an implied condition, and that condition is that the thing you're invoking the method on is an instance of the class in which the method is installed. Now remember what I said to you right at the beginning: we have decisions, and we have the consequences based on those decisions, and we want to separate them. So look: the consequences of a decision are all contained within a class, and the decision itself is the factory that creates instances of the class. We already have the mechanism to achieve the separation that we want; it's the object system itself. It's just that we're not using it for that. So let's take an example of a class, say the bigger class from the previous example. We can have a number of different subclasses which all implement whatever that interface is, all constructed in different ways. Right, you're used to this, you understand this. What we've done here is polymorph again, on the type of the thing that we're constructing against, but obviously you can polymorph on anything that you want. Each of these produces different objects, each object exposes whatever interface is defined in there, and it can override any method that it wants. So this is where you're representing these consequences. So we can get up to here and say, in one place, this is the thing
that we want, and then hand it back from here. Now, this is a notation which I use a lot when I'm sketching, and I've actually written pre-compilers which expose it inside a language itself, so I can represent this as real code. What this is saying is that if we attempt to construct one of these using an integer, then it will actually construct one of those and hand it back as if it's one of these. So what this is doing is automatically downcasting. What we have here is basically a factory, but it's also upcasting, and it sits inside LittleClass itself. Does that make sense? What we're saying is that we want to construct all of these LittleClasses, and if we construct one with a null, then we're going to hand back an object which has all of this null semantics embedded in it, but the owner doesn't need to know, because the owner is just treating it as if it's one of these. This you can do today. This concentrates the entire condition at this point. Remember, before, when we were talking about value polymorphism, I said you do it with a switch; here is a switch, but it's operating at a much larger scale. I said we want to concentrate together all of the consequences of each decision, and here is precisely that concentration. They're named so that you can see them clearly, and they're grouped together so you can see how they relate to each other. You're never going to get one of them going out of sync with another, because you can see the whole class all at once. And when the user of LittleClass starts to use the consequences of its decisions, how does it know which one it's using? It doesn't, and it shouldn't, because this is where the decision should be taken: not in the clients, and not over here. This pattern completely gets rid of the bug-making anti-pattern which I showed you at the beginning. Yeah? Yes, it is, but in that situation, assuming
that we don't have that, assuming everything needs to be instantiated and properly constructed: if you don't want a null flight control system, then just don't provide one. What's the possibility that it won't be constructed at runtime? And if you can't create an object, then you should throw an exception. It's a different argument, but I actually think the only valid use for an exception is a failed constructor. If you want the object and you ask for it and it can't be handed to you, then that needs remediation. And again, in real-time systems what you tend to do is construct everything and assign all your memory right at the very beginning, and then once you're flying you don't even touch the heap; you just use the memory as if it's static. OK, are you happy with that answer? Yeah? OK. Yes? Um, I think that's what the next slide is; we're coming to that. But what you do is create a factory object or a factory method. OK, so what we've got going on here is that we're defining the interface in here, and it's the base class for all of those, and it's the factory for those, and it's all in one place. People sometimes ask me, why put it all in one place? Well, why would you want to separate them? It makes more sense to put them all in one place, because you've got all of your decisions concentrated together; it's one point where everything is focused. So what does that look like using that notation? We're doing the same thing that we did before. What we're defining is this Money class, because the Money class is where we're concentrating the decision. Here are the two different subclasses, PlusMoney and MinusMoney, and then here is the constructor. So again, all of those separate things which were spread out over those two ifs are now concentrated into one decision: you can see what the consequences are, and you can see what the decision is.
And then you can see what the consequences are here. Even though it's spread out a bit more on the screen, I think that's actually a much clearer exposition of what was going on than the previous version, particularly when you start scaling this up, and it scales up very neatly. So, this is the question that you were asking: static factory methods. There's a problem when you're using static factory methods, because they don't look like constructors. So what I've learned to do, if I can't do any better, is only ever use static factory methods and make all of the constructors private, and that way you never get caught out. It looks a bit different, and the very first time you read the code it's a bit surprising, but that's fundamentally the pattern: all constructors private, factory methods for everything. In fact, in C# it's the only way to downcast like that, because in C# every time you call a constructor, that type is what it gives you and you're stuck with it; it's not like C++. You can also expose the subclasses directly; we saw examples of doing that earlier on. You can just create the subclass of whatever you want, if you know what you're doing, so you can bypass the factories if you're careful. It's not so good, but you can. And then finally, pre-compiling. I said that the notation I showed you is very easy to compile into your code. Roslyn exposes this to you, or you can pre-compile it fairly easily; there are all sorts of tools that will help you write pre-compilers. When you're doing that, you can use the pre-compiler to make sure that every field is properly initialized and nothing is left just sitting around being null, and you can make sure that every time you give it a function, it's explicitly cast.
So the pre-compilers, they're not very hard, but they do give you a huge amount of reliability and robustness at that level. They protect you from incorrectly setting up the fields, and they also allow you to create these switches, and it's not hard. Like I say, Roslyn's available, and there are pre-compiler languages available as well, transformation languages, which make it fairly easy. OK, so let's summarize. I don't need to persuade anybody that goto is harmful, but actually the wrong guy got the blame: it was always if. The only reason why Dijkstra, who was a very great man, didn't say this is that he didn't have an alternative. He had an alternative to goto: he constructed structured programming. But we were nowhere near objects in his day, so he simply didn't have a recommendation to make. The reasoning behind his argument, though, applies equally to if. Goto causes spaghetti code, which is unreliable; spaghetti code puts two bits of code adjacent in time. What ifs do is create wormhole code, and wormhole code is spooky action at a distance. Everybody knows what spaghetti code is; I'm trying really hard to get this meme out into the world, so tell everybody about wormhole code, because once you see spaghetti code you can't unsee it, and once you start seeing wormholes you can't unsee them either. Next point: the antidote to goto was flow structure. We knew this back in the 60s and 70s, and it's now absolutely routine; it's unthinkable to do it any other way. The antidote to the if problem is decision structure. We haven't got the languages to do decision structure yet, the way we've got structured-programming, flow-structured languages, but we will. I'm absolutely convinced that in ten years' time we will have languages which have this kind of thing built into them. But for now... OK: structured languages work because they replace
goto with flow-structured constructs: fors and whiles and throws and all of those sorts of things (throw is a more recent addition). But a decision-structured language needs rather more complex things. I mean, these are actually reserved words, but you've seen that for a decision-structured language to make sense, it's not words but shape that makes the difference. So we've got things like assert, but value polymorphism is in the language itself; it's not a keyword. And the downcasts I showed you, that also is in the language itself. We can fake them for now, but they're not simple keywords like these are. Fundamentally, though, it's the same attitude: where we can, we take the problem and obviate it by replacing the keywords with the shape of the code itself. You can do structured programming with gotos if you know what you're doing, and you can do decision-structured programming with ifs; you've just got to be careful while we're waiting for the language technology to catch up. So I'm just winding up now. This is the thing that I asked you about halfway through: misusing if is responsible for more bugs than everything else. We misuse it when we write our programs, and we grotesquely misuse it when we debug them. When you're debugging a program, it's really hard to go back to the defect. We don't even bother to find defects; we introduce new ones. But that's the wrong thing to do. We've got to find some way not even of fixing the defects, but of not putting them in in the first place. Those anti-patterns are everywhere; in fact, they're so frequent that we barely even see them. I'm amazed it's taken 50 years to spot this. It's not like objects haven't been around for 50 years, and it's not like this pathology hasn't been around for 50 years. We have known about goto and why it's so pathological for 50 years. I truly don't understand why
it's taken so long to catch up, but it has. We need the decision-structured language constructs, but in the meantime the only way to fix this is good programming hygiene. So do good code reviews, look for these patterns everywhere, set up your code editor to make if flash red or something. You have to be judicious about it, but without the ifs you don't have the problem. And to finish: I know that I've been throwing a lot at you, but if you're in any doubt as to the value of this stuff, this picture should explain the reason why. 90% of the primary bugs come from the one anti-pattern, and virtually 100% of the secondary bugs. And this here is a picture of you. Any questions? Yes? OK, let me concretize the question to make sure I've got it right. The example I was giving was printing out the money: it selected a color and printed out either CR or DR. You're saying, let's suppose we introduce nil accounts, so we need to print them out as nil in black, or let's suppose the account has got over ten grand in it. Yes? OK. And what you're saying is that we've baked into the code the fact that this distinction is there. Now, if we were doing this conventionally, we would have to find every single place in the code where anybody referred to a depiction of money, select which kind of money it was, and find the correct depiction, and that would involve a change in the code. Doing it this way, we'd have to create a subclass representing this new type of money and then reference that subclass in the factory constructor, and it's still a matter of changing the code. We still have to reach into the code, because beforehand it didn't know about this "over ten grand" case. What is the problem with having to change the code? I'm not sure I've understood the question correctly. So why, why
the question? The factory still needs to be able to take a decision on some basis, and we don't know in advance what the bases are: you've just introduced one. I gave you one, is it zero; you gave me another, is it over ten grand. No, I would say put it all into the switch, because you have this one switch which says, OK, we want to create one of these things, some subtype of it, because we've got a bunch of consequences based on which one it is. So here is the point at which we take this decision. Now, if it's over 10 grand, then we know it's positive, so that should come first: it's the more specific case in polymorphic terms, so it should be the more specific case in the switch and override the fact that it's positive, and in practice you do that by listing it first in the switch statement. Do I then want to go back? I see. No, I would say place it right there, because otherwise you're again spreading out the consequences: over here you've got "is it positive?", but over there you've got "is it greater than 10 grand?", and actually they seem to be part of the same choice. Where they're genuinely orthogonal decisions they should be separate, but where they genuinely overlap, I think they should be together. Are you comfortable with that answer? OK, cool. Is it violating the open-closed principle, because you're always having to change it and you can't just extend it? Again, I would ask you what you would do in a conventional circumstance. We are open inasmuch as we can create as many new subclasses as we want to represent more and more specialized behavior, but if you have a factory, then the factory needs some basis for taking a decision, and if you've introduced a new basis, where do you address that? Do you put it in the subclass? I think that's the wrong place to put it. You have a piece of functionality there, a locus, which is precisely there to take your
decisions. I actually don't see any problem with going in and touching that. Now, yes, yes, I think that's true. If you're working with a third-party library to which you don't have access, and it is taking its own decisions, then in a conventional world you wouldn't have access to its decisions either. It may be taking decisions which it's absolutely reliant upon, and it doesn't want you tampering with them. So I would say that if you've got something which is closed, where the factory is inaccessible and you can't get it changed, then actually you have no business in there trying to fiddle with it; you shouldn't even be trying to override it. In fact, let me amplify that. The example I showed you had the subclasses defined inside the scope of the parent class, where they weren't even accessible outside it; the client couldn't even tell what the class of the thing being handed to it was. So in that sense that whole structure was closed. If you don't have access to it, it's closed, it remains closed, and it should stay closed, because otherwise you're tampering with the innards of somebody else's code and just pretending that you're not. Yeah? You see, I think I would do that inside the compiler. I think what you're asking me is: here is how people do it, here is how I'm doing it, how do we do both at the same time? That's what it fundamentally boils down to. Ultimately I would say choose. I understand about these libraries which are extensible, where you create your subclass, and I have absolutely no problem with creating subclasses; that's a jolly good way of extending a piece of functionality. But the problem that we've got is not just to extend the functionality; it's to extend the functionality in a nonlinear manner. We're trying to take decisions, potentially quite far-reaching decisions, in the course of that extension. So if you've got just one library or
just one class which you're trying to extend, fair enough. But if you've got a whole constellation of classes that have all got to work together, and you've made an extension down here and you're trying to hand it on to there, then there are very, very tight restrictions on what you can actually do with this new class of yours. If you're handing it in, there are a whole bunch of other principles, like Liskov, for instance, which limit what you can do, and these things are not gifts from God; they are conventions to help us work with those sorts of architectures. Things like C# require you to follow this; you can't ever hide anything. But things like C++ don't: C++ has got different kinds of inheritance, so you can play different games, and you have to be much more disciplined with those sorts of uses of libraries if you're doing them in a language like C++. So what you've got there is two different approaches to how these libraries interact with their environment. I admit I'm proposing a third, and I'm not even saying that it's better than the idea of extending a pre-existing class. What I'm saying is that if you want to extend a pre-existing class by multiplying its subclasses, by multiplying its downcasting, then you have to do that with the acquiescence of the class itself, because otherwise it's just going to lead you into trouble. If you're giving the class a piece of functionality which it's not expecting and not prepared for, or giving the client a piece of functionality which it's not prepared for, which you can do with those subclasses, then things can break all over the place. So I do think you either do it this way or you do it the other way; once you're into this factory-method approach, you're stuck with it. And in the case where it's already closed and you can't get to it, a third-party library, do what they want you to do; don't fight them. Are you happy with those answers? You're not
are you? It's a question. I do very much believe that we can only go so far with the programming hygiene that we've got, and only so far with the legacy that we've got. When we start getting more fundamental language support for these things, and I'm absolutely convinced that we will, then I think a lot of these questions are going to go away, because the answers are going to be right there in the language. Yeah? I kind of get what you're saying, but when you write a class where you have members that represent some kind of state, you construct it once, and then you do things to it, and do something when the content becomes... Yes, OK. There are several different answers to what you're asking, and it's a fair question. The first thing is that we are not trying to eliminate ifs; of course you need to make choices. What we're trying to eliminate is synchronized ifs; more to the point, we're trying to eliminate synchronized state. Let me give you an example of what I mean by that. We've got a system which we want optionally to produce logging output. As it starts up, it looks into its configuration and possibly finds a file that we want to log to. If there's no file there, then we don't want to log, and if there is a file there, then we open a stream to it. Then at some other point we start constructing a message to send to that stream, and at yet another point we say, OK, we've finished constructing the message, now send it out. Now, at the point at which we send it out, what are we looking at? The configuration? The existence of the stream? The existence of the message? Which one of these is indicative of the decision that we're trying to take here? The answer is none of them. The answer is that the decision has already been taken by the thing that we're sending the logging to. Right, so those decisions that you can take ahead of time,
those are the ones you take up front. When you've got lots and lots of decisions about one concept, that should be behind a class boundary; that's what it's for. Individual conditions are fine: we have to take a decision at some point about whether we're going to log or not, so it's perfectly okay to have those in the program. The second thing is that you can actually create short-lived objects representing these decisions, or representing these collections of state. We're not talking about having things hanging around for a long time; you can create them and destroy them, and create them and destroy them again. In C++ you do it on the stack; it's part of building up and tearing down the stack. I know that managed languages have to churn through the heap, but even there you can create short-lived objects. Those are the main answers; are you comfortable with that? Okay. But we're not trying to replace every if; we're trying to replace the synchronization. Couldn't that be done with switch statements? No. The point about the switch is that it clearly telegraphs what you're doing. A switch happens at the very first point of a procedure, and if you find yourself writing the same switch in several different procedures, then at one level you've got a degree of security that you haven't got with ifs, because at least every switch should match. But it takes longer to write a switch, and it takes up more space on the page, so if you're starting to play silly buggers, you know you're playing silly buggers, and it's up to you. You could, but you would spot it, because the switches would be peppered right the way through the code. Again, to go back to the bug-fixing case: suppose you've found the malfunction, and you've found the state which is triggering it. What are you going to do? Are you going to put an if in, or are you going to put a bloody great switch in there? You're not going to put the switch in, because
they're heavy and they look heavy. It's much easier to put the if in. Okay, any more? Yeah. I have to admit I haven't explored that. What we found in the big codebase we were having the problems with was that it was a mess, a serious mess. A lot of it had been translated first from Fortran into Delphi, then from Delphi into Delphi.NET, then from Delphi.NET into C#.NET, and then bits of it started to move back again, and we were getting chunks of C in Polish; it was just a nightmare. We were trying to find ways in which we could refactor it, and it wasn't worth the candle. In the end what we said was: okay, when we write new code, we try to write it clean, and when we have opportunities to take out the old code and replace it, we rewrite it from scratch. A lot of the time we had no unit tests or anything, so we didn't even know what it was supposed to be doing; we would take a guess at what it was supposed to be doing and rewrite it. So we actually found it easier to rewrite and re-engineer than to try and refactor. Other than that, I have nothing to offer. Okay, any more? Yeah. I'm a big fan of functional programming. I did a workshop on F# yesterday, and the reason I did it is that I've been playing with functional programming for years. I first studied it about 30 years ago, and I've been playing with Haskell, but those languages are not really very commercial. I'd had F# in my peripheral vision for ages, and at my current client we decided to try bringing F# in and see how it worked. It was a different way of thinking: we found that knowing C#.NET was not very much help, but we also found that having known Haskell wasn't much help either, because it's something entirely its own. And yet what we found was that once we'd learned a
little bit of facility with it, first of all we could write at ten times the rate that we could in C#, in a tenth of the space that we needed in C#, and it was running between 5 and 100 times faster than the equivalent code in C#. It was quite slow to read, but I was absolutely amazed: this is an extremely performant language. A lot of the performance comes from the fact that it's functional, and a lot of it comes, ironically, from the compromises they made to make it fit into .NET. So I am absolutely sold on functional languages, and particularly F#, and as a result of that exercise it is now my go-to language. If I want to do a mash-up, I'll do it; if I want to do something big, you know, things like compilers, I will now do it in F#. So I am a big fan of functional languages. What I notice is that functional languages are having a big effect on conventional languages. A lot of the clever stuff, the lambdas and the LINQ stuff that's going on in C#, is there because of the F# project, because they needed that support in the runtime, in the IL, otherwise they couldn't have made it work at all; and everybody regards those things as universally good inside this object-oriented, procedural space. I look at things like Python, and Python is starting to grow functional concepts, and then they get beaten down, and then they grow up again and get beaten down. The thing is that with this idea of functionalism, we have known for thirty years what the benefits are; it's just been very difficult, because you've needed to take this big step into the space to understand it, and a lot of people couldn't. But by trickling it into mainstream languages, it's becoming much more accessible, and people can take time over learning it. And again, I believe that in 10 to 15 years' time normal programming will be much more functional in flavor
than it is now. I think that pure procedural code is going to be in the minority, and it's going to be regarded almost the way Visual Basic code was ten years ago. Seriously. So learn it, definitely; everybody learn it. You can't do without it, and you'll be ahead of the curve. All right, any more questions? All right, thanks folks. Coffee's down, so...
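The logging example from the question-and-answer session can be made concrete with a short sketch. This is a minimal Python illustration under stated assumptions, not code from the talk: the configuration dict, its `log_file` key, and the names `NullLogger`, `StreamLogger`, `make_logger`, `process_order` and `process_order_with_synchronized_ifs` are all hypothetical. It shows the "should we log?" decision being taken once, when the logger object is built from the configuration, so that the code which finally sends a message never re-inspects the configuration, the stream, or the message.

```python
import io
from typing import Callable, Optional, TextIO, Union


class NullLogger:
    """Hypothetical: chosen when no log file is configured; every call is a no-op."""

    def send(self, message: str) -> None:
        pass  # logging is off: nothing to do and nothing to check

    def close(self) -> None:
        pass


class StreamLogger:
    """Hypothetical: chosen when a log destination exists; writes to its stream."""

    def __init__(self, stream: TextIO) -> None:
        self._stream = stream

    def send(self, message: str) -> None:
        self._stream.write(message + "\n")

    def close(self) -> None:
        self._stream.close()


Logger = Union[NullLogger, StreamLogger]


def make_logger(config: dict, opener: Callable[[str, str], TextIO] = open) -> Logger:
    """Take the 'should we log?' decision exactly once, at start-up.

    `config` is an assumed settings dict; a missing 'log_file' key means no logging.
    `opener` defaults to the built-in open() and can be swapped out in tests.
    """
    path: Optional[str] = config.get("log_file")
    if path is None:  # the one place this decision is made
        return NullLogger()
    return StreamLogger(opener(path, "a"))


def process_order(order_id: int, logger: Logger) -> str:
    # Build the message and hand it over. Nothing here re-checks the configuration,
    # the stream, or the message: the logger object already embodies the decision.
    message = f"processed order {order_id}"
    logger.send(message)
    return message


def process_order_with_synchronized_ifs(order_id: int, config: dict,
                                        stream: Optional[TextIO]) -> str:
    # For contrast, the pattern the talk argues against: the same decision is
    # re-derived here from separate pieces of state that must all stay in step.
    message = f"processed order {order_id}"
    if config.get("log_file") is not None and stream is not None:
        stream.write(message + "\n")
    return message


if __name__ == "__main__":
    quiet = make_logger({})                  # no log file configured
    process_order(1, quiet)                  # nothing is written anywhere

    buffer = io.StringIO()                   # in-memory stand-in for a real file
    loud = make_logger({"log_file": "app.log"}, opener=lambda path, mode: buffer)
    process_order(2, loud)
    print(buffer.getvalue(), end="")         # prints: processed order 2
    loud.close()
```

Choosing between two objects that share the same `send`/`close` interface is close to what is commonly called the Null Object pattern; the talk does not name it, and a real program would also have to decide what to do if the log file cannot be opened.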
Info
Channel: DevWeek Events
Views: 38,334
Rating: 4.615819 out of 5
Keywords: Software Development (Industry), Software Engineering (Industry), Software Testing (Industry)
Id: z43bmaMwagI
Length: 86min 46sec (5206 seconds)
Published: Fri Oct 30 2015