Keynote - The Evolving Role of Data in Software Development by Martin Fowler & Scott Shaw

Video Statistics and Information

Captions
- Welcome to a giant community. You're probably sitting alone in a room, maybe with a couple of people in an office, but you've become part of the ThoughtWorks community, a global group of more than 7,000 passionate technologists. I'm one of them. I'm Nigel Dalton. You can see I've got the badge on today. 14 countries, 43 offices, shaping the tech industry through a commitment to open source tech and free knowledge. Here in Australia, offices in Brisbane, Melbourne, and Sydney. That's where I'm broadcasting from here in Melbourne. And a tiny subset of that 7,000 people made this event happen today. If I have to describe who we've got up first today, they're old friends. We've basically got, and you can Google these folks if you're not familiar with them, the Billy Bragg and Henry Rollins of software development. These two wrote the songs that started whole movements, that created violent shifts in the way we think about, not music in this case, but technology. They've been arguing about it ever since. They've caused more revolutions than I can possibly remember. Now, Martin: renowned author, software consultant, international speaker. Delighted to have been locked down, I think, because it's the first time in decades he's stayed for six months in the same time zone. Two decades of experience helping companies and organisations evolve technology for mission-critical information systems. He's one of the authors of the Agile Manifesto. That makes him an OG, an original gangster. And he has written seven award-winning books on software development, the latest of which we're going to give a few away today. Software that works, software that doesn't overcomplicate things, you know? I'll never forget Martin remonstrating with a group of us, oh, it's years ago now, asking why we were inventing these things when HTTP was there for us all to just enjoy. He loves collaborating with all of us fellow ThoughtWorkers discovering patterns of good design. He's got an amazing blog that you can access at any time. And he's an expert on so many things. I don't know how Scott's going to tie him down today, but Scott's the right person to do it. And Martin, as a historian, is a living treasure who was actually there for the last two decades of the revolution of the web, and we should use him, so if you've got questions for him today, I know Scott'll be keeping an eye on the chat, so drop them in there accordingly. Now, time to introduce Scott, who is director of technology for the ThoughtWorks Asia Pacific region. Scott is a musician, but also divides his time between customer-facing work, writing all sorts of incredible guides and the Bible of how people should transform their technology for so many lucky customers, and then doing the internal tech leadership of things like our radar, our tech radar, big input to that. As a leader in ThoughtWorks, as well, an elder, he's responsible for the technical brand here in Australia, ensuring we're delivering only the very best technical quality and support. He fosters our innovation practises and our delivery projects, and is a great person to have around the organisation as a really bright and leading technologist. He has the pleasure and the luck to work with ThoughtWorks CTO Rebecca Parsons, as well as his colleagues on the ThoughtWorks technical advisory board. But that's a conversation I'm not invited to as a social scientist, and I'm very glad, because it's pretty advanced in the way it thinks. They'll share some of that today. 
And one product of that is our ThoughtWorks technology radar, and we've just been discussing the time zone challenges of running the radar in the 21st century. Now, the topic today for these two is the evolving role of data in software development. We've got a big data thread throughout our day: some remarkable sessions on the ethics of data, the engineering of data. We're going to have data scientists. We're going to have data designers. But right up front, the original gangsters of data. How does it work in software development? We've seen the types of applications and the architectural environments we're working in become heavily reliant on data in all forms. The data's gone beyond simple problems of persistent stores, and now it's streaming operational and business events. It's massive data lakes, legacy silos trying to be integrated, because, let's face it, we never have a perfect white sheet of paper to draw our things out on. Distributed networks, the democratisation of data, which nobody ever wants to own. So it's changed the tools and platforms and it's changed the applications at the same time, sometimes reinforcing ageless engineering practises, sometimes bringing new ones that we'll get to hear about today, 'cause my friends are going to discuss this concept and more. So I hand over to you, Scott. Lead the band from here. Let's have a Woodstock moment. I won't be Abbie Hoffman and rush onto the stage at any point, but over to you two. Thanks very much. - Thank you, Nigel, and good morning, everybody. Good morning if you're in our time zone. I feel so good about myself now after that introduction, Nigel. That was great. Got a little ego boost there. When I joined ThoughtWorks 15 years ago, I had read some of Martin's books and knew that he was chief scientist. In fact, that's probably about all I knew about ThoughtWorks at the time, like a lot of people that join. Martin'll tell you he's not the chief of anybody and he doesn't do science, but we like to call him that anyway. But fortunately over the years, I've gotten to know him and we've worked together, as Nigel mentioned, on the radar, so twice a year we get together in a room with our peers and argue and discuss and put across our views on technology, and I'm hoping to elicit some of those and share them with you today so that you can get a little view into Martin's thoughts on these things. In fact, the first thing that I ever worked on with Martin, a long time ago, was an attempt to find some statistical trends in the vast repository of code review samples that we get. And there wasn't much, as I recall. We tend to favour submissions with less code. That is one thing that we did notice. So Martin claims to not know anything about data and data engineering or data science, but I think that's a bit self-deprecating. But for that reason, the topic that I would like to pursue this morning is how the proliferation of data, the increasing use of data in our software projects, has affected the way that we develop software and affected those of us who develop all kinds of software. And that kind of puts us in the realm of software delivery, which is something that Martin knows a lot about. So I thought I'd start by looking back a little bit. I know that we as software engineers have kind of regarded the database as a tool that primarily serves the needs of the application, persists data for the application. 
And we have been sometimes very vocal about wanting to access the data through the application, through some kind of anticorruption layer, rather than using the database itself. There's another school of thought, and it's led to this kind of split-brain phenomenon in a lot of organisations between the data people and the application people. And I think that's still around to a certain extent. And I wondered, Martin, where do you think this division arose? Some people think it's just because we as software people like to control everything, that we want to be able to change the data whenever we want. Is there a good reason to have done that, do you think? - I honestly don't know, because it was kind of set from when I first got into the industry back in the '80s: there was application programming and there was databases. And even if you were working in a field of application programming, like in the enterprises where you actually do quite a good bit with databases, there was still this kind of separation. And so you'd run into people who spend all their time doing data-oriented applications, but don't know SQL, for instance, although they're always talking to a SQL database. And indeed, you get then, of course, things like object-relational mapping frameworks, which were, I always sort of commented, a way of treating the database like this crazy old aunt that you want to lock up in an attic room somewhere so nobody talks about it. And that was the way that people talked about the database. And that seems kind of crazy to me, because if you're going to be dealing with data, you really need to know how to use it well. And then on the other side, you have people in the data world who are completely ignoring most of the concepts of application development and structuring that we got used to on the application side. So no notion of modules, no more modularity, really, than functions. And these are not first-class functional functions; these are Fortran-level subroutines, often. Your version control was, oh, you give each script a different number on the end to tell you which version it is. And it's really strange that those two worlds just hardly ever spoke, and I thought it was a great problem. I think anybody who's working with databases needs to know how to talk to the database and needs to understand a reasonable amount about how they work and how to work with them efficiently and effectively. And it's really been a great problem, I think, that we've seen this separation. - Yeah, I think the separation's probably still with us, and now that data world has morphed into the world of data lakes and big data. And I see a lot of companies collecting data speculatively, which is something that we at ThoughtWorks have tended to push back on; we take kind of a minimalist view of things: you probably aren't going to need it. Build only the bits that you need at any given time. But there's another school of thought that we should collect all the data and have it available to us in case we might need it. Do you think there is some value in that? Have we been too hesitant to just collect things speculatively? - Yeah, I mean, I think this is a significant shift that we've seen in the data world. I mean, again, going back to kind of this '80s, '90s picture, data was deliberate. 
You deliberately grabbed hold of data and stuck it in the database, and you were very deliberate about the structure of the data. Is it properly formed? Is it validated? You look at it as a very definite thing, and you're very thoughtful and deliberate about it. And now we're in this world of, as you say, speculative data or accidental data. You just grab all the data, don't care what format it's in, and just suck it in and put it somewhere. And that's why you have to deal with it differently. I mean, we call it big data, but I always liked the point of view of Ken Collier, a colleague here at ThoughtWorks, that big data is really messy data, because it's often the messiness that makes it so different. Yeah, there's a lot of it, but it's very ill-formed because we haven't deliberately thought about what we wanted to grab. And it is sometimes useful to do that. It was interesting. I was just listening to the ThoughtWorks podcast that we do, and we had an old colleague of yours on it, whose name I'm blanking on, but it will come back to me any moment now. And he was talking about how he'd been on a project and they started collecting some data about a year ago with no idea whether they were going to use it or what they'd use it for. And then a year later, it became really, really useful in helping them deal with various performance issues with their software. And that's the thing, again, the sense that if you've got data, you don't want to throw it away, but on the other hand, you've got to be careful how you use it, as well. But I think that's the really big shift. It's a shift from deliberate, carefully captured and carefully looked after data that we know what it is, to this hoovering up, grab everything you can possibly see and figure out what you're going to do with it later. - Yeah. I remember once I was having a conversation with the head of a data warehouse group at one of our customers, and I was explaining the benefits of writing tests and of putting the attention into the design of the software and everything. And he said, "You don't understand. The data warehouse doesn't do anything." And I was thinking, maybe that's part of the problem. But I think that there is some value in having that. And I think that having that data available to mine is valuable; people are actually finding some benefit from it. - Yeah, and I think, as far as I see it, that's the original concept of the data lake: it's somewhere where you just stick everything, with the notion that the main people that are going to use it are people who are probing, looking for things that are useful, but you don't rely on it for your operational work. Once you've figured out, oh, this is really useful, then you build the proper plumbing to go directly from wherever your source is to the real applications, and you're more deliberate about it, again. But the lake is really there for the data scientists to wander around in their lab coats with their magnifying glasses like Sherlock Holmes, to mix metaphors desperately badly, and try and see what might be interesting to piece together. And the usefulness is having a well-known place or places where it's all together. And of course, the idea that we're trying to move more towards here at ThoughtWorks now is the data mesh, 
where we try and say, rather than think of one big corporate data lake, instead think about these separate areas following the lines of business, where there's a bit more understanding about where they are, and the people who dive into these lakes have a bit more knowledge about that particular area that they're looking at. Which is actually also something the guy who originally coined the term data lake talked about, 'cause he saw the idea evolving into this water garden of different areas. But the key difference is this unstructured, accidental data, speculative data, as opposed to the deliberate data that you get when you actually want to be running things efficiently. - Yeah, well, as long as we're on the topic, it would probably be remiss of us not to mention the societal impacts of that, of collecting data speculatively and hoovering up all the data you can. I know it's something we've talked about a lot, and our German colleague, Erik Doernenburg, of course says there's a German word for the practise of being selective about the data. Do you want to try it? - Datensparsamkeit. - That's very good, very good. - Yeah, I mean, we've practised with Erik. We're pretty confident we've got it right. Yeah, we've put it on the radar, to some controversy: "But you've put a German word on the radar!" I said, we use English words all the time; we put in an occasional foreign word and it kind of freaks everybody out. But it seemed that the translations weren't really ideal, and so we thought, let's go with the original word. And it basically says don't collect data unless you know what you're going to do with it. Which is very much the opposite of this hoovering-up notion. - Yeah, and that's fortunately been legislated into law in some places, having to defend your use of the data and the need for the data. And hopefully that's becoming more commonplace, I think, because we've seen the risks of having that accidental data exposed through breaches, and it's going to happen. We know that it's going to happen. And so the less exposed you are to that, probably the safer you can be. So there's definitely a balance between having the data that's useful and holding more data than you actually need, which presents a big security risk for your organisation. - Well, I think it's also a matter of ownership, as well. I mean, the idea that we need to think of ourselves as owning that data to some degree. People need to ask our permission if it's going to be used, and that permission has to be more than line 176 in a 10,000-word EULA. I think it does have to move towards that kind of thinking. But in order to do that, we have to get, I think, a lot more thrust towards notions of data ownership. - Yeah, and consent. I think consent is an area that we're going to be hearing a lot more about in Australia, at least, because of open banking and open data in general and the ability of people to actually be able to revoke their consent. So it's fine to give your consent to use your data at some time, but I don't know that many organisations have built in the mechanisms necessary to be able to remove that data once the consent is revoked. So I think there's going to be a lot of work for us as software engineers in trying to implement that at some point. - Yeah, tracking the provenance of data and being able to deal with it, and thinking about what it means to remove it, right? 
'Cause, I mean, particularly if you're doing things like event sourcing, where the whole notion is you keep changes and you don't destroy things, because, as we know, destroying things leads to its own difficulties. What does it mean to get rid of something? I mean, are you going to go back to all those backup discs you've got lying around offline somewhere and try trawling through them for the data? Well, there's definitely, I think, interesting questions about how to deal with that provenance in this kind of situation. - Yeah, yeah. I should mention, please send your questions in, and I will try to keep an eye on the Slack channel for those questions, but Nigel, feel free to break in and introduce questions at any point if you'd like to. Another organisational issue that we've seen is this idea that as machine learning becomes more and more common and organisations start hiring data scientists, I've noticed that they're creating machine learning groups, they're creating data science groups, and they sort of work in a vacuum, and they're given problems that people think might be suited to that sort of a solution, but they aren't really getting mainstream exposure to customer problems. And this is kind of a pattern that we've seen before, this idea that a new technology comes along, so we create a group that is responsible for that new technology. Meanwhile, we keep cranking out software that's pretty much ignorant of that. Are there any lessons from the past that we can learn? How should we be treating data scientists and machine learning? - Yeah, well, I mean, you put your finger right on it, right? I mean, we've got into so much difficulty by taking some specialised skill and locking people in a cupboard and treating them like mushrooms, and the way we've fought all of this across the whole history of, well, certainly ThoughtWorks, and certainly many of our colleagues even beyond that, is to break these barriers down, get people working closely with each other. I've been trying to work with one of our leading data science people who's very focused on trying to get more data science people and programmers working together so that they build well-structured software, because a lot of data scientists don't really know how to structure software. That way they can pick up some of the skills that come from the software world around structuring and modularity and ways of composing, knowing when to shift from the executable workbooks, the Jupyters and markdowns of this world, into something that can actually be maintained in the long term. And then, of course, there's the ever-present barrier between software people and business people. I was very struck by a quote, which I ought to have had available right at my fingertips, but I don't. It was from Nate Silver, the guy who does the election forecasting in the US. He does a very, very good job at election forecasting, in my view. And he commented that to be a good data scientist, about 44% of the skill required was a feel for data and what it looked like, and 44% was domain knowledge of the domain you're looking at, and the remaining 2% or whatever was fancy data science skills and knowing which specific technique to use and all that kind of thing. And that struck me very sharply, because it echoes what I see from my slightly outsider view. 
I mean, yeah, you've got to have a feel for how to look at data and a nose that looks at these things, but also what's really crucially important is that domain knowledge, and it takes a lot to learn about the domain and get a feel for what makes sense, what kinds of problems are important in a domain. And typically the best way to do that, unless you've got a rare animal that is both, is you have to get them to collaborate, which we humans are fairly good at doing as long as you can actually not put up the structural, the organisational barriers against them. - Yeah, I entered the field of IT kind of from doing a lot of research in areas that were much more statistical and modelling and pattern recognition. And one of the things I noticed when I entered the IT world was this kind of astounding lack of literacy in probability and statistics, and the people's sort of lack, the flaws in their intuitive thinking about data. And I wonder, is that just something data scientists need to understand? Is this something maybe all software engineers need to get better at? - Well, I think it's more than just all software engineers. It's all people. I mean, I think society in general suffers greatly from people not understanding probability and probabilistic outcomes. It's actually one of the reasons I really enjoy listening to Nate Silver's work on 538 and his election modelling is because what they're trying to do is explain probabilistic outcomes to an audience that doesn't really get probabilities. And they're constantly having to battle with how do we convey this, both in talking about it and in the visualisations they do on their website, and we have that same struggle whenever we talk about this kind of stuff, because so many people don't have that background. I wonder whether I benefited greatly because I'm so interested in tabletop gaming and was as a kid. So if you're going to do these big hex encounter of war games that I played when I was 13 or 14, you get used to the idea that everything that goes on is probabilistic, and while you can't know for certain what's going to happen, what you've got to do is know probabilities and try and maximise the probabilities in your favour. And I suspect that helped me a great deal. I've heard some people say the same thing about playing poker, because in order to do well at poker, you've got to have an understanding of the probabilities. And I think, I've said that's true generally. I think it is also true for programmers, as well. And one of my challenges really is thinking about how to make that kind of thing work, how to get, to what extent do more software developers need to know about this kind of stuff, how to pass that on, and also how to then to influence the people around them, in particular, the consumers of the information, because they also need to get it. I mean, it's all very well saying we're going to run the businesses in a much more data-driven or data-informed way, but so many people just don't understand how to look at data. I mean, I see senior business meetings where they're trying to make this, where they claim they're making decisions based on the data, and they take two populations and compare them just by looking at the averages with no idea what the actual data distribution is in the two populations, so whether it makes any sense at all to compare averages. You understand that. This is . But so many people don't. - Yeah, yeah. I think it's a society as a whole. 
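To make Martin's point about comparing averages concrete, here is a minimal sketch in plain Python with invented numbers: two populations whose means are almost identical but whose distributions are completely different, so a decision based on the averages alone would miss most of the picture.

```python
import random
import statistics

random.seed(42)

# Two hypothetical "populations" (e.g. response times, spend per customer).
# Group A: tightly clustered around 50.
group_a = [random.gauss(50, 2) for _ in range(10_000)]
# Group B: mostly small values with a long tail of large ones,
# constructed to have roughly the same mean as group A.
group_b = [random.expovariate(1 / 50) for _ in range(10_000)]

def summarise(name, xs):
    xs = sorted(xs)
    pct = lambda p: xs[int(p * (len(xs) - 1))]
    print(f"{name}: mean={statistics.mean(xs):6.1f}  "
          f"median={pct(0.5):6.1f}  p90={pct(0.9):6.1f}  p99={pct(0.99):6.1f}")

summarise("Group A", group_a)
summarise("Group B", group_b)
# The means are nearly identical, but the medians and tails are wildly
# different -- a decision based on averages alone would miss that entirely.
```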
We saw that at the US election four years ago, with the misunderstanding of what probabilities mean. - Yeah. - I want to shift the topic a little bit to how we work, the software engineering practises that we use when we're working with data. One aspect is that we're incorporating machine learning models, learned models, even just linear regression, into our software systems a lot more these days. And there are some interesting characteristics that are quite different, I think, from the way that we traditionally think about, say, requirements for software, where we assume it's deterministic. I mean, we are actually building systems that are much more non-deterministic than they used to be. We used to consider non-determinism a bug or a race condition or something like that. Now, the models that we're building, actually, we can't always predict precisely the answer they're going to come up with. How do we deal with this? Are there ways that we can still ensure the quality and correctness of our software, even when the specific answers are probabilistic? - I mean, it's not an area I've spent a lot of time really looking at, but it is an area that interests me, in the sense that I kind of feel that we have to move much more into a world where we're debugging the software. I mean, if you've got a machine learning model, you've almost got to take an assumption of: this has got bugs. My job is to try and figure out what those bugs are, particularly the ones that are going to bite us hard, and try and find them before they actually do surface. And we see cases where machine learning models throw up correlations that are really bad, like sending insurance quotes and you find, oh, the people who are mostly getting the expensive insurance quotes happen to be people of colour. It's not a good thing to bring those kinds of biases in, but they're coming in from other places. We know these kinds of biases are very deeply rooted in our systems, so it's natural they're going to show up in machine learning. So there's an important role for software, in the software development area, whichever part of the activity you want to focus on, that says, yeah, we've got to try and debug these things and think about trying to surface the problems. And then once we've surfaced them, we can think about what we might be able to do to mitigate them. - Yeah, I think there's a lot more we have to do in testing the limits of our models. And bias is one of those things, obviously, that we need to start testing for. There are a few tools out there. I know LIME is one that allows you to look at different facets of the data and understand: what happens if I use just this one subset of data to train with, or what happens if I remove a particular subset of data, does that change the outputs? And so I think we're starting to... One of the problems, I think, with these learned models is that they're opaque. It's really hard to understand how a decision was made, and yet life-changing decisions are sometimes made by these models. Do you think we're getting to the point where we can explain models at all? Can we explain why these decisions are made? - I don't know, but I agree with you completely that explainability is, I think, one of the greatest challenges, and something that we have to demand as citizens. 
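A deliberately simplified sketch of the kind of model-probing Scott describes: everything below (the predict function, the records, the group attribute) is invented for illustration. The idea is just to slice a model's outcomes by a sensitive attribute and flag gaps that deserve a closer look, whether with tools such as LIME or by retraining on different subsets.

```python
from collections import defaultdict

# A hypothetical, already-trained model: predict(features) -> True/False.
# Here we fake one so the sketch is self-contained and runnable.
def predict(features):
    # Deliberately biased toy rule, standing in for an opaque learned model.
    return features["income"] > 50_000 or features["postcode"].startswith("3")

# Labelled evaluation records with a sensitive attribute we want to slice by.
records = [
    {"income": 80_000, "postcode": "3000", "group": "A", "actual": True},
    {"income": 30_000, "postcode": "2000", "group": "B", "actual": True},
    {"income": 60_000, "postcode": "3056", "group": "B", "actual": False},
    {"income": 20_000, "postcode": "3011", "group": "A", "actual": False},
    # ... in practice, thousands of rows
]

# Slice outcomes by the sensitive attribute and compare approval rates.
by_group = defaultdict(lambda: {"approved": 0, "total": 0})
for r in records:
    g = by_group[r["group"]]
    g["total"] += 1
    g["approved"] += predict(r)

for group, g in sorted(by_group.items()):
    rate = g["approved"] / g["total"]
    print(f"group {group}: approval rate {rate:.0%} over {g['total']} cases")
# A large gap between groups is a signal to dig further before shipping.
```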
I mean, it's one thing to say, oh, my algorithm, I don't know why, but as soon as I buy a kettle on Amazon, I get bombarded with ads for more kettles wherever I go on the web. I mean, that's, you know, irritating, and it kind of makes you very cynical about AI, but it doesn't actually do any harm. But for many people, much bigger decisions are at stake. I mean, there was the situation in the UK recently where there were adjustments made to people's university placement exam results. That's a huge life-changing decision. Whether you go to this university or that university can make a big difference. And the reaction was, well, the algorithm did it, and that's not going to be an acceptable answer, and it shouldn't be an acceptable answer for many things where it currently is treated as an acceptable one. And I think that's, underlyingly, one of the big problems around using machine learning. People are really kind of jumping on top of machine learning, and yet, really, is it better to use that than some of the more analytical statistical techniques? Is machine learning really going to do better for you than a simple piece of linear regression that not just gives you some answers that are useful, but where coming up with the model gives you insight as to what's going on, and that insight is probably just as useful as the answers that you're getting? With machine learning, you're just kind of throwing it in a black box and getting an answer out, which is not bad, but for many things, we really, I think, need to do more investigation. And that argues, I think, for much more deliberate modelling. I'm using that deliberate term again. It's obviously my term of the day. - Yeah, I mean, it all comes down to the objective function that you're using to develop the model, right? If it's simply maximising revenue, then there are probably all kinds of dark patterns you could encode in your model that are going to allow you to do that. But if we take other things into account... I know there's a, spoiler alert, I'm probably going to put this on the radar: we've finally got a library that implements differential privacy, so that you can actually use those privacy-preserving qualities in the objective function in training your model. So I think that's something that I hope we see a lot more of. And it kind of brings up, I think, that testing in the world of machine learning is quite different, and we probably need some more tools around that to be able to... I read a paper in IEEE Software recently; they did a study of machine learning-based software and found that requirements come in a much different form. We're used to getting requirements for software in the form of examples or business rules, repeatable business rules. Now people are getting requirements in the form of statistical qualities or general ranges of values. They're more quantitative requirements. And so I think that... Do you think that changes how we're testing, and do we need a different way of doing that? - Yeah, well, it should do, right? I mean, I think it's an interesting thing when we start talking about outcomes in terms of, well, I'm looking for an outcome that sort of improves this probability distribution compared to what it was previously. Much fuzzier things. And it also, I mean, goes back to the debugging side that I talked about earlier. 
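The shift Scott describes, from example-based requirements to statistical qualities and ranges of values, suggests tests that assert over samples and distributions rather than single outputs. A minimal sketch, with an invented model and invented budgets:

```python
import random

random.seed(1)

# Hypothetical stand-in for a learned model's outcome on one case (True = error).
def model_error():
    return random.random() < 0.12   # ~12% error rate in this toy example

def test_error_rate_within_budget(trials=10_000, budget=0.15):
    """Requirement expressed as a statistical quality, not an exact output:
    over a large sample, the observed error rate must stay under the budget."""
    errors = sum(model_error() for _ in range(trials))
    observed = errors / trials
    assert observed <= budget, f"error rate {observed:.3f} exceeds budget {budget}"

def test_latency_distribution(trials=10_000, p99_budget_ms=250):
    """A 'range of values' requirement: the 99th percentile of a
    (simulated) latency distribution must sit below a budget."""
    latencies = sorted(random.lognormvariate(4.5, 0.4) for _ in range(trials))
    p99 = latencies[int(0.99 * (trials - 1))]
    assert p99 <= p99_budget_ms, f"p99 {p99:.0f}ms exceeds {p99_budget_ms}ms"

test_error_rate_within_budget()
test_latency_distribution()
print("statistical acceptance checks passed")
```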
I actually think one of the most useful things for many of these machine learning models will be using them as a tool to try and understand what's going on. So when you get a machine learning algorithm that does very well at something, then saying, well, how is it doing that? What is it looking at? Can we kind of dig in to try and get an understanding of what's happening inside this black box, so that we can take it out of the black box and move it into the analytical world and perhaps improve its performance even more, because we've now got the analytics? So I suspect there could well be some interesting back and forth between the machine learning world and the more analytic world, where you'll be constantly bouncing back and forth to try and better understand what's going on and then feed that through into a more interesting model. But I'm saying this, again, as an outsider. I don't do this stuff. But I do talk to those of my colleagues that do, and they haven't told me I'm an idiot yet, I guess. - Well, one thing I know you do do is refactor, and you recently updated your classic book "Refactoring" to work with JavaScript, amongst other things. And I wonder, how does refactoring work in the world of data? I know those data specialists we talked about at the start don't like to change their data models very much. They don't really like to change the schemas. Is there a place for refactoring in the data world? - Well, as we know, there most certainly is, and that was particularly the work of another colleague of ours, Pramod, who together with Scott Ambler wrote the book on database refactoring a long time ago. Sadly, one of my greatest disappointments is that I created my signature series with Addison-Wesley hoping that it would raise the visibility of books published in the series and make the ideas more common, which it might've done with things like continuous delivery, which was in the series. But sadly, it didn't with database refactoring; that book didn't get as much traction as it should've. I mean, actually refactoring databases is harder than refactoring code, because you have to migrate data as well as refactor the schemas, but it's certainly doable and we've been doing it for decades, right? Here at ThoughtWorks, that's the way we deal with databases, and it has been for a long time, but it's still, from what I gather, not well known enough across the industry how you do this, which is a really great shame, because the database is such a central thing that if you can't evolve it, you can't have an evolvable system. It's not hard to do, but you have to make the effort to learn how to do the database refactoring and work with that. Now, when you're talking about the huge amounts of data that you have in speculative data acquisition, then that's a whole different ball game. But again, of course, the whole thing about speculative data acquisition is you're just grabbing it in whatever form it is. It's different when you're making a more deliberate structure to work with. But then you have to remember that that deliberate structure has to be in an evolvable form, not set in concrete. - Yeah, one thing that I frequently get asked about is transitioning to microservices architectures and how we decompose monoliths. This is a problem that we've encountered over and over again. And I always tell people, you know about this book? It's called "Refactoring Databases" by Pramod Sadalage. And I'm surprised that it's not that well known. 
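Martin's point that refactoring a database means migrating data as well as changing the schema can be illustrated with the expand/contract (parallel change) style described in the database refactoring literature. A small, hypothetical SQLite sketch that splits a name column in steps, each of which keeps the system working:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Starting point: a legacy table with a single free-text 'name' column.
cur.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
cur.executemany("INSERT INTO customer (name) VALUES (?)",
                [("Ada Lovelace",), ("Grace Hopper",)])

# Step 1 (expand): add the new columns alongside the old one.
cur.execute("ALTER TABLE customer ADD COLUMN first_name TEXT")
cur.execute("ALTER TABLE customer ADD COLUMN last_name TEXT")

# Step 2 (migrate): backfill the new columns from the old data.
for row_id, name in cur.execute("SELECT id, name FROM customer").fetchall():
    first, _, last = name.partition(" ")
    cur.execute("UPDATE customer SET first_name = ?, last_name = ? WHERE id = ?",
                (first, last, row_id))

# Step 3 (contract, later): once every application reads and writes the new
# columns, drop the old one. During the transition both shapes coexist, so
# each step is small, releasable, and keeps the system working.
try:
    cur.execute("ALTER TABLE customer DROP COLUMN name")  # needs SQLite 3.35+
except sqlite3.OperationalError:
    pass  # on older SQLite you would rebuild the table into the new shape instead

print(cur.execute("SELECT first_name, last_name FROM customer").fetchall())
conn.close()
```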
But I tell people that's where you start. That's often where the problem exists with monoliths: the heavy dependence on stored procedures in the database, and the necessity of being able to split that database up across multiple services, is probably a big one. I think it's time we answer a couple of questions from the audience. - Yeah. - Nigel just gave me one from Julia Neil. Hi, Jules. "People are probably bad at understanding probability. I think that as systems engineers and data scientists, we should make software that can shoulder this burden, not necessarily expect people to understand probability. Do you think we need to do a better job of writing software that helps people with that?" - That will certainly help. And that's one of the things that interests me with things like 538's work, 'cause they're trying to make probability more of a visual thing. I mean, you also see it in some of the stuff around COVID-19, where people are similarly trying to explain forecasts and the like in a more visual manner. It's not just the visualisation, it's the explainability of the probabilities. But having said that, I also do think that people need to learn more about probability, and I don't think we should treat it as something that people can't learn. I mean, let's not forget, it wasn't that many hundred years ago that long division was considered to be an extremely advanced concept that you would only learn about in university, and now we take long division for granted. So if we could learn how to do long division more widely across the population, we can learn about understanding probability more widely. Maybe we get more people to play tabletop games. Maybe that's the answer. - Yeah. I'm just looking at the... - Yeah? - Here's a good question. Andre says, "I feel like the word itself, refactoring, is hard for common people to understand. Is it rebuilding? Is it retrofitting? Would you like to comment on the interpretation of the word refactoring and its usage?" - Yeah, well, refactoring is a fairly precise technical term, and I wouldn't expect someone outside the software world to understand what it means, just as I don't understand what four-by-12s are, which Cindy and her friends in the building trade talk about all the time. I mean, it's a technical term, but I can still explain it. When I'm trying to explain it to somebody outside of software, I say, well, if you've got an existing software system and you need to make a change, one way of making that change is you kind of slap something on top of the existing software that makes it twist the way it goes. Kind of like sticking a patch on something that's broken. And the problem, then, is that as you make more and more changes, you get patch on patch on patch on patch, and the whole thing becomes this very complicated and very precarious structure. With refactoring, what you're doing instead is going inside the existing software and reworking it so that the new thing will just slot in exactly, without being kind of patched on top. Now, making that change to the existing structure of the software, obviously you're doing some kind of fairly invasive surgery, so you need to do it in a very disciplined manner. And that's what refactoring is. It's a disciplined manner of changing the shape of the software so that the new capability can just slot in. 
So you change it to make it look as if it was originally designed for that new feature that's coming; that's how Ward Cunningham puts it. You try and give that impression. And if you do that, then the software doesn't end up being this patch-on-patch massive complexity. It can steadily evolve and grow. But you do have to understand that discipline of how to do it. And I don't think refactoring is necessarily very hard to learn how to do, but you need to learn how to do it and be disciplined to do it well. And that's a tricky thing to do if you've not got somebody who's able to teach you. You've got to have that determination to do it. And there are a number of other skills in the software world that are like this. Working with test-driven development is a similar kind of thing. You have to learn the skill, and then once you've learned it, it's actually fairly straightforward, and you learn when to use it and when not to use it. Refactoring is the same thing. The great thing about refactoring is that when you're able to refactor effectively, you can make changes to software without getting yourself into a panic and without causing a lot of problems. And that can be worth it just on its own, because it's so much calmer a process when you're able to do that. - Yeah, I find a lot of people use the word refactoring when they're really rebuilding. They just don't want to tell their boss they're- - Right. Yeah. I mean, I use the general term restructuring for just any form of it, though refactoring is very precise: you do very small steps, none of which changes the observable behaviour of the software. That's the discipline that allows you to do it. Each step is too tiny to be worth doing on its own, but you string them together, and you never break the software in the process. - Yeah, I kind of like Kent Beck's term; he calls it tidying. I think we probably need to start wrapping up, but along these lines, I want to make sure I ask: do you think refactoring is something we have a choice in? Is it something we do because we're told to, or is it something we're expected to do, that we have a responsibility to do as software engineers? - I've always argued that we have a responsibility as professionals to do this. I mean, what we're being paid for is to build software where the stream of new features can be implemented as cheaply and as quickly as possible. I'm assuming that the software is not dead and just being shuffled aside, which is relatively rarely the case. We're being paid to work on this software so we can continue doing things to it. The cheapest way to keep software evolvable is to keep it in good condition. Keep the cruft out. And a technique that's really vital in keeping cruft out of software is refactoring. I mean, there are other techniques that play a role in this: continuous integration, having self-testing code. These are important things, as well. But refactoring is a key technique, part of that. It's our professional responsibility. Just as doctors wash their hands before they start cutting into you, we have to do that. And it's not a question of asking permission, because people outside of our profession can't assess the value of that. That's why they pay us. They pay us to understand what good practise is, and we should then do that good practise. And I think refactoring is one of those good practises that we should do. 
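As a tiny, invented illustration of the small, behaviour-preserving steps Martin describes: extract the pricing rule first, with behaviour unchanged at every step, so that the new feature can then slot in cleanly instead of being patched on top.

```python
# Before: a function that mixes calculation with formatting, about to grow
# a new requirement (a loyalty discount).
def invoice_total_before(quantity, unit_price):
    total = quantity * unit_price
    if total > 1000:
        total *= 0.95           # bulk discount buried in the middle
    return f"${total:.2f}"

# Refactoring step 1 (extract function): pull the pricing rule out.
# Behaviour is unchanged; existing tests stay green.
def bulk_discount(total):
    return total * 0.95 if total > 1000 else total

# Refactoring step 2: the original function now just composes the pieces.
def invoice_total(quantity, unit_price):
    return f"${bulk_discount(quantity * unit_price):.2f}"

# Only now, with the shape changed, does the new feature slot in as a small,
# isolated addition rather than another patch on top of a patch.
def invoice_total_with_loyalty(quantity, unit_price, loyal=False):
    total = bulk_discount(quantity * unit_price)
    if loyal:
        total *= 0.98
    return f"${total:.2f}"

# The observable behaviour of the original path is preserved throughout.
assert invoice_total_before(200, 10) == invoice_total(200, 10) == "$1900.00"
```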
- Thank you. I think we need to wrap it up now. We could probably go on like this all day. I appreciate people listening in on our conversation and I hope you got some nuggets out of that. So thanks, everyone. And I will hand back to Nigel now. We'll try and answer questions on Slack, by the way, the ones we didn't get to. - Yeah, that's what I want to encourage. There's a Slack channel. It's an active conversation. There's a record there for everyone joining. And it's all sorts of times all over the world. I just want to thank Martin. Amazing. It's so good to have the original gangster of refactoring and so many things in the room, and Scott, masterly contribution to the conversation today. And I wish Martin had had the chance to ask you some curly questions, as well, but maybe that's the next XConf. So thank you so much.
Info
Channel: Thoughtworks
Views: 1,836
Rating: 5 out of 5
Keywords: Thoughtworks, Technology, Business, IT, Consulting, Programming
Id: jXq6SW3Bv7c
Length: 46min 39sec (2799 seconds)
Published: Mon Sep 28 2020