- Welcome to a giant community. You're probably kind of
sitting alone in a room, maybe with a couple of
people in an office, but you've become part of
the ThoughtWorks community, a global group of more than
7,000 passionate technologists. I'm one of them. I'm Nigel Dalton. You can see I've got the badge on today. 14 countries, 43 offices, shaping the tech industry
through a commitment to open source tech and free knowledge. Here in Australia, we have offices in Brisbane,
Melbourne, and Sydney. That's where I'm broadcasting
from here in Melbourne. And a tiny subset of that 7,000 people made this event happen today. If I have to describe who
we've got up first today, they're old friends. You can Google these folks if you're not familiar with them. We've got the Billy
Bragg and Henry Rollins of software development today. These two wrote the songs
that started whole movements, that created violent shifts
in the way we think about not music in this case, but technology. They've been arguing about it ever since. They've caused more revolutions than I can possibly remember. Now, Martin, renowned author, software consultant,
international speaker. Delighted to have been
locked down, I think, because it's the first time in decades he stayed for six months
in the same time zone. Two decades of experience
helping companies and organisations evolve technology for mission-critical information systems. He's one of the authors of the Agile Manifesto. That makes him an OG, an original gangster. And he has written seven award-winning books on software development, the latest of which we're going to give away a few copies of. He stands for software that works, software that doesn't overcomplicate things, you know? I'll never forget Martin remonstrating with a group of us, oh, it's years ago now, asking why we were inventing these things when HTTP was there for
us all to just enjoy? He loves collaborating with
all of us fellow ThoughtWorkers discovering patterns of good design. He's got an amazing blog that
you can access at any time. And he's an expert on so many things. I don't know how Scott's
going to tie him down today, but Scott's the right person to do it. And Martin as a historian, he is a living treasure
who was actually there in the last two decades of
the revolution of the web, and we should use him, so if you've got questions for him today, I know Scott'll be
keeping an eye on the chat and will drop them in accordingly. Now, time to introduce Scott, who is director of technology for ThoughtWorks' Asia Pacific region. Scott is a musician, but also divides his time
between customer-facing work, writing all sorts of incredible guides and the Bible of how
people should transform their technology for so
many lucky customers, and then doing the
internal tech leadership of things like our radar, our tech radar, big input to that. As a leader in ThoughtWorks,
as well, an elder, he's responsible for the
technical brand here in Australia, ensuring we're delivering
only the very best technical quality and support. He fosters our innovation practises and our delivery projects, and is a great person to have around the organisation as a really bright, leading technologist. He has the pleasure and the luck to work with
ThoughtWorks CTO Rebecca Parsons, as well as his colleagues on the ThoughtWorks
technical advisory board. But that's a conversation
I'm not invited to as a social scientist, and I'm very glad, because it's pretty advanced in the way it thinks. They'll share some of that today. And one product of that is our ThoughtWorks Technology Radar; we've just been discussing the time zone challenges of running the radar in the 21st century. Now, the topic today for these two is the evolving role of data in software development. We've got a big data
thread throughout our day. Some remarkable sessions: the ethics of data, the engineering of data. We're going to have data scientists. We're going to have data designers. But right up front, the original gangsters of data. How does it work in software development? We've seen the types of applications and the architectural environments in which we're working become heavily reliant on data in all forms. The data's gone beyond simple problems of persistent stores, and now it's streaming operational and business events. It's massive data lakes, legacy silos, and, let's face it, we never have a perfect white sheet of paper to draw our things out on. Distributed networks, the
democratisation of data, which nobody ever wants to own. So it's changed the tools and platforms and it's changed the
applications at the same time, and sometimes reinforcing
ageless engineering practises, sometimes bringing new ones that we'll get to hear about today, 'cause my friends are going to
discuss this concept and more. So I hand over to you, Scott. Lead the band from here. Let's have a Woodstock moment. I won't be Abbie Hoffman and rush onto the stage at any point, but over to you two. Thanks very much. - Thank you, Nigel, and good morning, everybody. Good morning if you're in our time zone. I feel so good about myself now after that introduction, Nigel. That was great. Got a little ego boost there. When I joined ThoughtWorks 15 years ago, I had read some of Martin's books and knew that he was chief scientist. In fact, that's probably about all I knew about ThoughtWorks at the time, like a lot of people that join. Martin'll tell you he's
not the chief of anybody and he doesn't do science, but we like to call him that anyway. But fortunately, over the years I've gotten to know him, and we've worked together, as Nigel mentioned, on the radar. So twice a year we get together in a room with our peers and argue and discuss and put across our views on technology, and I'm hoping to elicit some of those and share them with you today so that you can get a little view into Martin's thoughts on these things. The first thing that I ever worked on with Martin, a long time ago, was an attempt to find some statistical trends in the vast repository of code review samples that we get. And there wasn't much, as I recall. We tend to favour
submissions with less code. That is one thing that we did notice. So Martin claims to not
know anything about data and data engineering or data science, but I think that's a bit self-deprecating. But for that reason, the topic that I would
like to pursue this morning is how data, the proliferation of data, the increasing use of data
in our software projects has affected the way
that we develop software and affected those of us who
develop all kinds of software. And that kind
of puts us in the realm of software delivery, which is something that
Martin knows a lot about. So I thought I'd start by
looking back a little bit. I know that we as software
engineers have kind of regarded the database as a tool that primarily serves the
needs of the application, persists data for the application. And we have been sometimes
very vocal about wanting to access the data through the application through some kind of anticorruption layer rather than using the database itself. There's another school of thought, and it's led to this kind
of split brain phenomenon in a lot of organisations
between the data people and the application people. And I think that's still
around to a certain extent. And I wondered, Martin, where do you think this division arose? Some people think it's just because we as software people like
to control everything, that we like to, that we want to be able to
change the data whenever we want. Is there a good reason to
have done that, do you think? - I honestly don't know, because it was kind of
set from when I first got into the industry back in the '80s: there was application programming and there were databases. And even if you were working in a field of application programming, like in the enterprises, where you actually do quite a good bit with databases, there was still this kind of separation. And so you'd run into people who spend all their time doing data-oriented applications but don't know SQL, for instance, although they're always talking to a SQL database. And then, of course, you get things like object-relational mapping frameworks, which, I always commented, treated the database like this kind of crazy old aunt that you want to lock up in an attic room somewhere so nobody talks about it. And that was the way that people
talked about the database. And that seems kind of crazy to me, because if you're going
to be dealing with data, you really need to know
how to use it well. And then on the other side, you have people in the data world who are completely ignoring most of the concepts of application development and structuring that we got
used to on the application side. So no notion of modules, no more modularity, really, than functions. And these are not first-class functions in the functional programming sense; these are Fortran-level subroutines, often. Your version control was, oh, you give each script a different number on the end to tell you which version it is. And it's really strange, as if those two worlds
just hardly ever spoke. And I thought it was a great problem. I think anybody who's working
with databases needs to know how to talk to the database and needs to understand
a reasonable amount about how they work and how to work with them
efficiently and effectively. And it's really been a
great problem, I think, that we've seen this separation. - Yeah, I think the separation's
probably still with us, and now we've seen that data world morph into the world of data lakes and big data. And I see a lot of companies collecting data speculatively, which is something that we at ThoughtWorks have kind of always resisted; we take kind of a minimalist view of things. You probably aren't going to need it. Build only the bits that you need at any given time. But there's another school of thought that we should collect all the data and have it available to us
in case we might need it. Do you think there is some value in that? Have we been too hesitant to just collect things speculatively? - Yeah, I mean, I think this is, it is a significant shift that
we've seen in the data world. I mean, again, going back to kind of this '80s, '90s picture, data was deliberate. You deliberately grabbed hold of data and stuck it in the database, and you'd probably be
very deliberate about the structure of the data. Is it properly formed? Is it validated? You look at it as a very definite thing, and you're very thoughtful
and deliberate about it. And now we're in this world of, as you say, speculative
data or accidental data. You just grab all the data, don't care what format it's in, and just suck it in and put it somewhere. And that's why you have to
deal with it differently. I mean, we call it big data, but I always liked the point of view of Ken Collier, a colleague here at ThoughtWorks, that big data is really messy data, because it's often the messiness that makes it so different. Yeah, there's a lot of it, but it's very ill-formed
because we haven't deliberately thought about what we wanted to grab. And it is sometimes useful to do that. It was interesting. I was just listening to
the ThoughtWorks podcast that we do, and we had an old colleague of yours on it, whose name I'm blanking on, but it will come back
to me any moment now. And he was talking about
how he'd been on a project and they started collecting
some data about a year ago with no idea whether
they were going to use it or what they'd use it for. And then a year later, it became really, really
useful in helping them deal with various performance
issues with their software. And that's the thing, again, the sense of if you've got data, you don't want to throw it away, but on the other hand, you've got to be careful
how you use it, as well. But I think that's the really big shift. It's a shift from deliberate
sort of carefully captured and carefully looked after
data that we know what it is to this hoovering up, grab everything you can possibly see and figure out what you're
going to do with it later. - Yeah. I remember once I was
having a conversation with the head of a data warehouse group at one of our customers, and I was explaining the
benefits of writing tests and of putting the
attention into the design of the software and everything. And he said, "You don't understand. The data warehouse doesn't do anything." And I was thinking, maybe that's part of the problem. But I think that there is some value in having that, and having that data available to mine is something people are actually finding some benefit from. - Yeah, and I think, I
mean, as far as I see it, the original concept of the data lake was that it's somewhere where you just stick everything, with the notion that the main people who are going to use it
are people who are probing, looking for things that are useful, but you don't rely on it
for your operational work. Once you've figured out,
oh, this is really useful, then you build the proper
plumbing to go directly from wherever your source
is to the real applications, and you're more deliberate
about it, again. But the lake is really there
for the data scientists to wander around in their lab coats with their magnifying
glasses like Sherlock Holmes, to mix metaphors desperately badly, and try and see what might be
interesting to piece together. And the usefulness is having a well-known place or places where it's all together. And of course, the idea that we're trying
to move more towards here at ThoughtWorks now is the data mesh, where we try and say, rather than think of one
big corporate data lake, instead think about these separate areas following the lines of business, where there's a bit more
understanding about where they are and the people who dive into these lakes have a bit more knowledge
about that particular area that they're looking at. Which is actually also something that came from the guy who originally coined the term data lake, 'cause he talks about how
he saw the idea evolving to this water garden of different areas. But the key difference
being this unstructured, accidental data, speculative data, as opposed to the
deliberate data that you get when you actually want to be
running things efficiently. - Yeah, well, as long
as we're on the topic, it would probably be
remiss of us not to mention the societal impacts of that, of collecting data speculatively and hoovering up all the data you can. I know it's something
we've talked about a lot, and our German colleague,
Erik Doernenburg, of course says there's a German word for the practise of being
selective about the data. Do you want to try it? - Datensfauzenkeit . - That's very good, very good. - Yeah, I mean, we've practised with Erik. We're pretty confident we've got it right. Yeah, we've put it on the radar, to some controversy, but you've put a German
word on the radar! I said we use English words all the time. We put in an occasional foreign word and it kind of freaks everybody out. But it seemed that the
translations weren't really ideal, and so we thought, let's
go with the original word. And it basically says don't collect data unless you know what
you're going to do with it. Which is very much the opposite
of this hoovering up notion. - Yeah, and that's fortunately
been legislated into law in some places, having to defend your use of the data and the need for the data. And hopefully that's
becoming more commonplace, I think, because we've seen the risks of having that accidental
data exposed through breaches, and it's going to happen. We know that it's going to happen. And so the less exposed you are to that, probably the safer you can be. So there's definitely a balance there between having the data that's useful and holding data that's more than you actually need, which presents a big security risk for your organisation. - Well, I think it's also a
matter of ownership, as well. I mean, the idea is that we need to think of it as us owning that data to some degree. People need to ask our permission
if it's going to be used, and that permission has
to be more than line 176 in a 10,000-word EULA. I think it does have to move
towards that kind of thinking. But in order to do that, we have to get, I think, a lot more thrust towards
notions of data ownership. - Yeah, and consent. I think consent is an area that we're going to be hearing a lot more about in Australia, at least, because of open banking and open data in general, and the ability of people to actually revoke their consent. So it's fine to give your consent to the use of your data at some time, but I don't know that many organisations have built in the mechanisms necessary to be able to remove that data once the consent is revoked. So I think there's going to be a lot of work for us as software engineers in trying to implement that at some point. - Yeah, tracking the provenance of data and being able to deal
with it, and think about what it means to remove it, right? 'Cause, I mean, particularly if you're doing
things like event sourcing, where the whole notion is you keep changes and you don't destroy
things because, as we know, destroying things leads
to its own difficulties. What does it mean to get rid of something? I mean, are you going to go
back to all those backup discs you've got lying around offline somewhere and try trawling through
them for the data? There are definitely, I think, interesting questions about how to deal with that provenance in this kind of situation.
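A toy sketch of the event-sourcing point Martin raises, with invented names and not drawn from any real event store: state is rebuilt by replaying an append-only log, so "removing" personal data can't be a simple row delete. One common approach, sketched here, is to redact the payloads while keeping the fact that the events happened.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Event:
    customer_id: int
    kind: str                 # e.g. "EmailChanged"
    payload: Optional[str]    # None once redacted

# Append-only log: state is derived by replaying events, never by updating rows.
log = [
    Event(1, "EmailChanged", "ada@example.com"),
    Event(1, "EmailChanged", "ada@newjob.example.com"),
]

def current_email(events, customer_id):
    email = None
    for e in events:
        if e.customer_id == customer_id and e.kind == "EmailChanged":
            email = e.payload
    return email

# "Removing" the data can't be a simple delete without rewriting history;
# one option is to redact the payloads and keep the fact that an event happened.
for e in log:
    if e.customer_id == 1:
        e.payload = None

print(current_email(log, 1))  # None: the history remains, the personal data is gone
```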
- Yeah. I should mention, please do send in your questions, and I will try to keep an eye on the Slack channel for them, but Nigel, feel free to break in and introduce questions at any point if you'd like to. Another organisational issue that we've seen is this idea that, as machine learning becomes more and more common and organisations start hiring data scientists, I've noticed that they're
creating machine learning groups, they're creating data science groups, and they sort of work in a vacuum, and they're given problems that
people think might be suited to that sort of a solution, but they aren't really
getting mainstream exposure to customer problems. And this is kind of a pattern
that we've seen before, this idea that a new technology comes along, so we create a group that is responsible for that new technology. Meanwhile, we keep cranking out software that's pretty much ignorant of it. Are there any lessons from the past that we can learn? How should we be treating data scientists and machine learning? - Yeah, well, I mean, you put your finger right on it, right? I mean, we've got into so much difficulty by taking some specialised skill and locking people in a cupboard and treating them like mushrooms. And the way we've dealt with this across the whole history of, well, certainly ThoughtWorks, and certainly many of our colleagues even beyond that, is to break these barriers down and get people working closely with each other. I've been trying to work with one of our leading data
science people who's very focused on trying to get more data science people and programmers working together so that they build well-structured software, because a lot of data scientists don't really know how to structure software, and as a result they can benefit from some of the skills that come from the software world: this structuring and modularity and ways of composing, knowing when to shift from the executable workbooks, the Jupyters and markdowns of this world, into something that can
actually be maintained in the long term. And then, of course, the ever-present barrier
between software people and business people. I was very struck by a quote, which I ought to have had
available right at my fingertips, but I don't. It was from Nate Silver, the guy who does the election
forecasting in the US. He does a very, very good job at election forecasting, in my view. And he commented that to be a good data scientist, about 44% of the skill required was a feel for data and what it looked like, and 44% was knowledge of the domain you're looking at, and the remaining 12% or whatever was fancy data science skills and knowing which specific technique to use and all that kind of thing. And that struck me very
sharply because it echoes my own, slightly outsider, view. I mean, yeah, you've got to have a feel for how to look at data and a nose for these things, but what's also really crucially important is that domain knowledge, and it takes a lot to learn about the domain and get a feel for what makes sense, what kinds of problems are important in a domain. And typically the best way to do that, unless you've got a rare animal that is both, is you have to get them to collaborate, which we humans are fairly good at doing as long as you don't put up structural and organisational barriers against them. - Yeah, I entered the field of IT kind of from doing a
lot of research in areas that were much more statistical and modelling and pattern recognition. And one of the things I noticed
when I entered the IT world was this kind of
astounding lack of literacy in probability and statistics, and the flaws in people's intuitive thinking about data. And I wonder, is that just something data
scientists need to understand? Is this something maybe
all software engineers need to get better at? - Well, I think it's more than
just all software engineers. It's all people. I mean, I think society
in general suffers greatly from people not understanding probability and probabilistic outcomes. It's actually one of the reasons I really enjoy listening to Nate Silver's work on 538 and his election modelling: what they're trying to do is explain probabilistic outcomes to an audience that doesn't really get probabilities. And they're constantly having to battle with how to convey this, both in talking about it and in the visualisations
they do on their website, and we have that same struggle
whenever we talk about this kind of stuff, because so many people
don't have that background. I wonder whether I benefited greatly because I'm so interested
in tabletop gaming and was as a kid. So if you're going to do these
big hex encounter of war games that I played when I was 13 or 14, you get used to the idea that everything that goes on is probabilistic, and while you can't know for
certain what's going to happen, what you've got to do
is know probabilities and try and maximise the
probabilities in your favour. And I suspect that helped me a great deal. I've heard some people say the same thing about playing poker, because in order to do well at poker, you've got to have an
understanding of the probabilities. And I think that's true generally. I think it is also true for programmers, as well. And one of my challenges really is thinking about how to make that kind of thing work: to what extent do more software developers need to know about this kind of stuff, how to pass that on, and also how to then influence the people around them, in particular the consumers
of the information, because they also need to get it. I mean, it's all very well saying we're going to run the businesses in a much more data-driven
or data-informed way, but so many people just don't understand how to look at data. I mean, I see senior business meetings where they claim they're making decisions based on the data, and they take two populations and compare them just by looking at the averages, with no idea what the actual data distribution is in the two populations, so whether it makes any sense at all to compare averages. You understand that, but so many people don't. - Yeah, yeah. I think it's society as a whole. We saw that at the US election four years ago, with the misunderstanding of what probabilities mean. - Yeah.
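As a minimal illustration of that point about averages, with invented numbers: two populations can share the same mean while having completely different distributions, so a comparison of averages alone can hide exactly what matters.

```python
import statistics

# Invented numbers: two populations with the same average.
steady = [50, 51, 49, 50, 52, 48, 50, 50, 51, 49]
volatile = [5, 95, 10, 90, 50, 0, 100, 45, 55, 50]

print(statistics.mean(steady), statistics.mean(volatile))    # both 50
print(statistics.stdev(steady), statistics.stdev(volatile))  # ~1.2 vs ~37
```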
- I want to shift the topic a little bit to how we work, the software engineering practises that we use when we're working with data. We're incorporating machine learning models, learned models, even just linear regression, into our software systems a lot more these days. And there are some interesting characteristics that are quite different, I think, from the way that we traditionally think about, say, requirements for software, which we assume to be deterministic. I mean, we are actually building a lot of systems that are much more non-deterministic than they used to be. We used to consider non-determinism a bug, a race condition or something like that. Now, with the models that we're building, we can't always predict precisely the answer they're going to come up with. How do we deal with this? Are there ways that we can still ensure the quality and correctness of our software, even when the specific answers are probabilistic?
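One way teams handle this in practice, sketched here with an invented stand-in model rather than anything discussed in the talk, is to stop asserting exact outputs and instead assert aggregate, statistical properties over a fixed, seeded evaluation set, with tolerances.

```python
import random

random.seed(7)

def model_predict(x):
    """Stand-in for a learned, non-deterministic model: right ~90% of the time."""
    truth = x > 0.5
    return truth if random.random() < 0.9 else not truth

# Fixed, seeded evaluation set so the test itself is repeatable.
cases = [(i / 100, i / 100 > 0.5) for i in range(100)]

correct = sum(model_predict(x) == expected for x, expected in cases)
accuracy = correct / len(cases)

# Assert an aggregate property with a tolerance, not an exact answer per input.
assert accuracy >= 0.8, f"model accuracy regressed: {accuracy:.2f}"
print(f"accuracy {accuracy:.2f} within the accepted range")
```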
- I mean, it's not an area I've spent a lot of time really looking at, but it is an area that interests me, in the sense that I kind of feel we have to move much more into a world where we're debugging the software. I mean, if you've got a
machine learning model, you've almost got to
take an assumption of, this has got bugs. My job is to try and figure
out what those bugs are, particularly the ones that
are going to bite us hard, and try and find them before
they actually do surface. And we see cases where machine learning models throw up correlations that are really bad, like sending insurance quotes and finding, oh, the people who are mostly getting the expensive insurance quotes happen to be people of colour. It's not a good thing to bring those kinds of biases in, but they're coming in from other places. We know these kinds of biases are very deeply ingrained in our systems, so it's natural they're going to show up in machine learning. So there's an important role for software development, whichever part of it you want to focus on, that says, yeah, we've got to try and debug these things and think about trying to surface the problems. And then once we've surfaced them, we can think about what we might be able to do to mitigate them. - Yeah, I think there's a lot more we have to do in testing the limits of our models. And bias is one of those things, obviously, that we need to start testing for. There are a few tools out there. I know LIME is one that allows you to look at different facets of the data and understand what happens if I use just this one subset of data to train with, or what happens if I remove a particular subset of data: does that change the outputs?
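In the spirit of the subset checks Scott describes, though not using the LIME library itself, a toy ablation sketch with invented data: fit a simple model with and without one subgroup and compare the fitted coefficients. A large shift is exactly the kind of dependency you want surfaced before it ships.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: feature x, a group flag, and an outcome y.
n = 200
x = rng.normal(size=n)
group = rng.integers(0, 2, size=n)          # 0 or 1, e.g. two customer segments
y = 2.0 * x + 1.5 * group + rng.normal(scale=0.5, size=n)

def fit(X, y):
    """Ordinary least squares: returns coefficients for the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

X_full = np.column_stack([np.ones(n), x])

# Fit on everyone, then refit with one subgroup removed.
coef_all = fit(X_full, y)
mask = group == 0
coef_without_group1 = fit(X_full[mask], y[mask])

print("coefficients, all data:       ", coef_all)
print("coefficients, group 1 removed:", coef_without_group1)
# A big shift means the model's behaviour depends heavily on that subgroup.
```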
One of the problems, I think, with these learned models is that they're opaque. It's really hard to understand how a decision was made, and yet life-changing decisions are sometimes made by these models. Do you think we're getting to the point where we can explain models at all? Can we explain why these decisions are made? - I don't know, but I
agree with you completely that explainability is, I think, one of the greatest challenges and something that we have to demand as citizens. I mean, it's one thing to say, oh, my algorithm, I don't know why it is that as soon as I buy a kettle on Amazon, I get bombarded with ads for more kettles wherever I go on the web. I mean, you know, it's irritating, and it kind of makes you very cynical about AI, but it doesn't actually do any harm. But for many people, real decisions are being made about them this way. I mean, there was the situation in the UK recently where there were adjustments made to people's university placement exams. I mean, that's a huge
life-changing decision. Whether you go to this university or that university can
make a big difference. And the reaction was,
well, the algorithm did it, and that's not going to
be an acceptable answer, and it shouldn't be an acceptable answer for many of the things where it currently is treated as an acceptable one. And I think that's, underlyingly, one of the big problems
around using machine learning. People are really kind of jumping on top of the machine learning, and yet really, is it better to use those
than some of the more formal, not formal, but more analytical
statistical techniques? Is machine learning really
going to do better for you than a simple piece of linear regression, which not only gives you some answers that are useful but also, in coming up with the model, gives you insight as to what's going on? And that insight is probably just as useful as the answers that you're getting. With machine learning, you're just kind of
throwing it in a black box and getting an answer
out, which is not bad, but for many things we really, I think, need to do more investigation. And that argues, I think, for much more deliberate modelling. I'm using that deliberate term again; it's obviously my term of the day. - Yeah, I mean, I think it all comes down to the objective function that you're using to develop the model with, right? If it's simply maximising revenue, then there are probably all kinds of dark patterns you could encode in your model that are going to allow you to do that. But we can take other things into account. Spoiler alert: I'm probably going to put this on the radar. We've finally got a library that implements differential privacy, so that you can actually use those privacy-preserving qualities in the objective function when training your model. So I think that's something that I hope we see a lot more of.
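The library Scott has in mind isn't named here, so as a rough illustration of the flavour of differential privacy only, here is a minimal Laplace-mechanism sketch over an aggregate query, with hypothetical records: clip each value, work out how much one record could move the answer, and add noise scaled to that.

```python
import numpy as np

rng = np.random.default_rng(42)

ages = np.array([34, 45, 23, 52, 41, 38, 29, 60])  # hypothetical records

def dp_mean(values, lower, upper, epsilon):
    """Differentially private mean via the Laplace mechanism.

    Values are clipped to [lower, upper] so one individual's record can move
    the mean by at most (upper - lower) / n; that bound scales the noise.
    """
    clipped = np.clip(values, lower, upper)
    sensitivity = (upper - lower) / len(clipped)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return clipped.mean() + noise

print(dp_mean(ages, lower=18, upper=90, epsilon=1.0))
```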
And it kind of brings up that testing in the world of machine learning is quite different, and we probably need some more tools around that. I know I've read a paper
in IEEE Software recently that talked about requirements: they did a study of machine learning-based software and found that requirements come in a much different form. We're used to getting
requirements for software in the form of examples or business rules, repeatable business rules. Now people are getting requirements in the form of statistical qualities or general ranges of values. They're more quantitative requirements. And so I think that... Do you think that changes
how we're testing, and do we need a different
way of doing that? - Yeah, well it should do, right? I mean, I think it's an interesting thing when we start talking about
outcomes in terms of, well, I'm looking for an outcome
that sort of improves this probability distribution compared to what it was previously. A lot fuzzier things. I think it also goes back to the debugging side that I talked about earlier on. I actually think one of
the most useful things for many of these machine learning things will be using it as a tool to try and understand what's going on. So when you get a machine
learning algorithm that does very well at something, then saying, well, how is it doing that? What is it looking at? Can we kind of dig in to try
and get the understanding of what's happening inside
this black box in there so that we can take it
out of the black box and move it into the analytical world and perhaps improve its
performance even more because we've now got the analytics. So I suspect there could
well be some interesting back and forth between the
machine learning world and the more analytic world where you'll be constantly
bouncing back and forth to try and better
understand what's going on and then feed that through
into a more interesting model. But I'm saying this,
again, as an outsider. I don't do this stuff. But I do talk to those
of my colleagues that do, and they haven't told me
I'm an idiot yet, I guess. - Well, one thing I know
you do do is refactor, and you recently updated your
classic book "Refactoring" to work with JavaScript,
amongst other things. And I wonder how does refactoring
work in the world of data? I know those data specialists
we talked about at the start don't like to change their
data models very much. They don't really like
to change the schemas. Is there a place for
refactoring in the data world? - Well, as we know,
there most certainly is, and that was particularly the work of another colleague of ours, Pramod, who together with Scott
Ambler wrote the book on database refactoring a long time ago. But sadly, one of my greatest
disappointments is that I created my signature
series with Addison-Wesley hoping that it would sort of
raise the visibility of books published in the series and make the ideas more common, which it might've done with
things like continuous delivery, which was in the series. But sadly, it didn't with database refactoring. It didn't get as much
traction as it should've. I mean, actually refactoring databases, it's harder than refactoring code because you have to migrate data as well as refactor the schemas, but it's certainly doable and we've been doing
it for decades, right? Here at ThoughtWorks, that's
the way we deal with databases, and has been for a long time, but it's still, from what I gather, not well known enough across
the industry how you do this, which is a really great shame, because the database is such a central thing that if you can't evolve it, you can't have an evolvable system. And so you have to figure it out; it's not hard to do, but you have to make the effort to learn how to do the
database refactoring and work with that. Now, when you're talking
about huge amounts of data that you have in speculative
data acquisition, then that's a whole different ball game. But again, of course, the whole thing about
speculative data acquisition is you're just grabbing it
in whatever form it is. It's different when you're making this more deliberate structure to work with. But then you have to remember that that deliberate structure has to be in an evolvable form, not set in concrete.
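As a small sketch of what Martin means by migrating data as well as refactoring the schema, here is an expand-contract style column rename using an in-memory SQLite database (an invented example, not from the book): add the new column, copy the data across, let the application write to both for a while, and only then drop the old one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, fullname TEXT)")
conn.execute("INSERT INTO customer (id, fullname) VALUES (1, 'Ada Lovelace')")

# Expand: add the new column alongside the old one.
conn.execute("ALTER TABLE customer ADD COLUMN display_name TEXT")

# Migrate: copy the existing data across.
conn.execute("UPDATE customer SET display_name = fullname")

# Transition: the application writes to both columns and reads the new one.

# Contract: once nothing reads the old column any more, drop it.
# (SQLite supports DROP COLUMN from 3.35; on older engines you rebuild the table.)
conn.execute("ALTER TABLE customer DROP COLUMN fullname")

print(conn.execute("SELECT id, display_name FROM customer").fetchall())
```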
- Yeah, one thing that I frequently get asked about is transitioning to microservices architectures and how we decompose monoliths. This is a problem that we've encountered over and over again. And I always tell people,
you know about this book? It's called "Refactoring
Databases" by Pramod Sadalage. And I'm surprised how many, that it's not that well known. But I tell people that's where you start. This is where the problem
exists with models is the heavy dependence
on stored procedures in the database and the necessity of being
able to split that database up across multiple services
is probably a big one. I think it's time we
answer a couple questions from the audience. - Yeah. - Nigel just gave me one from Julia Neil. Hi, Jules. "People are probably bad at
understanding probability. I think that as systems
engineers and data scientists, we should make software that
can shoulder this burden, not necessarily expect people
to understand probability. Do you think we need to do a better job of writing software that
helps people with that?" - That will certainly help. And that's one of the
things that interests me with things like 538's work, 'cause they're trying to make probability more of a visual thing. I mean, you also see it in some
of the stuff around COVID-19 where people are similarly
trying to explain forecasts and the like in a more visual manner. It's not just the visualisation; it's the explainability of the probabilities. But having said that, I also do think that people need to learn more about probability, and I don't think we should treat it as something that people can't learn. I mean, let's not forget, it wasn't that many hundred years ago that long division was considered to be an extremely advanced concept that you would only learn about in university, and now we take long division for granted. So if we could learn how to do long division more widely across the population, we can learn to understand probability more widely. Maybe we get more people to play tabletop games. Maybe that's the answer.
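One concrete way software can shoulder that burden, sketched here with an invented 72% forecast, is to present a probability as simulated frequencies people can count rather than as a single number.

```python
import random

random.seed(1)

p_win = 0.72          # an invented forecast probability
runs = 100

outcomes = ["W" if random.random() < p_win else "L" for _ in range(runs)]

# Show the forecast as 100 concrete simulated outcomes rather than one number.
for i in range(0, runs, 20):
    print(" ".join(outcomes[i:i + 20]))
print(f"{outcomes.count('W')} of {runs} simulated runs come out as a win")
```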
- Yeah. I'm just looking at the... - Yeah? - Here's a good question. Andre says, "I feel like the word refactoring itself is hard for common people to understand. Is it rebuilding? Is it retrofitting? Would you like to comment on the interpretation of the word refactoring and its usage?" - Yeah, well, refactoring is a fairly precise technical term. I wouldn't expect someone
outside the software world to understand what it means, just as I don't understand what four-by-12s are, which Cindy and her friends in the building trade talk about all the time. I mean, it's a technical term,
but I can still explain it. And when I'm trying to explain it to somebody outside of software, the way I look at this is to say, well, if you've got an existing software system and you need to make a change, one way of making that change
is you kind of slap something on top of the existing software that kind of makes it
twist the way it goes. Kind of like sticking a patch
on something that's broken. And the problem then is that as you make more and more changes, you get patch on patch on
patch on patch on patch, and the whole thing becomes
this very complicated and very precarious structure. With refactoring, what
you're doing instead is you go inside the existing software and reshape it so that the new thing will just slot in exactly, without being kind of patched on top. Now, doing that change on the existing structure of the software, obviously you're doing some kind of fairly invasive surgery, so you need to do it in a very disciplined manner. And that's what refactoring is: a disciplined way to change the shape of the software so that the new capability can just slot in. So you change it to make it look as if it had originally been designed with that new feature in mind. That's how Ward Cunningham puts it: you try to give that impression. And if you do that, then the software
doesn't end up then being this patch on patch massive complexity. It can evolve steadily to grow. But you do have to
understand that discipline of how to do it. And I don't think refactoring
is necessarily very hard to learn how to do, but you need to learn how to do it and be able to be
disciplined to do it well. And that's a tricky thing to do if you've not got somebody
who's able to teach you. You've got to have that
determination to do it. And there are a number of other skills in the software world that are like this. Working with test-driven development is a similar kind of thing. You have to kind of learn the skill, and then once you've learned it, it's actually fairly straightforward, and you learn when to use it and when not to use it. Refactoring is the same thing. The great thing about refactoring is that when you're able to refactor effectively, you can make changes to software without getting yourself into a panic and without causing a lot of problems. And that can be worth it just on its own, because it's such a calmer process when you're able to do that. - Yeah, I find a lot of people
use the word refactoring when they're really rebuilding. They just don't want to
tell their boss they're- - Right. Yeah. I mean, I use the general term restructuring for just any form of it, though refactoring is very precise: you do very small steps, none of which changes the observable behaviour of the software. That's the discipline that allows you to do that. Each step is too tiny to be worth doing on its own, but, like composing music, you string them together, and you never break the software in the process.
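A toy sketch of those small, behaviour-preserving steps, with an invented example: each step is checked against the observable behaviour before the next one is taken.

```python
def invoice_total(lines):
    # Before: pricing rule tangled into the loop.
    total = 0.0
    for qty, unit_price in lines:
        total += qty * unit_price * 1.1  # 10% tax baked in
    return total

# Step 1 (Extract Function): pull the pricing rule out, behaviour unchanged.
def line_total(qty, unit_price, tax_rate=0.1):
    return qty * unit_price * (1 + tax_rate)

def invoice_total_refactored(lines):
    return sum(line_total(qty, unit_price) for qty, unit_price in lines)

# Each tiny step is verified against the observable behaviour before the next.
lines = [(2, 10.0), (1, 5.0)]
assert abs(invoice_total(lines) - invoice_total_refactored(lines)) < 1e-9
```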
- Yeah, I kind of like Kent Beck's term for it; he calls it tidying. I think we probably need to start wrapping up, but along these lines, I want to make sure I ask: do you think refactoring is something we have a choice in? Is it something we do because we're told to, or something we're expected to do and have a responsibility to do as software engineers? - I've always argued that
we have a responsibility as professionals to do this. I mean, what we're being paid for is to build software that
can produce a stream of new features as cheaply and as quickly as possible. I'm assuming that the software is not dead and about to be shuffled aside, which is relatively rarely the case. We're being paid to work on this software so we can continue doing things to it. The cheapest way to keep a piece of software evolvable is to keep it in a good condition. Keep the cruft out. And a technique that's really vital in keeping cruft out of software is refactoring. I mean, there are other
techniques that play a role in this: continuous integration, having self-testing code. These are important things, as well. But refactoring is a key technique that's part of that. It's our professional responsibility. Just as doctors wash their hands before they start cutting into you, we have to do that. And it's not a question of whether we're told to do it, because people outside of our profession can't understand the value of it. That's why they pay us. They pay us to understand
what good practise is, and we should then do that good practise. And I think refactoring is
one of those good practises that we should do. - Thank you. I think we need to wrap it up now. We could probably go on like this all day. I appreciate people listening in on our conversation, and I hope you got some nuggets out of that. So thanks, everyone, and I will hand back to Nigel now. We'll try and answer questions on Slack, by the way, the ones we didn't get to. - Yeah, that's what I want to encourage. There's a Slack channel; it's an active conversation, and there's a record there for everyone joining at all sorts of
times all over the world. I just want to thank Martin. Amazing. It's so good to have the
original gangster of refactoring, and of so many other things, in the room, and Scott, a masterly contribution to the conversation today. I wish Martin had had the chance to ask you some curly questions, as well, but maybe that's for the next one. So thank you so much.