A few years ago, I got a message from an ex. And it said,
“Roses are red, violets are blue, I've got chlamydia, and you might have it too.” Which, first of all, is genius, but it meant I was taking a trip down
to the clinic the next morning to get tested. I have a point to this story, and I swear
it's about computers. If you ever want to see a complete cross-section
of London life, do join the queue for the STI testing clinic
first thing in the morning. Actually, get yourself tested anyway, it's
the right thing to do. Anyway, I go to the clinic, I pee
in a sample tube, seal it up, hand it over, and they give me a little business card with
a phone number and a passcode on it. And I can call that number the next day, and if my tests are all clear,
then the system will tell me. And if they’re not, it'll pass me over
to someone at the clinic, because you do not deliver bad news
with an automated system. So the next day, I call the number,
type in the code, and the system says: "Thanks. Here are your test results." And then there is a pause.
A proper, ten-second-long pause. And as I stood there on my mobile
in the middle of the street, it felt a bit like that artificial pause that
they add in reality shows before announcing the winner. It felt like there was a spotlight on me, all dramatic tension music playing in the
background. And then it says "We're putting you through to
one of our team. Please stand by." That's bad news, right? Bad news comes from a human.
Why are they passing me through? Someone picks up,
I give my passcode again and they say, no, it's all clear, you don’t need to worry,
everything’s negative, you're fine. And I ask why the system put me through,
and they say, "oh, no idea, it just does that sometimes". It just does that sometimes. Did the database lookup fail in the background, and it dealt with the timeout by
just passing me to a human? Was there a small note on my file somewhere
that confused it? I've no idea, they had no idea, because whoever wrote that code,
whoever designed that system, the moment there was any error, they just told it to go to
"we're putting you through" without explaining why and without thinking
what that might cause.
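To make that concrete, here's a rough sketch, in Python, of how that kind of catch-all fallback tends to look. Every name in it is made up for illustration; none of this is the clinic's actual code.

```python
# Hypothetical sketch of a results line with a catch-all fallback.
# None of these names come from the real clinic system.
from dataclasses import dataclass


@dataclass
class TestResult:
    all_clear: bool


def lookup_result(passcode: str) -> TestResult:
    """Stand-in for a backend lookup that can time out or fail."""
    raise TimeoutError("backend did not respond")


def transfer_to_staff() -> str:
    return "We're putting you through to one of our team. Please stand by."


def handle_results_call(passcode: str) -> str:
    try:
        result = lookup_result(passcode)
    except Exception:
        # Any failure at all -- a timeout, a missing record, an odd note on
        # the file -- lands in the same branch as a positive result, so the
        # caller can't tell a technical hiccup from bad news.
        return transfer_to_staff()
    if result.all_clear:
        return "Thanks. Here are your test results. All of your tests were negative."
    return transfer_to_staff()  # bad news is delivered by a human, by design


print(handle_results_call("1234"))  # lookup failed, so: "We're putting you through..."
```

From the caller's end, a failed lookup and a positive result come out of that exactly the same.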
That phone system is used by people asking about things way more serious than chlamydia, the sort of things you can't cure
with a course of antibiotics. And sure, on the scale of complaints
it is minor, but mistakes like that are a symptom of something
much bigger. I've talked about bodging things in the past. I encourage it for hobby projects. Just slapping something together as a proof of
concept, just so it works for you is a great principle
when you're making a thing for yourself. But if you're making something for the public,
for mass consumption, particularly something that's going to be
used by people in very vulnerable moments, then you've got to take a lot more care. Every time we build something for the public,
we have to start making the trade-off: how much time and money is it worth to deal
with every edge case? Maybe the designer thought that lookups would
only fail one in a million times, and if that's the case, then yeah, it's probably not reasonable to bother recording a whole extra message and programming in a whole extra edge case. But if the lookup fails often enough that
the clinic receptionist just dismisses it as normal... well, by that point it's too late to make a change,
isn't it? The system's in place, it'd be far too expensive to fix it now.
It'll do. The trouble is that we're often dealing with
unknown harms and unintended consequences. Far too often a bodged-together system that
was just meant to be a test gets rolled out into production, and
everyone just has to deal with the bugs because that's all anyone can
afford to do. I will always bet on incompetence
rather than malice, I will always bet on someone not thinking
about consequences rather than thinking of them
and going “who cares”. We see this with big tech companies. Facebook allows the world to communicate in
unprecedented ways, it does huge amounts of
emotional labour for us, it allows people to keep in touch with
old friends that they'd otherwise just drift away from. But it's also enabled stalkers and abusers
to reach people they shouldn't, it’s allowed a huge amount of private data
to be misused without real consent, and it’s arguably helped cause terrifying
election meddling. Now, I don't believe anyone in Facebook's
management was rubbing their hands in delight
at the chaos it was causing: it was just a series of seemingly-reasonable
decisions that added up to huge consequences. And then there's YouTube: it allows anyone
to publish their experiences to the world, it provides income for creative people that
bypasses traditional media, and it’s helped people connect
with other people's lives. It's also helped to radicalise folks, to promote conspiracy theories,
and to traumatise children. Are those tradeoffs worth it? It depends on your moral framework,
and it depends -- let's be honest -- on how it's affected you personally. It's not like there's a crystal ball
that'll tell you: yes, this brand-new dating app you’re developing will cause 1,000 couples
to marry and live happily ever after, but it will also get three people murdered. The real world is not a trolley problem. The STI result system that I called
presumably reduced the workload on staff, and it allowed people to check their results
out-of-hours when it was convenient and discreet
for them. You'd hope something like that wouldn't have
a downside, but then the designers screwed it up because
they thought it was good enough, and it wasn't. One extra check,
one extra voice message that said "Sorry, I can't find your result,
just a moment" would have solved that. Every time we design a system, we have to
Every time we design a system, we have to minimise the potential harm. Look at the code that you write,
look at the systems you design, and think: How could this be abused?
How could this fall apart? What are the failure states here? If you were a bad actor,
if you wanted to use this maliciously, how would you go about it? Think about how you’d attack your own systems,
explore those failure states, deliberately screw things up
and see how your code copes. Because if you don't, someone else will.
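One way to do that in practice, again using the hypothetical sketch from earlier (and assuming the test sits in the same file as it), is to inject the failure on purpose and check that the caller still hears something sensible:

```python
# Hypothetical failure-injection test for the fixed handler above: force the
# lookup to fail and check the caller gets an explanation, not a silent transfer.
import unittest
from unittest import mock


class ResultsLineFailureTests(unittest.TestCase):
    def test_lookup_timeout_gets_an_explanation(self):
        # Patch the module-level lookup so it always times out.
        with mock.patch(__name__ + ".lookup_result", side_effect=TimeoutError):
            response = handle_results_call("1234")
        self.assertIn("Sorry, I can't find your result", response)


if __name__ == "__main__":
    unittest.main()
```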
Thank you very much to the Centre for Computing History at Cambridge, who've let me film with all their old equipment, and to all my proofreading team
who checked through my script.
Too bad that marketing doesn't like error messages, so usually you're not allowed to put in helpful error messages even if you'd like to.
Assuming that the occasional "human person for all-clear" case is a fallback for when the automated system throws an exception...
...That fits quite well with the video from Traveller's Tales' Jon Burton a while back, explaining how encountering an exception in Sonic 3D would send the player to its Level Select screen (or in the case of Mickey Mania, send them a level forward/back):
https://www.youtube.com/watch?v=i9bkKw32dGw
But is it really part of the design that when the machine answers, you're clear, but when it transfers you to a human, you're pozzed? It seems to me that that's the sort of heuristic that gets discovered and applied after the fact, like gauging acceptance to a university by the thickness of the reply envelope (thick = accept, thin = reject). How obligated are we to design systems that comport with our users' guesses about how they function?
This guy is a genius.
If the system always redirected only "bad news" responses to a human, that would actually mean delivering the bad news automatically, because the fact of the redirect alone would tell you something is wrong. So I assume the developers of this system understood the situation better than Tom and decided to redirect some "good news" answers to a human as well. That way, people know that being redirected to a human is not in itself bad news, so they don't in fact receive bad news from a machine; they receive both good and bad news from a human.
Not our fault; it's because the requirements aren't clear enough.
This is relevant only if you factor bad design decisions out of the equation. Too many times I've had to ship features without a visible and meaningful fail state because the designer and/or product owner didn't think the system could enter that fail state, and the release date was too close to go back and design/decide what the user should do. And I'm not talking about unhandled exceptions in code, but about "this user's permissions plus some specific setting may result in privilege escalation" types of problems. I know it's the developer's job to handle errors, but, in the case told in the video, what should the developer do? Make decisions about the system's design and workflow? That's plausible in a small business, but almost unthinkable in big corporations, where small decisions can cost literally millions of dollars.
These things are not really encouraged. If we want it to be like that, then I think we need to get away from shipping fast and letting PMs control what programmers' tasks are.
From my own experience, it also seems that we put barriers on ourselves. Sometimes I will find misspelled user messages in the codebase, but I won't fix them because I know I have to go through these steps:
Which can feel like a heavy burden compared to just committing to master and leaving it like that.
How does he know this was an error? There was no error message. Maybe the system was changed to do this behaviour for some reason.
I agree with thinking about the consequences of your code. In the medical industry this can be devastating. Think of a machine that analyses blood for viruses with some error in its code, for example, or the much-discussed Therac-25.
More than that, shit code and bad coding practices have a detrimental effect on the people who work with the system as well. That includes the developers, so I think the advice here is sound.
When I was at uni we had a course on Professional Development, which at the time I thought was a load of rubbish, but nowadays I constantly go back to what was taught in that course. I think it was probably one of the most important courses I took.