September 2018. I was tired, I was exhausted, I’d just got back home from something, and I didn't have the energy to cook. So I went to my phone, opened up a certain food-delivery app that’s
popular in the UK, and I ordered pizza. Now, I know: that food-delivery company’s
employment practices are questionable, there are more ethical ways to get dinner
delivered. But I was tired, and I was hungry. As are a lot of their drivers. But that was the Night of the Multiple Orders, when a bug in that app meant that some people
around Britain ended up with identical food orders
being delivered two or three times, and others got nothing at all. And I nearly got caught up in the chaos. To explain what happened, I need to tell you a story about two generals. The Two Generals' Problem is a
classic of computer science, and it goes like this: picture a valley. In the middle of the valley is a
heavily fortified castle. On the edges of the valley are two armies. The generals of these armies know that the
only way they can win a battle and overwhelm the castle is if they both attack
at the same time. A single army isn't going to make it. They need the combined strength from
both sides of the valley to win. And the only way they can communicate is by sending messengers on a
risky path through the valley. And General A won’t know what the right
time is until everyone’s already in position. How can those two generals coordinate to make sure they
attack at the same time? This is a magical computer-science-land problem,
by the way, so reasonable suggestions like “semaphore”
or “telescopes” don’t apply. On the surface the problem seems trivial. General A could just send a message to
General B with a proposed time. Say, 8 o'clock. But the messenger has to pass
through the valley, and if they’re spotted, they’re, um, not going to make it to the other side to
deliver the message. So how does General A know that General B
received the message? The messenger might not have made it. And if that happens, A will attack, B won’t, and they’ll lose. So maybe they arrange it so General B has
to send an acknowledgment back, and General A will only attack
if that acknowledgement arrives. But that now runs into the same problem: how does B know that A has received
the acknowledgement? If it doesn’t get through, A won’t attack, B will, and they’ll lose. So, General A could send another acknowledgement
for the acknowledgement. But how do they know that message
has gotten through? Well, General B could send an acknowledgment for
the acknowledgement for the acknowledgement and so on, and so on, and so on. This problem is unsolvable. I know, it feels like there should be some
hacky workaround like sending 200 messengers,
and sure, that would probably work
in the real world. But this is magical information-theory
computer-science land. Under these strict rules, there is never a guarantee,
there is no certainty, there is no arrangement that can be made,
there is no way, that the two generals,
the two computers sending data, can agree that the message has definitely
been received and acknowledged. Now, with computers you’re not usually dealing
with such high stakes. If you are in computer science and working
on a problem that involves potential loss of life, I really hope you aren't watching a series
called "The Basics". Anyway. I was ordering food. And I put my order together, I tapped ‘pay’, I put my fingerprint on my phone’s reader. I got the little Apple Pay progress bar,
and the little tick. And then I got a message from the app
saying that there had been a problem, and my order had failed to go through.
Would I like to try again? And I was about to. I was about to hit ‘pay’ again. And then something, just in the back
of my head, said: hang on. There was that little tick saying
payment had worked. And I’m enough of a computer nerd to go
"I’m not sure I believe that failed". So I checked the ‘order history’ page. It took a few tries to load,
but when it finally did, there was my order. Processing. It had gone through, but
the acknowledgement hadn’t come back. Or, rather, something had gone wrong
on the app’s servers, and the logic they’d written
thought it had failed when it hadn’t. So I sat tight,
I hoped that my food would arrive, and I figured that the engineers were probably
having a very bad day. They really were. Because I wasn't the only one. People all over the UK ordering via the app
were going to the payment screen, hitting the button and getting
"try again". And a lot of them did.
Again, and again, and again. They were General A,
and the app’s server was General B, and they were part of a real-life,
complicated version of the Two Generals' problem. Imagine all the customers as General A, sending message after message to General B.
B received the messages, dutifully took the money from the credit card
every time -- they attacked -- but something had happened that stopped
the confirmation message getting through. According to the flood of angry reports on
Twitter, sometimes the restaurant would realise the
problem and just send one order. Sometimes the restaurant wouldn’t realise, and three different drivers would arrive
with three identical orders. Sometimes no food would arrive at all. The app’s customer service line was swamped. To be clear: this was not the sort of thing
that is one engineer’s fault. When something goes this drastically wrong, there have been many poor decisions made over
a long period of time. A single human error is never a root cause. So what else could the app team have done? How can you solve the Two Generals' problem
in the real world? Well, first, maybe no-one should have
been able to place two identical orders on the same credit card,
for the same restaurant, within a few minutes of each other. That seems like the sort of thing there should
have been a check for? But the real solution is an
“idempotency token”, or an “idempotency key”. This is a unique value generated on
the app, or on the web site: and it’s a shopping cart ID, basically, and it’s sent along with the order. it's not just for shopping carts, though: the idempotency token could be attached to
instructions to delete the oldest log file, or send a text message, or anything that you only want to happen once. The server stores the idempotency key to keep
track of the request. And if another request arrives with the same
key attached, then the server knows it’s already
dealt with that request. So it doesn’t fulfill it again; instead it
knows that the reply didn’t get through, so it just sends back a copy of that first
acknowledgement again. Now, that still won’t help if none
of the messengers get through, if the connection completely fails, but for real-world problems,
humans will notice that. Idempotence means that you can request the
same thing multiple times and it’ll only ever happen once. That’s the way to fix the
Two Generals' Problem. I was lucky. I placed one order,
I was charged for one order, and one order of food arrived
half an hour later. Next time, I’ll just cook for myself. This series of The Basics is sponsored by
Dashlane, the password manager. I mentioned in the previous sponsored section
that they sync all your passwords and payment details between
all your devices without ever
knowing those passwords. Which sounds a bit like magic. When you sign up to Dashlane, you choose a
Master Password. And incidentally, you can do that by going to
dashlane.com/tomscott for a free 30-day trial. Anyway, when you sign up, you pick a single Master Password, and that is never transmitted over the internet. Not even to Dashlane,
not to their servers, nowhere. If you don’t know that password, all that private data just looks
like random noise. So: when you sign up to a new website and
Dashlane generates a long, complicated password
for you, it is bundled up and encrypted using your
master password, that only you know. That encryption takes just long enough,
a few fractions of a second, that there’s no way to
brute-force it back open. That encrypted bundle gets sent to Dashlane: they just see random noise with a label saying
‘please synchronise this’. So they pass the bundle on to your other devices, and those devices, and only those devices,
can decrypt it because, at some point, you’re going to open up the app and type
in your Master Password. In truth, it’s actually a little
more secure than that, because behind-the-scenes they also generate
a different key for each device you log in to, but that is a whole other level of security that I have actually found it impossible to
explain in a script. But, in the massively unlikely event that someone
did compromise Dashlane’s servers, or bribe some employee, it wouldn’t work. All they could do is watch those packets of
random noise get shuffled around. All your data stays on your own devices. Which means,
if you lose your Master Password, Dashlane can’t help you. But that’s fine, because now you’ve just got a single password
to remember. That is massively more secure than reusing
the same password or variations on a password
everywhere online. Like I said last time:
you should use a password manager. So: dashlane.com/tomscott for a
free 30-day trial of Dashlane Premium, and if you like it you can use the code “tomscott”
for 10% off at purchase.
Repeat packets suddenly make sense
I really hate it when there's no solution to a problem just workarounds and hacks. Irks me to no end. Especially for something that instinctively feels like the solution should be easy.
Tom Scott is awesome
I'm really interested in these sort of pseudo-'real world' problems like the traveling salesman. Does anybody know where I can read about different types of problems?
This isn't the Two Generals problem, though.
I mean, his description of the problem is correct, but it doesn't apply to the food delivery app. The app is having trouble communicating with the server, yes, but the server doesn't need to check whether the app gets its messages. The server isn't saying "well, I'll send a delivery person, but only if I get a response back from the app" ad infinitum. The client-server communication is asymmetric, unlike the Two Generals.
In fact, his solution to the food delivery app problem (an idempotency token) isn't relevant to the Two Generals problem. General A can send an idempotency token to General B, and get back an acknowledgement -- but General B still doesn't know that General A received their acknowledgement, and so on. Sending along a token as part of the protocol doesn't resolve this fundamental issue.
Edit: This was also explained in this thread:
The server is sending the order to the restaurant no matter what. It's a centralized technology.
Very nice explanation of a real life communications problem and a good fix
The two generals problem doesn't really apply here as it is not needed that the two parties take an action at the same time.
The only thing needed is that one party doesn't do an action twice. The other party can check before doing an action again.
Basically, the phone should not start an action until the server tells it its fine to do it.
If the server is not responding correctly, then don't do anything and just go into error state. If the server is getting the same action twice (like paying twice for an order), then it should refuse the second payment. The phone should just check before doing an important action and they should have implemented a transaction way of ordering (like creating an order on a request then paying for it with another).
I feel like this wasn't the best way to introduce the subject. The goal of Introducing idempotency was great. It's the bread and butter of reliable distributed systems, however the 2 generals problem as he stated was already idempotent. A dozen attack at 3pm messages are the same as one. So the concept introduced in the second half doesn't really the back into the first half and the generals don't really tie into the failure other than via the nebulous concept of messages can be lost.
Idempotence "solves" the "exactly-once" semantics problem, but it has nothing to do with consensus between the two generals.