In 1997, the US National Institute of Standards
and Technology announced that they were looking for an encryption algorithm to become the
Advanced Encryption Standard, or AES. Cryptographers from around the world were
invited to submit proposals for the new standard. Fifteen algorithms were submitted for consideration,
and they became the subject of multiple years of debate and research and analysis. Some algorithms were deemed not secure enough,
or not performant enough, to become the AES. At the end of the process, one algorithm — Rijndael
— won the competition and was selected to become the standard. Today, AES is everywhere: it encrypts hard
drives, it secures internet communication, and many processors on computers and phones
and other devices come with built-in instructions for running AES efficiently. But how does AES work? And what makes for a good encryption algorithm? That's what we're going to explore here; but
first, let's start with the basics. When we talk about encryption, what we want
is to take some information, called the plaintext, and use a secret key to convert it into ciphertext. The ciphertext should ideally be encoded in
such a way that someone would need to have that key in order to translate the ciphertext
back into the plaintext. When working with computers, the plaintext,
ciphertext, and key can all be thought of as sequences of bits: all just 0s and 1s. So what could this encryption look like? Let's take some plaintext: some sequence of
bits. And let's create a randomly-generated key:
some secret sequence of bits, that for now we'll assume is the same length as the plaintext. How could we come up with an algorithm that
encrypts this plaintext? Well, we need some way to transform the plaintext
by using the key. Here's a simple algorithm we could try. For each bit of the plaintext, we'll look
at the key bit in the same position, and make a decision: if the key bit is a 1, that means
flip the bit in the plaintext: so a plaintext 1 would become a ciphertext 0, and a plaintext
0 would become a ciphertext 1. And if the key bit is 0, that means leave
the plaintext bit alone: a plaintext 1 stays a 1 in the ciphertext, a plaintext 0 stays
a 0. If you're familiar with Boolean logic, you
might recognize the operation we're performing here as the exclusive or operation. And the result of applying this bitwise exclusive
or is that we get an encrypted ciphertext. This is a perfectly secure encryption algorithm. If an adversary were to see just the ciphertext,
it wouldn't give them any information about the contents of the original plaintext. But there's a potential major problem with
this approach: the security breaks down if we ever use the same key more than once. If we reuse the key, then it becomes possible
for an adversary to gain information. Think of it this way: when the key is used,
we combine each bit of plaintext with a bit of the key to get a ciphertext bit. So if two ciphertexts have the same bit in
the same position, then their plaintext bit in that position must be the same too. And that can be enough information to perform
some statistical analysis to guess at what parts of the original data might be. So for this algorithm to be secure, we can
only ever use the key once, which is why this approach is known as the one-time pad. But if the data we're trying to encrypt is
longer than our key, or we want to use our key to encrypt multiple pieces of data, then
we need a different approach. So what properties are we looking for in a
good cipher? Two important properties to think about were
originally defined by computer scientist Claude Shannon in the 1940s. He described the properties of confusion and
diffusion. Confusion is all about complicating the relationship
between the key and the resulting ciphertext. In the exclusive or cipher we were using before,
the relationship between each key bit and each ciphertext bit is fairly obvious: changing
a bit in the key results in a change in the same position in the ciphertext. But ideally, a cipher with confusion will
ensure that each bit of the ciphertext depends on many bits in the key, so that the relationship
is harder to analyze. Diffusion is all about spreading out the statistical
structure of the plaintext. Plaintext messages have statistical patterns:
in English, for example, letters vary in how frequently they appear, and some words are
more popular than other words. If those patterns were to appear in the ciphertext
too, an adversary could analyze those patterns to gain information. So to make our cipher secure, information
in the plaintext needs to be spread out evenly across the ciphertext. In practice, that means a cipher with diffusion
should work such that changing any one bit in the plaintext results in a change to about
half of the bits in the ciphertext. But our exclusive or cipher doesn't do that
: notice that when one bit of the plaintext changes, only one bit of the ciphertext changes. A good cipher should have both confusion and
diffusion. So let's turn our attention back to AES, the
Advanced Encryption Standard, to see how its design achieves these goals. AES is what's known as a block cipher, which
means it takes a plaintext block of a fixed number of bits and produces a ciphertext block
of the same size. AES uses a block size of 128 bits or 16 bytes. So we start with 16 bytes of data that we
want to encrypt. AES lays out these bits in a 4x4 grid — this
is known as the "state" of the algorithm, and we'll make updates to the state repeatedly
until we get our encrypted block. So what do we do to encrypt this block? Well, to go back a step, what would our exclusive
or cipher have looked like here? We'd need a key, and a key that's exactly
16 bytes long. And then we'd apply an exclusive or operation
between each bit of the state and each bit of the key to get the resulting state. We call this process "adding the key" to the
state. Now, how could we try to introduce more confusion
into the process? Remember that confusion is all about trying
to make the ciphertext depend on multiple parts of the key, instead of just one part
of the key. So here's an idea: let's take our original
key, and expand it into multiple keys. There's lots of ways we could do this expansion,
but AES specifies a particular process that involves starting with the original key, and
then determining the other bytes in the expanded key by combining values from particular previous
bytes. At particular intervals, the process will
add in some constant value or shift the bytes around or swap bytes for other bytes. The goal is to get an expanded set of keys
where the bits of the expansion depend on many or all of the bits from the original
key. And once we have that expanded key, we can
use it on our data: rather than just adding one key, we'll apply multiple rounds of transformation,
where in each round, we'll add a different round key. If you stop to think about it, though, adding
many round keys in sequence isn't actually any more secure than just adding a single
round key. After all, adding many round keys is the same
thing as just adding one key that is the result of adding together all of the keys. But the idea of applying multiple rounds of
transformation can become helpful if we start to add other steps into the process. What other techniques from cryptography could
we use here? One simple type of cipher is the substitution
cipher. In a substitution cipher, we transform data
by following a defined pattern to replace units of data with other units of data. A simple cipher for working with letters of
the alphabet, for example, might replace the letter A with B, the letter B with C, and
so on. So a message like HELLO becomes IFMMP. Of course, a substitution that's just a linear
shift of all of the letters by a fixed amount is fairly easy to analyze and break, so a
better substitution would avoid such a simple relationship. AES uses a substitution step too, called SubBytes. Instead of substituting letters for other
letters, each byte in the state is substituted for a different byte. The encryption algorithm defines exactly what
substitution to perform for each byte: 0 gets replaced by 99, 1 gets replaced by 124, 2
gets replaced by 119, and so on with a different substitution for every byte value. So when AES substitutes bytes, it goes through
each byte in the state and replaces it with its appropriate substitution. Just like with our letters where we wanted
to avoid choosing a simple relationship for the substitution, the AES substitution was
chosen to be an algebraically complex expression, and one that isn't just a linear transformation
of the bytes. That said, even though the substitution is
complex, it's public knowledge and known in advance. So AES couldn't rely only on this substitution,
since anyone can reverse the SubBytes step and get back the original data. Instead, we interleave byte substitution and
the round key addition step. Something like this: we take the input state,
and start by adding an initial round key. Then, we repeat some number of rounds, and
in each round we perform a byte substitution and then add the next round key. Only adding round keys one after another doesn't
do much for security. Only substituting bytes doesn't do much for
security either. But integrating these two steps together results
in more confusion: each bit of the ciphertext depends on multiple parts of the original
key by virtue of our key expansion, and it's not immediately obvious how to reverse the
encryption without having access to the key. So what's missing from our algorithm now? Recall that, in addition to confusion, we
also want our algorithm to provide diffusion: spreading out information from the plaintext
in the ciphertext. When we change a single plaintext bit, we
want many (ideally about half) of the ciphertext bits to change too. But our algorithm right now doesn't do this. When we change a plaintext bit, it might result
in changes to the encrypted version of that byte, but it's not going to have an impact
on the other bytes in the state. So AES needs some way to introduce diffusion
by spreading out information from bytes to other bytes, and ideally add even more confusion
to the algorithm in the process. There are two operations AES uses to achieve
this: called ShiftRows and MixColumns. Let's actually start by taking a look at the
MixColumns step to see what it does. The role of the MixColumns step is to mix
together the values contained in each column, essentially diffusing information throughout
the values. It operates independently on each column. For each column, MixColumns defines a particular
expression for how to combine the four bytes in order to get a new value for each byte. Each new value in the column therefore depends
on all of the previous values in the column. It's worth pointing out here that we're using
multiplication and addition symbols, but we're not quite performing the normal
multiplication and addition you might be used to. If we did use normal multiplication and addition,
we might end up getting values larger than what could be stored in a single byte. Instead, these multiplication and addition
operations have to work with only a finite number of possible values: the 256 possible
values for a single byte. Since there are only a finite number of possible
values, this is known as finite field arithmetic. So the MixColumns step successfully spreads
out information throughout the columns, but so far, no information moves between the columns
throughout the whole algorithm. That's where the earlier ShiftRows step comes
in. This step shifts the values in each row by
a different amount: the first row is left unchanged, the second row gets shifted by
one, the third row shifted by two, and the fourth row shifted by three. These two operations combined — ShiftRows
and MixColumns — help to mix together the values from the state, and even more so when
these steps get repeated across multiple rounds. So with all of these operations in mind, we
can now describe AES in full. We start with some data, and a key we want
to use for encryption. We start with a key expansion step that takes
the key and expands it into many round keys, each of which depend on the values from the
original key. We then add in the first round key. Now comes the round itself. In each round, we start by applying SubBytes:
substituting each byte for its replacement. Then we ShiftRows: shifting each row by a
different amount. Then we MixColumns: combining the values from
each column to get new column values. And then at the end of each round, we AddRoundKey
one more time. This process repeats for each round, until
we get our final encrypted result. It turns out that the MixColumns step is actually
skipped for the very last round only, and that doesn't change the security of the algorithm. How many rounds of transformation does the
algorithm use? That depends on the size of the original key. AES is designed to work with multiple key
sizes: 128-bit keys, 192-bit keys, and 256-bit keys. Each key size is associated with a different
number of expanded keys, and therefore a different number of rounds. And if you've ever heard the terms AES-128,
AES-192, or AES-256, that's what the numbers refer to: the size of the key. The larger the key, the more difficult it
is to attack by brute force. There's a lot going on in AES — substitutions,
shifting values, mixing values, adding keys — but each step contributes to the security
of the overall cipher to form an algorithm secure enough to be used to encrypt data every
day all across the world. And there's other details we haven't covered
yet, like how to take this algorithm and apply it to sequences of data longer than a single
block. But hopefully this gives you some insight
into the inner workings of the algorithm itself, and an appreciation for some of the ideas
behind what makes a cipher secure.