-
Over the years it became clear that DES
and triple DES are simply not designed for
-
modern hardware and are too slow. As a
result, NIS started a new process to
-
standardize in a new block cypher called
the Advanced Encryption Standard or AES
-
for short. Nis started it's effort in 1997
when it requested, proposals for a new
-
block cipher. It received fifteen
submissions a year later. And finally in
-
the year 2000, it adopted the cypher
called Rindall as the advanced encryption
-
standard. This was a cypher designed in
Belgium. We already said that it's block
-
size is 128 bits and it has three possible
key sizes. 128 bits, 192, and 256. Now,
-
the assumption is that the larger the key
size is, the more secure the block cipher
-
is as a pseudo random permutation. But
because it also has more rounds involved
-
in its operation. The slower the cipher
becomes. So the larger the key supposedly,
-
the more secure the cipher, but also the
slower it becomes. So for example, AES 128
-
is the fastest of these ciphers and AES
256 is the slowest. Now AES is built as
-
what?s called a substitution permutation
network. It's not a Faistl network.
-
Remember that in a Faistl network, half
the bit were unchanged from round to
-
round. In a substitution permutation
network all the bits are changed in every
-
round. And the network works as follows,
so here we have the first round of the
-
substitution permutation network, where
the first thing we do is we X or the
-
current state with the round key. In this
case the first round key. Then we go thru
-
a substitution layer, where blocks of Date
are replaced with other blocks based on
-
what the substitution table says. And then
we go through a permutation layer where
-
bits are permuted and shuffled around. And
then we do this again. We XR with an X
-
round key, we go thru a substitution
phase, and we're permute to dance around.
-
And so on, and so on, and so forth Until
we reach the final round where we x or
-
with the very last around key, and then
out comes the output. Now, and important
-
point about this design. Is that, in fact,
because of how it's built, every step in
-
this network needs to be reversible, so
that the whole thing is reversible. And so
-
the way we would, decrypt, essentially, is
we would take the output and simply apply
-
each step of the network in reverse order.
So we start with the permutation step, and
-
we have to make sure that step is
reversible. Then we look at the
-
substitution layer, and we have to make
sure this step is reversible. And this is
-
very different from DES. In DES, if you
remember, the substitution tables were not
-
reversible at all. In fact, they
map six bits to four bits. Whereas
-
here, everything has to be reversible,
otherwise it would be impossible to
-
decrypt. And of course the x or with the
round key is reversible as well. Okay? So
-
inversion of a substitution permutation
network is simply done by applying all of
-
the steps in the reverse order. So now
that we understand the generic
-
construction. Lets look at the specifics
of AS. So AS operates on a 128 bit block.
-
Which is sixteen bytes. So what we do with
AS is we write those sixteen bytes as a
-
four by four matrix. Each cell in the
matrix contains one byte. And then we
-
start with the first round so we X over
with the first round key and then apply a
-
certain function. That, includes
substitutions and permutations and other
-
operations on the state. And again, these
three functions that are applied here have
-
to be invertible so that in fact the
cypher can be decrypted. And then we xor
-
with the next round key and we do that
again. Again we apply the round function
-
and xor the round key. And we do that
again and again and again. We do it ten
-
times. Although interestingly in the last
round, the mix column step is actually
-
missing. And then finally, we XOR with the
last rounds key, and out comes the output.
-
Again, in every phase here, we always,
always, always keep this four by four
-
array. And so the output is also four by
four, which is sixteen bytes, which is 128
-
bits. Now the long key themselves of
course come from a sixteen byte AS key
-
using key expansion. So the key expansion
maps from a sixteen bytes AS key
-
into eleven keys, each one being sixteen
bytes. So these keys themselves are also a
-
four by four array that's XORed into the
current state. Okay, so that's the
-
schematic of how AES works. And the only
thing that's left to do is specify these
-
three functions, byte sub, shift row, and
mixed column. And those are fairly easy to
-
explain. So I'm just gonna give you the
high level description of what they are.
-
And, those interested in the details can
look it up online. So the way byte
-
substitution works, is literally it's one
S box containing 256 bytes. And
-
essentially, what it does is it applies
the S box to every byte in the current
-
states. So, let me explain what I mean by
that. So the current state is gonna be
-
this four by four, table. So here we have
the four by four table. And to each
-
element in this table, we apply the S box.
So let's call it the A table. And then
-
what we do is, essentially, for all four
by four entries, essentially, the next
-
step, Aij. Becomes the current step
evaluated at the look up table. So we use
-
the current cell as an entry, as an index,
into the look up table. And then the value
-
of the look up table is what's output.
Okay. So, that's the first step. The next
-
step that happens is a shift pro step,
which is basically just a permutation. So
-
essentially we kind of do a stick lick
shift on each one of the rows. So you can
-
see the second row is stick lick shifted
by one position. This third row is
-
[inaudible] shifted by two positions and
the third row is [inaudible] shifted by
-
three positions. And the last thing we do
is mix columns where literally we apply a
-
linear transformation to each one of these
columns. So there's a certain matrix that
-
multiplies each one of these columns and
it becomes, the next column. So these
-
linear transformation is applied
independently to each one of the columns.
-
Now, I should point out that, so far,
shift rows and mixed columns are very easy
-
to implement in code. And I should say
that the [inaudible] institution itself is
-
also easily computable, so that you can
actually write code that takes less than
-
256 bytes to write. And you can kind of
shrink the description of AES by literally
-
storing code that computes the table
rather than hardwiring the table into your
-
implementation. And in fact, this is kind
of a generic fact about AES, that if you
-
can have allowed no pre computation at
all, including computing the S box on the
-
fly. Then in fact you get a fairly small
implementation of AES, so it, it could fit
-
on very constrained environments where
there isn't enough room to hold,
-
complicated code. But of course, this will
be the slowest implementation, because
-
everything is computed now on the fly, and
as a result, the implementation,
-
obviously, is gonna be, slower than things
that were pre-computed. And then there is
-
this trade off. For example if you have a
lot of space, and you can support large
-
code. You can actually precompute quite a
bit of the three steps that I just
-
mentioned. In fact, there are multiple
options of pre-computation, you can build
-
a table that's only four kilobyte big. You
can build a table that is even longer,
-
maybe 24 kilobytes. So basically you will
have these big tables in your
-
implementation. But then your actual
performance is going to be really good,
-
because all your doing is just table
look-ups and XORs. You're not doing
-
any other complicated arithmetic. And as a
result, if you can do a lot of
-
pre-computation, these three steps here,
[inaudible] should. [inaudible] and mixed
-
columns can be converted just into a
number, a small number of table lookups
-
and some [inaudible]. All you can do is
just compute the S box, so now your
-
implementation would just have 256 bytes.
Hard coded The rest would just be code
-
that's actually computing these three
functions. The performance would be slower
-
than in the previous step but the code
footprint would also be smaller. So in
-
overall, there's this nice tradeoff
between code size and performance. So on
-
high-end machines, on high-end servers,
where you can afford to have a lot of
-
code, you can precompute and store these
big tables and get the best performance.
-
Whereas on low-end machines like eight
byte smart cards or think of like an eight
-
byte wristwatch, you would actually have a
relatively small implementation of AES.
-
But as a result of course it won't be so
fast. So here's an example that's a little
-
unusual, suppose you wanted to implement
AES in Javascript so you can send an AES
-
library to the browser and have the
browser actually do AES by itself. So in
-
this case what you'd like to, to is you'd
like to both shrink the code size, so that
-
on the network there's minimum traffic to
send a library over to the browser but, at
-
the same time, you'd like the browser
performance to be as fast as possible. And
-
so this is something that we did a while
ago essentially the idea is that the code
-
that actually gets send to the browser
doesn't have any pre computer table and as
-
a result is fairly small code. But then
the minute it lands on the browser, what
-
the browser will do is actually pre
compute all the tables. So in some sense
-
the code goes from just being small and
compact. It gets bloated with all these
-
precomputed tables. But those are stored
on the laptop, which presumably has a lot
-
of memory. And then once you have the
precomputed tables you actually encrypt
-
using them. And that's how you get the
best performance. Okay? So if you have to
-
stand in implementation AES over the
network, you can kind of get the best of
-
all worlds. Whereas, the code over the
network is small, but when it reaches the
-
target client, it can kind of inflate
itself. And then get the best performance
-
as it's doing encryption on the clients.
Now AES is such a popular block cipher,
-
now essentially when you build crypto into
products essentially your supposed to be
-
using AES, and as a result Intel actually
put AES support into the processor itself.
-
So since Westmere there are special
instructions in the Intel processor to
-
help accelerate AES. And so I listed these
instructions here. They come in two pairs,
-
aesenc and aesenclast. And then, there's aesekygenassist. So, let me explain
-
what they do. So, aesenc essentially
implements one round of AES. Namely, apply
-
the three functions in the XOR with the
round key. And aesenclast basically
-
implements the last round of AES.
Remember, the last round didn't have the
-
mix columns phase. It only had the subs
bytes and shift rows. And so that's why
-
the aesenclast does. And the way you
call these instructions is using 128 byte
-
registers which correspond to the states
of AES. And so you would have one register
-
containing the states and one register
containing the current round key. And then
-
when you call AES on these two registers,
basically, they would run one round of AES
-
and place the result inside of this XMM
one state register. And as a result if you
-
wanted to implement the whole AES. All you
would do is, call aesenc nine times. Then
-
you would call aesenclast one time. These
ten instructions are basically the entire
-
implementation of AES. That's it. It's that
easy to implement AES on this hardware
-
and they claim because these operations
are now done inside the processor not
-
using external instructions that are
implemented in the processor. They claim
-
that they can get a fourteen X speedup
over say an implementation that's running
-
in the same hardware but implementing AES without these special instructions. So
-
this is quite a significant speed up and
the facts are now lots of. Products that
-
make use of these special instructions.
And let's just say that this is not
-
specific to Intel, if you're in
[inaudible], and they also implemented
-
exactly kinda similar instructions in
their bulldozer architecture and further
-
and future architectures. Okay, so let's
talk about the security of AES. I wanna
-
mention just two attacks here. Obviously,
AES has been studied quite a bit. But the
-
only two attacks on the full AES are the
following two. So, first of all, if you
-
wanted to do key recovery, the best
attack, basically, is only four times
-
faster than exhaustive search. Which mean
that instead of a hundred and. 28 bits
-
key, really you should be thinking of AES.
Is 126 bit key. Cause exhaustive search,
-
really it's gonna four times faster than
it should. Of course due to the 126, it's
-
still. More time than we have to compute,
and this really does not hurt the security
-
alias. The more significant attack is,
actually, on AES-256. It turns out there's a
-
weakness in the key expansion design of
aes which allows for, what's called a
-
related key attack. So, what's a related
key attack? Essentially, if you give me
-
about two to the 100 input output pairs
for aes, but from four related keys. So,
-
these are keys that are very closely
related, namely key number one. Is just
-
one bit flip of key #two, which is just
one flip, bit flip of key #three, which is
-
just one flip bit flip of key #four. These
are very closely related keys, if you like
-
your [inaudible] distances very short. But
if you do that, then, in fact, there is a
-
two to the 100 attack. Now you should say,
well, two to the 100 is still impractical.
-
This is still more time than we can
actually run today. But nevertheless, the
-
fact that it's so much better than an
exhaustive search attack, it's so much
-
better than two to the 56, is kind of a
limitation of the cipher. But generally,
-
it's not a significant limitation, because
it requires related keys. And so, in
-
practice, of course, you're supposed to be
choosing your keys at random, so that you
-
have no related keys in your system. And
as a result, this attack wouldn't apply.
-
But if you do have related keys, then
there's a problem. So this is the end of
-
the segment, and in the next segment we're
gonna talk about more provably secure
-
constructions for block ciphers.