
The AES block cipher (14 min)

  • 0:00 - 0:05
    Over the years it became clear that DES
    and triple DES are simply not designed for
  • 0:05 - 0:09
    modern hardware and are too slow. As a
    result, NIST started a new process to
  • 0:09 - 0:14
    standardize a new block cipher called
    the Advanced Encryption Standard, or AES
  • 0:14 - 0:19
    for short. NIST started its effort in 1997
    when it requested proposals for a new
  • 0:19 - 0:23
    block cipher. It received fifteen
    submissions a year later. And finally in
  • 0:23 - 0:27
    the year 2000, it adopted the cipher
    called Rijndael as the Advanced Encryption
  • 0:27 - 0:32
    Standard. This was a cipher designed in
    Belgium. We already said that its block
  • 0:32 - 0:38
    size is 128 bits and it has three possible
    key sizes: 128, 192, and 256 bits. Now,
  • 0:38 - 0:45
    the assumption is that the larger the key
    size is, the more secure the block cipher
  • 0:45 - 0:50
    is as a pseudorandom permutation. But
    because a larger key also means more rounds
  • 0:50 - 0:56
    in its operation, the cipher also becomes
    slower. So the larger the key, supposedly,
  • 0:56 - 1:02
    the more secure the cipher, but also the
    slower it becomes. So for example, AES 128
  • 1:02 - 1:07
    is the fastest of these ciphers and AES
    256 is the slowest. Now AES is built as
  • 1:07 - 1:12
    what's called a substitution-permutation
    network. It's not a Feistel network.
  • 1:12 - 1:16
    Remember that in a Feistel network, half
    the bits were unchanged from round to
  • 1:16 - 1:21
    round. In a substitution-permutation
    network, all the bits are changed in every
  • 1:21 - 1:25
    round. And the network works as follows,
    so here we have the first round of the
  • 1:25 - 1:30
    substitution-permutation network, where
    the first thing we do is we XOR the
  • 1:30 - 1:34
    current state with the round key. In this
    case the first round key. Then we go through
  • 1:34 - 1:39
    a substitution layer, where blocks of data
    are replaced with other blocks based on
  • 1:39 - 1:43
    what the substitution table says. And then
    we go through a permutation layer where
  • 1:43 - 1:48
    bits are permuted and shuffled around. And
    then we do this again. We XOR with the next
  • 1:48 - 1:52
    round key, we go through a substitution
    phase, and we permute the bits around.
  • 1:52 - 1:57
    And so on, and so on, and so forth, until
    we reach the final round where we XOR
  • 1:57 - 2:02
    with the very last round key, and then
    out comes the output. Now, an important
  • 2:02 - 2:06
    point about this design is that, in fact,
    because of how it's built, every step in
  • 2:06 - 2:11
    this network needs to be reversible, so
    that the whole thing is reversible. And so
  • 2:11 - 2:16
    the way we would decrypt, essentially, is
    we would take the output and simply apply
  • 2:16 - 2:20
    each step of the network in reverse order.
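As a small illustration of this structure, here is a toy one-round substitution-permutation cipher in C. The 4-bit S box and the bit rotation used as the permutation are made up for the example and are not the AES tables; the point is only that each step is invertible, so decryption just undoes the steps in reverse order.

    #include <stdint.h>
    #include <stdio.h>

    /* Toy one-round substitution-permutation cipher on an 8-bit block.
     * The 4-bit S box and the rotation below are made-up placeholders,
     * NOT the AES tables -- only the encrypt/decrypt structure matters. */

    static const uint8_t SBOX[16] = {
        0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2
    };
    static uint8_t SBOX_INV[16];   /* filled in at startup */

    /* "Permutation layer": rotate the 8 bits left by 3, and its inverse. */
    static uint8_t perm(uint8_t x)     { return (uint8_t)((x << 3) | (x >> 5)); }
    static uint8_t perm_inv(uint8_t x) { return (uint8_t)((x >> 3) | (x << 5)); }

    static uint8_t round_enc(uint8_t s, uint8_t rk) {
        s ^= rk;                                           /* XOR the round key  */
        s = (uint8_t)(SBOX[s >> 4] << 4 | SBOX[s & 0xF]);  /* substitution layer */
        return perm(s);                                    /* permutation layer  */
    }

    static uint8_t round_dec(uint8_t s, uint8_t rk) {
        /* Apply the inverse of each step, in reverse order. */
        s = perm_inv(s);
        s = (uint8_t)(SBOX_INV[s >> 4] << 4 | SBOX_INV[s & 0xF]);
        return s ^ rk;
    }

    int main(void) {
        for (int i = 0; i < 16; i++) SBOX_INV[SBOX[i]] = (uint8_t)i;
        uint8_t c = round_enc(0xA7, 0x3D);
        printf("encrypted: 0x%02X, decrypted back: 0x%02X\n", c, round_dec(c, 0x3D));
        return 0;
    }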
    So we start with the permutation step, and
  • 2:20 - 2:24
    we have to make sure that step is
    reversible. Then we look at the
  • 2:24 - 2:28
    substitution layer, and we have to make
    sure this step is reversible. And this is
  • 2:28 - 2:33
    very different from DES. In DES, if you
    remember, the substitution tables were not
  • 2:33 - 2:37
    reversible at all. In fact, they
    map six bits to four bits. Whereas
  • 2:37 - 2:41
    here, everything has to be reversible,
    otherwise it would be impossible to
  • 2:41 - 2:45
    decrypt. And of course the XOR with the
    round key is reversible as well. Okay? So
  • 2:45 - 2:51
    inversion of a substitution-permutation
    network is simply done by applying all of
  • 2:51 - 2:56
    the steps in the reverse order. So now
    that we understand the generic
  • 2:56 - 3:02
    construction. Let's look at the specifics
    of AES. So AES operates on a 128-bit block.
  • 3:02 - 3:08
    Which is sixteen bytes. So what we do with
    AES is we write those sixteen bytes as a
  • 3:08 - 3:13
    four by four matrix. Each cell in the
    matrix contains one byte. And then we
  • 3:13 - 3:17
    start with the first round, so we XOR
    with the first round key and then apply a
  • 3:17 - 3:21
    certain round function that includes
    substitutions and permutations and other
  • 3:21 - 3:26
    operations on the state. And again, these
    three functions that are applied here have
  • 3:26 - 3:30
    to be invertible so that in fact the
    cipher can be decrypted. And then we XOR
  • 3:30 - 3:35
    with the next round key and we do that
    again. Again we apply the round function
  • 3:35 - 3:39
    and XOR the round key. And we do that
    again and again and again. We do it ten
  • 3:39 - 3:43
    times. Although interestingly in the last
    round, the MixColumn step is actually
  • 3:43 - 3:49
    missing. And then finally, we XOR with the
    last round key, and out comes the output.
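As a rough sketch of this in code, assuming the standard column-major layout of the AES state (state[row][col] = input[row + 4*col]), here is the loading of a block into the 4-by-4 matrix and the XOR-with-round-key step; the round function itself is sketched piece by piece further below.

    #include <stdint.h>

    /* The AES state: a 4x4 array of bytes, filled column by column,
     * i.e. state[row][col] = input[row + 4*col]. */
    typedef uint8_t aes_state[4][4];

    static void load_state(aes_state s, const uint8_t in[16]) {
        for (int col = 0; col < 4; col++)
            for (int row = 0; row < 4; row++)
                s[row][col] = in[row + 4 * col];
    }

    /* The XOR-with-round-key step: the round key is itself a 4x4 array
     * of bytes that is simply XORed into the current state. */
    static void add_round_key(aes_state s, aes_state rk) {
        for (int row = 0; row < 4; row++)
            for (int col = 0; col < 4; col++)
                s[row][col] ^= rk[row][col];
    }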
  • 3:49 - 3:54
    Again, in every phase here, we always,
    always, always keep this four by four
  • 3:54 - 3:59
    array. And so the output is also four by
    four, which is sixteen bytes, which is 128
  • 3:59 - 4:05
    bits. Now the round keys themselves of
    course come from a sixteen-byte AES key
  • 4:05 - 4:11
    using key expansion. So the key expansion
    maps a sixteen-byte AES key
  • 4:11 - 4:17
    into eleven keys, each one being sixteen
    bytes. So these keys themselves are also a
  • 4:17 - 4:22
    four by four array that's XORed into the
    current state. Okay, so that's the
  • 4:22 - 4:26
    schematic of how AES works. And the only
    thing that's left to do is specify these
  • 4:26 - 4:30
    three functions: ByteSub, ShiftRow, and
    MixColumn. And those are fairly easy to
  • 4:30 - 4:34
    explain. So I'm just gonna give you the
    high level description of what they are.
  • 4:34 - 4:39
    And those interested in the details can
    look them up online. So the way byte
  • 4:39 - 4:45
    substitution works is, literally, there's one
    S box containing 256 bytes. And
  • 4:45 - 4:51
    essentially, what it does is it applies
    the S box to every byte in the current
  • 4:51 - 4:56
    state. So, let me explain what I mean by
    that. So the current state is gonna be
  • 4:56 - 5:02
    this four by four table. So here we have
    the four by four table. And to each
  • 5:02 - 5:07
    element in this table, we apply the S box.
    So let's call it the A table. And then
  • 5:07 - 5:13
    what we do is, essentially, for all four
    by four entries, the next
  • 5:13 - 5:19
    A(i,j) becomes the lookup table
    evaluated at the current A(i,j). So we use
  • 5:19 - 5:25
    the current cell as an entry, as an index,
    into the look up table. And then the value
  • 5:25 - 5:30
    of the look up table is what's output.
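In code, this byte substitution step is just sixteen table lookups; a minimal sketch, with the 256-byte S box table assumed to be provided (hard-coded or computed elsewhere) rather than listed here:

    #include <stdint.h>

    /* ByteSub: replace every byte of the 4x4 state with its S box entry,
     * i.e. A[i][j] = S[ A[i][j] ].  The 256-byte table `sbox` is assumed
     * to be the AES S box, hard-coded or computed elsewhere. */
    static void byte_sub(uint8_t state[4][4], const uint8_t sbox[256]) {
        for (int row = 0; row < 4; row++)
            for (int col = 0; col < 4; col++)
                state[row][col] = sbox[state[row][col]];
    }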
    Okay. So, that's the first step. The next
  • 5:30 - 5:35
    step that happens is the ShiftRow step,
    which is basically just a permutation. So
  • 5:35 - 5:40
    essentially we kind of do a cyclic
    shift on each one of the rows. So you can
  • 5:40 - 5:45
    see the second row is cyclically shifted
    by one position. The third row is
  • 5:45 - 5:50
    cyclically shifted by two positions and
    the fourth row is cyclically shifted by
  • 5:50 - 5:54
    three positions. And the last thing we do
    is MixColumn, where literally we apply a
  • 5:54 - 5:59
    linear transformation to each one of these
    columns. So there's a certain matrix that
  • 5:59 - 6:03
    multiplies each one of these columns and
    it becomes the next column. So this
  • 6:03 - 6:07
    linear transformation is applied
    independently to each one of the columns.
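Here is a rough C sketch of these two steps, using the same column-major 4x4 state as above; the MixColumn matrix below is the standard AES one, and xtime is multiplication by 2 in GF(2^8).

    #include <stdint.h>

    /* ShiftRow: cyclically shift row r of the state left by r positions
     * (row 0 untouched, row 1 by one, row 2 by two, row 3 by three). */
    static void shift_rows(uint8_t s[4][4]) {
        for (int r = 1; r < 4; r++) {
            uint8_t tmp[4];
            for (int c = 0; c < 4; c++) tmp[c] = s[r][(c + r) % 4];
            for (int c = 0; c < 4; c++) s[r][c] = tmp[c];
        }
    }

    /* Multiply by x (i.e. by 2) in GF(2^8) modulo the AES polynomial 0x11B. */
    static uint8_t xtime(uint8_t a) {
        return (uint8_t)((a << 1) ^ ((a >> 7) * 0x1B));
    }

    /* MixColumn: multiply each column by the fixed AES matrix
     *   [2 3 1 1]
     *   [1 2 3 1]
     *   [1 1 2 3]
     *   [3 1 1 2]   over GF(2^8). */
    static void mix_columns(uint8_t s[4][4]) {
        for (int c = 0; c < 4; c++) {
            uint8_t a0 = s[0][c], a1 = s[1][c], a2 = s[2][c], a3 = s[3][c];
            s[0][c] = (uint8_t)(xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3);
            s[1][c] = (uint8_t)(a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3);
            s[2][c] = (uint8_t)(a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3));
            s[3][c] = (uint8_t)((xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3));
        }
    }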
  • 6:07 - 6:12
    Now, I should point out that
    ShiftRow and MixColumn are very easy
  • 6:12 - 6:17
    to implement in code. And I should say
    that the byte substitution itself is
  • 6:17 - 6:22
    also easily computable, so that you can
    actually write code that takes less than
  • 6:22 - 6:28
    256 bytes to write. And you can kind of
    shrink the description of AES by literally
  • 6:28 - 6:33
    storing code that computes the table
    rather than hardwiring the table into your
  • 6:33 - 6:38
    implementation. And in fact, this is kind
    of a generic fact about AES, that if you
  • 6:38 - 6:44
    allow no precomputation at all, and
    instead compute even the S box on the
  • 6:44 - 6:49
    fly. Then in fact you get a fairly small
    implementation of AES, so it could fit
  • 6:49 - 6:53
    in very constrained environments where
    there isn't enough room to hold
  • 6:53 - 6:57
    complicated code. But of course, this will
    be the slowest implementation, because
  • 6:57 - 7:01
    everything is computed now on the fly, and
    as a result, the implementation,
  • 7:01 - 7:05
    obviously, is gonna be slower than ones
    that use precomputed tables. And then there is
  • 7:05 - 7:09
    this trade off. For example if you have a
    lot of space, and you can support large
  • 7:09 - 7:13
    code. You can actually precompute quite a
    bit of the three steps that I just
  • 7:13 - 7:17
    mentioned. In fact, there are multiple
    options for precomputation: you can build
  • 7:17 - 7:21
    a table that's only four kilobytes big. You
    can build tables that are even larger,
  • 7:21 - 7:24
    maybe 24 kilobytes. So basically you will
    have these big tables in your
  • 7:24 - 7:28
    implementation. But then your actual
    performance is going to be really good,
  • 7:28 - 7:32
    because all you're doing is just table
    look-ups and XORs. You're not doing
  • 7:32 - 7:35
    any other complicated arithmetic. And as a
    result, if you can do a lot of
  • 7:35 - 7:39
    precomputation, these three steps here,
    ByteSub, ShiftRow, and MixColumn,
  • 7:39 - 7:43
    can be converted into just a small
    number of table lookups
  • 7:43 - 7:48
    and some XORs. Or, all you precompute is
    just the S box itself, so now your
  • 7:48 - 7:53
    implementation would just have 256 bytes
    hard coded. The rest would just be code
  • 7:53 - 7:57
    that's actually computing these three
    functions. The performance would be slower
  • 7:57 - 8:01
    than in the previous case, but the code
    footprint would also be smaller. So
  • 8:01 - 8:05
    overall, there's this nice tradeoff
    between code size and performance. So on
  • 8:05 - 8:09
    high-end machines, on high-end servers,
    where you can afford to have a lot of
  • 8:09 - 8:13
    code, you can precompute and store these
    big tables and get the best performance.
  • 8:13 - 8:17
    Whereas on low-end machines, like eight-bit
    smart cards, or think of like an eight-bit
  • 8:17 - 8:21
    wristwatch, you would actually have a
    relatively small implementation of AES.
  • 8:21 - 8:26
    But as a result of course it won't be so
    fast. So here's an example that's a little
  • 8:26 - 8:30
    unusual, suppose you wanted to implement
    AES in JavaScript so you can send an AES
  • 8:30 - 8:34
    library to the browser and have the
    browser actually do AES by itself. So in
  • 8:34 - 8:39
    this case what you'd like to do is you'd
    like to both shrink the code size, so that
  • 8:39 - 8:43
    on the network there's minimum traffic to
    send a library over to the browser but, at
  • 8:43 - 8:48
    the same time, you'd like the browser
    performance to be as fast as possible. And
  • 8:48 - 8:52
    so this is something that we did a while
    ago. Essentially, the idea is that the code
  • 8:52 - 8:57
    that actually gets sent to the browser
    doesn't have any precomputed tables, and as
  • 8:57 - 9:01
    a result it's fairly small code. But then
    the minute it lands on the browser, what
  • 9:01 - 9:05
    the browser will do is actually precompute
    all the tables. So in some sense
  • 9:05 - 9:10
    the code goes from just being small and
    compact. It gets bloated with all these
  • 9:10 - 9:14
    precomputed tables. But those are stored
    on the laptop, which presumably has a lot
  • 9:14 - 9:17
    of memory. And then once you have the
    precomputed tables you actually encrypt
  • 9:17 - 9:21
    using them. And that's how you get the
    best performance. Okay? So if you have to
  • 9:21 - 9:25
    stand in implementation AES over the
    network, you can kind of get the best of
  • 9:25 - 9:29
    all worlds, where the code over the
    network is small, but when it reaches the
  • 9:29 - 9:33
    target client, it can kind of inflate
    itself. And then get the best performance
  • 9:33 - 9:38
    as it's doing encryption on the clients.
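As an illustration of shipping code that computes the table rather than the table itself, here is a sketch that derives the 256-byte AES S box at startup from its definition, the multiplicative inverse in GF(2^8) followed by a fixed affine map. This is just one way such a "self-inflating" implementation might build its tables, not a description of the actual library mentioned above.

    #include <stdint.h>

    /* Multiply two elements of GF(2^8) modulo the AES polynomial 0x11B. */
    static uint8_t gf_mul(uint8_t a, uint8_t b) {
        uint8_t p = 0;
        while (b) {
            if (b & 1) p ^= a;
            a = (uint8_t)((a << 1) ^ ((a >> 7) * 0x1B));
            b >>= 1;
        }
        return p;
    }

    /* Build the 256-byte AES S box from its definition:
     * S(x) = affine(x^-1 in GF(2^8)), with 0 mapped to 0 by convention.
     * Brute-force inversion is fine here since this runs once at startup. */
    static void build_sbox(uint8_t sbox[256]) {
        for (int x = 0; x < 256; x++) {
            uint8_t inv = 0;
            for (int y = 1; y < 256; y++)
                if (gf_mul((uint8_t)x, (uint8_t)y) == 1) { inv = (uint8_t)y; break; }
            /* Affine map: output bit i is
             * b_i ^ b_{i+4} ^ b_{i+5} ^ b_{i+6} ^ b_{i+7} ^ c_i (indices mod 8),
             * with the constant c = 0x63. */
            uint8_t out = 0x63;
            for (int i = 0; i < 8; i++) {
                uint8_t bit = (uint8_t)(((inv >> i) ^ (inv >> ((i + 4) & 7)) ^
                                         (inv >> ((i + 5) & 7)) ^ (inv >> ((i + 6) & 7)) ^
                                         (inv >> ((i + 7) & 7))) & 1);
                out ^= (uint8_t)(bit << i);
            }
            sbox[x] = out;
        }
    }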
    Now AES is such a popular block cipher,
  • 9:38 - 9:43
    that, essentially, when you build crypto into
    products, you're supposed to be
  • 9:43 - 9:48
    using AES, and as a result Intel actually
    put AES support into the processor itself.
  • 9:48 - 9:53
    So since Westmere there are special
    instructions in the Intel processor to
  • 9:53 - 9:58
    help accelerate AES. And so I listed these
    instructions here. They come in two groups:
  • 9:58 - 10:03
    aesenc and aesenclast, and then there's aeskeygenassist. So, let me explain
  • 10:03 - 10:08
    what they do. So, aesenc essentially
    implements one round of AES. Namely, apply
  • 10:08 - 10:13
    the three functions and the XOR with the
    round key. And aesenclast basically
  • 10:13 - 10:17
    implements the last round of AES.
    Remember, the last round didn't have the
  • 10:17 - 10:22
    MixColumn phase. It only had the ByteSub
    and ShiftRow steps. And so that's what
  • 10:22 - 10:27
    the aesenclast does. And the way you
    call these instructions is using 128-bit
  • 10:27 - 10:32
    registers which correspond to the state
    of AES. And so you would have one register
  • 10:32 - 10:37
    containing the state and one register
    containing the current round key. And then
  • 10:37 - 10:42
    when you call aesenc on these two registers,
    basically, it would run one round of AES
  • 10:42 - 10:48
    and place the result inside the xmm1
    state register. And as a result, if you
  • 10:48 - 10:53
    wanted to implement the whole of AES, all you
    would do is call aesenc nine times. Then
  • 10:53 - 10:58
    you would call aesenclast one time. These
    ten instructions are basically the entire
  • 10:58 - 11:03
    implementation of AES. That's it. It's that
    easy to implement AES on this hardware
  • 11:03 - 11:07
    and because these operations are now
    done inside the processor, using dedicated
  • 11:07 - 11:11
    instructions that are implemented
    in the processor, they claim
  • 11:11 - 11:16
    that they can get a fourteen X speedup
    over say an implementation that's running
  • 11:16 - 11:20
    in the same hardware but implementing AES without these special instructions. So
  • 11:20 - 11:24
    this is quite a significant speedup, and
    in fact there are now lots of products that
  • 11:24 - 11:28
    make use of these special instructions.
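Here is a sketch of that ten-instruction sequence using the corresponding C intrinsics (compile with -maes); it assumes the eleven 128-bit round keys have already been produced by key expansion, which is where aeskeygenassist would come in.

    #include <wmmintrin.h>   /* AES-NI intrinsics (compile with -maes) */
    #include <stdint.h>

    /* Encrypt one 16-byte block with AES-128 given the 11 expanded round keys.
     * This mirrors the sequence described above: an initial XOR, nine aesenc
     * rounds, and one aesenclast.  Key expansion is assumed to be done already. */
    static void aes128_encrypt_block(const __m128i round_keys[11],
                                     const uint8_t in[16], uint8_t out[16]) {
        __m128i state = _mm_loadu_si128((const __m128i *)in);
        state = _mm_xor_si128(state, round_keys[0]);         /* XOR with round key 0    */
        for (int r = 1; r <= 9; r++)
            state = _mm_aesenc_si128(state, round_keys[r]);  /* nine full rounds        */
        state = _mm_aesenclast_si128(state, round_keys[10]); /* last round: no MixColumn */
        _mm_storeu_si128((__m128i *)out, state);
    }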
    And let's just say that this is not
  • 11:28 - 11:32
    specific to Intel; AMD, for example,
    also implemented
  • 11:32 - 11:36
    kinda similar instructions in
    their Bulldozer architecture and in
  • 11:36 - 11:40
    future architectures. Okay, so let's
    talk about the security of AES. I wanna
  • 11:40 - 11:44
    mention just two attacks here. Obviously,
    AES has been studied quite a bit. But the
  • 11:44 - 11:49
    only two attacks on the full AES are the
    following two. So, first of all, if you
  • 11:49 - 11:53
    wanted to do key recovery, the best
    attack, basically, is only four times
  • 11:53 - 11:57
    faster than exhaustive search. Which means
    that instead of a 128-bit
  • 11:57 - 12:03
    key, really you should be thinking of AES
    as having a 126-bit key, because exhaustive search
  • 12:03 - 12:08
    really is gonna be four times faster than
    it should be. Of course, 2 to the 126 is
  • 12:08 - 12:13
    still more time than we have to compute,
    and this really does not hurt the security
  • 12:13 - 12:17
    of AES. The more significant attack is,
    actually, on AES-256. It turns out there's a
  • 12:17 - 12:23
    weakness in the key expansion design of
    AES which allows for what's called a
  • 12:23 - 12:28
    related key attack. So, what's a related
    key attack? Essentially, if you give me
  • 12:28 - 12:33
    about two to the 100 input-output pairs
    for AES, but from four related keys. So,
  • 12:33 - 12:38
    these are keys that are very closely
    related, namely key number one is just
  • 12:38 - 12:42
    one bit flip of key number two, which is just
    one bit flip of key number three, which is
  • 12:42 - 12:47
    just one bit flip of key number four. These
    are very closely related keys; if you like,
  • 12:47 - 12:52
    their Hamming distance is very short. But
    if you do that, then, in fact, there is a
  • 12:52 - 12:57
    two to the 100 attack. Now you should say,
    well, two to the 100 is still impractical.
  • 12:57 - 13:01
    This is still more time than we can
    actually run today. But nevertheless, the
  • 13:01 - 13:05
    fact that it's so much better than an
    exhaustive search attack, it's so much
  • 13:05 - 13:09
    better than two to the 256, is kind of a
    limitation of the cipher. But generally,
  • 13:09 - 13:14
    it's not a significant limitation, because
    it requires related keys. And so, in
  • 13:14 - 13:18
    practice, of course, you're supposed to be
    choosing your keys at random, so that you
  • 13:18 - 13:22
    have no related keys in your system. And
    as a result, this attack wouldn't apply.
  • 13:22 - 13:26
    But if you do have related keys, then
    there's a problem. So this is the end of
  • 13:26 - 13:30
    the segment, and in the next segment we're
    gonna talk about more provably secure
  • 13:30 - 13:32
    constructions for block ciphers.