PKCS 1 (23 min)

0:00 - 0:05

In this segment I wanna show you how RSA
is used in practice and in particular, I
0:05 - 0:09

wanna tell you about a standard called a
Public Key Cryptography standard number
0:09 - 0:15

one PKCS one. So I've already told you a
couple of times that you should never use
0:15 - 0:19

what's called textbook RSA. You should
never directly encrypt the message using
0:19 - 0:24

RSA because that's insecure. You always
have to do something to the message before
0:24 - 0:28

you actually apply the RSA function. We
saw the ISO standard in the previous
0:28 - 0:33

segment where what we did is we generated
a random x, encrypted x using RSA, and
0:33 - 0:37

then from this x we derived a symmetric
encryption key. But that's actually not how
0:37 - 0:42

RSA is used in practice. In practice
things worked a little differently and
0:42 - 0:46

typically what happens is the system
generates a symmetric encryption key and
0:46 - 0:51

then RSA is asked to encrypt the given
symmetric encryption key rather than
0:51 - 0:55

generating the symmetric key as part of
[our same encryption or RSA encryption?]. So in practice as I
0:55 - 1:01

say, what happens the RSA system is given
this input a symmetric encryption key to
1:01 - 1:06

encrypt for example this could be a, an AES
key that would be like 128 bits and then
1:06 - 1:12

the way this key is actually encrypted is
first we take these 128 bits and we expand
1:12 - 1:17

them into the full modulo size. For
example, this would be 2048 bits, and then
1:17 - 1:22

we apply the RSA function. And so the
question is, how should this preprocessing
1:22 - 1:26

be done? How should we expand the 128 bit
AES key that was given to us into a 2048
1:26 - 1:31

bit value that can then be applied with
RSA. And then furthermore the question is,
1:31 - 1:35

how do we argue that the resulting system
is secure? So in all the way of doing it
1:35 - 1:40

which is actually very widely deployed in
practice is what's called PKCS1 version
1:40 - 1:44

1.5, Public Key Cryptography Standard,
that's what PKCS stands for. So I wanna
1:44 - 1:49

show you how this mechanism works and in
particular, I'll show you what's called
1:49 - 1:54

PKCS1 Mode 2. Mode 2 denotes
encryption, mode 1 denotes signatures so
1:54 - 1:58

here we're gonna just focus on the
encryption. So the way PKCS1 works is
1:58 - 2:03

as follows, you take your message, this
would be the 128 bit AES key for example,
2:03 - 2:07

and you put it as a least significant bit
of the value that you're creating. The
2:07 - 2:13

next thing you do is you append sixteen
bits of one to it, you know [foreign].
2:13 - 2:18

This is sixteen bits of one. And then the
next you do is you append the random pad
2:18 - 2:23

that actually does not contain FF
anywhere inside the random pad. So this
2:23 - 2:28

is basically something like on the order
of what is it, 1900 random bits except
2:28 - 2:33

that there are on FF's inside those random
bits and finally at the very top, you put
2:33 - 2:39

the number 02. This indicates that this
plain text has been encoded using PKCS1
2:39 - 2:43

mode 2. And then this whole value here,
this whole thing that we just created.
2:43 - 2:49

This is just 2048 bit string is the thing
that gets fed into the RSA function and is
2:49 - 2:54

raised to the power of e modulo N. And the
resulting thing is PKCS1 ciphertext. Now
2:54 - 2:59

the decrypter of course is going to invert
the RSA function, He's gonna get back this
2:59 - 3:03

block, He's gonna look at the most
significant bits and he's gonna say
3:03 - 3:08

there's a 02 in there that means that this
PKCS one formatted. Since it's PKCS one
3:08 - 3:12

formatted, it's gonna remove those 02.
It's gonna remove all the random pad up
3:12 - 3:17

until the FF. And then anything
that stays beyond that is the original
3:17 - 3:22

message which is then used you know to say
decrypt the actual body of the ciphertext.
3:22 - 3:28

And as I said, this mechanism is extremely
widely deployed, For example, it's used in
3:28 - 3:33

HTTPS. Now interestingly, this PKCS1
version 1.5 was designed in the late 80's.
3:33 - 3:38

There was really no security proof to
argue that this mechanism is in fact
3:38 - 3:43

secure. And as it turns out, it is very
common when there is no security proof, it
3:43 - 3:48

turns out that actually things break and
there's a very, very elegant attack due to
3:48 - 3:53

Bleichenbacher back in 1998, Daniel
Bleichenbacher which shows how to attack
3:53 - 3:58

PKCS1 when it's used for example inside of
HTTPS. So let's see how the attack works.
3:58 - 4:03

So the idea is this, supposed the attacker
intercepted a certain ciphertext so this
4:03 - 4:08

is PKCS1 ciphertext so the point is
it's encoded using PKCS1 and then the
4:08 - 4:13

result is fed into the RSA function. And
we call the ciphertext the output of the
4:13 - 4:17

RSA function. The attacker wants to
basically decrypt the ciphertext, So let
4:17 - 4:21

me show you abstractly what the attacker
is gonna do. We're gonna just simplify SSL
4:21 - 4:25

and just say that the attacker can
basically send the ciphertext directly to
4:25 - 4:29

the web server. The web server is gonna
try and decrypt the ciphertext using its
4:29 - 4:33

secret key. And then what is it gonna do?
Well, the first thing it does after the
4:33 - 4:37

encryption, well, it's gonna ask: is the
decryption of the ciphertext PKCS1
4:37 - 4:41

encoded? In other words, it's gonna look
at the most significant bits and ask: is
4:41 - 4:45

this 02 in the most significant positions? If
they are, it's gonna continue properly
4:45 - 4:49

decrypting and then continue with the
protocol and if there is no 02 in those
4:49 - 4:53

positions, it's gonna announce an error.
So if the most significant bits of the
4:53 - 4:58

plaintext are 02, it's gonna continue
with the protocol as expected, if the most
4:58 - 5:02

significant bits are not 02 is gonna send
back an error message and tell the
5:02 - 5:06

attacker, hey, you sent in invalid
ciphertext. Now the amazing thing is that,
5:06 - 5:10

this lets the attacker test whether the
sixteen most significant bits of the
5:10 - 5:15

decryption of the given ciphertext are 02
or not. In other words, the attacker can
5:15 - 5:20

submit whatever ciphertext he wants to the
web server. The web server will invert the
5:20 - 5:25

RSA function and then tell the attacker
whether the inversion started with 02 or
5:25 - 5:30

not. So in some sense gives the attacker an
oracle that lets him test whether the
5:30 - 5:35

inversion of any ciphertext begins with 02
or not. And it turns out that's actually
5:35 - 5:40

enough to completely decrypt any
ciphertext the attacker wants, Just add
5:40 - 5:45

little bit of information leakage just by
properties of RSA lest the attacker
5:45 - 5:50

completely decrypt a given ciphertext.
Also here's what the attacker is gonna do,
5:50 - 5:54

well, he has a ciphertext that he wants to
decrypt so what he'll do is he'll take a
5:54 - 5:58

cyphertext and of course feed that
directly into the web server and ask does
5:58 - 6:02

it begin with the 02 or not. The next
thing he's gonna do is he's gonna chose a
6:02 - 6:06

random value r and he's gonna build a new
ciphertext c prime. Which is (r to the e) *
6:06 - 6:12

c modulo N. Now what does that do? If we
pull the r into the RSA function, really
6:12 - 6:18

what we just did is we multiply the RSA
plaintext, you know the PKCS1 including
6:18 - 6:23

in m, we multiply that by r and that,
that whole thing gets raised to the power
6:23 - 6:28

of e. Okay. So that's the effect of
multiplying c by (r to the e). It multiplies
6:28 - 6:32

the plaintext by r, a value that the
attacker controls. And then he's gonna
6:32 - 6:37

send c prime to the web server and the web
server is gonna say yes, this starts with
6:37 - 6:42

02 or no, this doesn't start with 02. So
now we can abstract this problem to
6:42 - 6:48

something more general which you can think
of as the following so I have this number
6:48 - 6:53

x in my head. This is the number I'm
trying to get The PKCS1 encoding on m. I'm
6:53 - 6:58

thinking of this number x and then what I
let you do is for r, which way r of your
6:58 - 7:04

choice I will tell you whether r times x mod n
starts with 02 or not. And it turns out by
7:04 - 7:09

asking enough questions it turns out
either by million questions of this type,
7:09 - 7:15

you can basically recover all of x. Just
by learning whether r times x begins with 02
7:15 - 7:21

or not by asking enough questions, you can
actually recover x. So in reality what
7:21 - 7:25

this means is that the attacker can
capture a given ciphertext. This maybe
7:25 - 7:29

corresponds to a ciphertext where the user
enters the credit card number or a
7:29 - 7:33

password, and now the attacker wants to
decrypt the ciphertext. What he would do
7:33 - 7:37

is he would send about a million
ciphertext like this, the web server for
7:37 - 7:42

each ciphertext is gonna respond whether
the plaintext begins with 02 or not and at
7:42 - 7:46

the end of the attack, the attacker just
blocks away with the, decryption of the
7:46 - 7:51

ciphertext c. So this might seem a
little magical to you, how are you able to
7:51 - 7:55

just from learning whether the most
significant bits are 02 or not, how you're
7:55 - 7:59

able to recover the entire plain text. So
let me show you a simple example of this.
7:59 - 8:04

I'll call those bab y Bleichenbacher just
to kinda get the idea across on how this
8:04 - 8:08

attack might work. So imagine that the
attacker is able to send the ciphertext c,
8:08 - 8:13

the web server is gonna use the secret key
to decrypt but let's suppose that instead
8:13 - 8:17

of checking for 02 or not, all the web
server does is he asked is the most
8:17 - 8:21

significant bit one or not. If the most
significant bit is one, the web server
8:21 - 8:26

says yes. If the most significant bit is
not one, the web server is no. Moreover,
8:26 - 8:31

to simplify things, let's assume that the
RSA module is n is a power of two. So n =
8:31 - 8:36

two^n. Of course, this is not a valid RSA
modulus. An RSA modulus used to be a
8:36 - 8:40

product of two distinct primes. But again,
just to keep things simple, let's pretend
8:40 - 8:45

for a second that n is actually a power of
two. So now you realize that by sending
8:45 - 8:50

the ciphertext c over to the web server,
the adversary just learn the most
8:50 - 8:55

significant bits of the plaintext x.
Basically, the servers behavior completely
8:55 - 8:59

reveals what the most significant is. Now
what the attacker can do is he can
8:59 - 9:04

multiply the ciphertext by two to the e.
Now multiplying by two to the e has the
9:04 - 9:09

effect of multiplying the plaintext x. By
two simply multiplying x by two and
9:09 - 9:15

because we're working mod two to the N,
multiplying by two basically means shift
9:15 - 9:21

left, okay. So now when we shift left in
fact we get to learn the most significant
9:21 - 9:27

bits of 2x which is really the second most
significant bit of x, okay. So, again the
9:27 - 9:32

most significant bit of 2x basically we
shift x to the left. And we reduce modulo
9:32 - 9:38

n. So now, the most significant bit of 2x
modulo n is basically the second most
9:38 - 9:43

significant bit of x. So now we learned
another bit of x. And now we're gonna
9:43 - 9:48

repeat this again. We're gonna query a
four to the e * c, That corresponds to
9:48 - 9:54

querying of 4x to the power e. Querying of
4x basically reveals the most significant
9:54 - 9:59

bit of 4x mod n. 4x * four corresponds the
shifting by two bits. Which mean now we
9:59 - 10:04

get to learn the third most significant
bit of x. And when we repeat this again,
10:04 - 10:08

and again, and again for different multip
les of c and you can see that after just a
10:08 - 10:12

few queries, about a 1,000 or 2,000
queries, we'll recover all of x. The
10:12 - 10:16

reason Bleichenbacher needed about a
million queries is because he wasn't
10:16 - 10:20

testing for one, He was actually testing
for 02 or not 02 and that basically means
10:20 - 10:24

that he needs instead of 2,000 queries he
needs actually a million queries but
10:24 - 10:29

that's still enough to recover all of the
plaintext text. Okay? So I hope this is
10:29 - 10:34

explains why this attack is possible. Why
just this little bit of information about
10:34 - 10:39

the most significant bits of the RSA
inverse is enough to completely decrypt
10:39 - 10:46

RSA. So the bottom line here is that PKCS1 as implemented in web server as up
10:46 - 10:50

until this attack was discovered was
basically insecure because the attacker
10:50 - 10:55

could essentially decrypt a ciphertext he
wants just by issuing enough queries to
10:55 - 11:00

the web server. So how do we defend
against this attack? Well, so the SSL
11:00 - 11:05

community basically wanted the minimum
change in their code that would prevent
11:05 - 11:10

this attack from working and so if you
look at the RFC, what they propose is the
11:10 - 11:15

following. Well, there's a lot of text
here but basically what they propose is to
11:15 - 11:19

say that if after you apply the RSA
decryption, you get a plaintext that's not
11:19 - 11:24

PKCS1 encoded. In other words it doesn't
start with 02. What you're supposed to do
11:24 - 11:29

is basically choose some random string r.
And just pretend that the plaintext is
11:29 - 11:33

just a random string r and continuous as
nothing happened and of course the
11:33 - 11:38

protocol will fail later on. So to be
concrete you see if the PKCS one encoding
11:38 - 11:43

is not correct, what you would do is you
would just say the premaster secret is
11:43 - 11:47

this random string or that we just picked
up off thin air and let's continue the
11:47 - 11:51

protocol and of course the session set up
will fail because the client and the
11:51 - 11:55

server will end up agreeing on different
keys and that will cause the session to
11:55 - 11:59

terminate. So we actually don't tell the
attacker whether the plaintext begins with
11:59 - 12:05

02 or not. All we do is just pretend that
the plaintext is some random value. So
12:05 - 12:10

this was a minor code change to many web
servers and it was fairly easy to get
12:10 - 12:15

deployed and in fact most web servers out
there today implement this version of
12:15 - 12:19

PKCS1. However, this kind of raises the
question of whether PKCS1 should be
12:19 - 12:24

changed all together so that we actually
are able to prove chosen ciphertext
12:24 - 12:28

security. And that brings us to a
different way of doing encryption using
12:28 - 12:33

RSA which is called Optimal Asymmetric
Encryption Padding, OAEP. And in fact the
12:33 - 12:40

PKCS standard was updated and PKCS1 version
2.0 actually has support for OAEP which is
12:40 - 12:49

a better way of doing RSA encryption. So
let me explain how OAEP works. So OAEP is
12:49 - 12:55

due to the Bellare and Rogaway back in 1994. And
the way OAEP works is as follows. So you
12:55 - 13:01

take your message that you wanna encrypt
this for example could be the 128 bits AES
13:01 - 13:07

key. And then the first thing you do is
you append a short pad to it. So in this
13:07 - 13:11

case, you put a 01 in the beginning and
then you add a whole bunch of zeroes. How
13:11 - 13:16

many zeroes is actually depends on the
standard but you know that's supposed like
13:16 - 13:21

they're 128 zeroes in here. And then you
also choose a random value so that this
13:21 - 13:26

whole string is as big as your RSA modulus
so let's just say this is 2047 bits. Now
13:26 - 13:30

before you apply the RSA function, you
first of all take the random value that
13:30 - 13:34

you chose and you feed it into the hash
function. This hash function produces a
13:34 - 13:39

value that's as big as the left hand side
of your encoding. You XOR the outputs.
13:39 - 13:43

You feed the result into another hash
function which we call a g. You XOR
13:43 - 13:47

that with a random value. And then
finally, you get these two values that you
13:47 - 13:51

concatenate together. Okay, So, you
concatenate either left side and the
13:51 - 13:57

right side that gives you something that
says is 2047 bits long and that's the thing
13:57 - 14:03

that you apply the RSA function to. And the
result is the RSA encryption. Now there's
14:03 - 14:08

a theory that was proved in 2001, due to
Fujisaki, Okamoto, Pointcheval, and Stern,
14:08 - 14:13

that shows that in fact if all you do is
you, you assume that the RSA function is a
14:13 - 14:17

Trapdoor permutation, a secure Trapdoor
permut ation, but in fact this mode of
14:17 - 14:22

encrypting using RSA is in fact chosen
ciphertext secure but we have to
14:22 - 14:27

assume that the functions h and g are kind
of ideal hash functions as I said these
14:27 - 14:31

are called random oracles. Basically, we
assume that h and g are just random
14:31 - 14:36

functions from their domains to their
range. In practice of course when OAEP is
14:36 - 14:41

implemented, people just use SHA-256
say for h and g. Why is this called
14:41 - 14:46

Optimal Asymmetric Encryption Padding?
Why is this o? Why does this stands for
14:46 - 14:50

optimal? Well, the reason is if you look
at the ciphertext, you'll notice that the
14:50 - 14:55

ciphertext is basically as short as it can
be. The ciphertext is exactly the length
14:55 - 14:59

of one RSA output, there are no trailing
values that are appended to the ciphertext
14:59 - 15:04

whereas for example, the ISO standard
if you remember even if you had to encrypt
15:04 - 15:09

a very short message, what you would have
to do is you would have to encrypt x using
15:09 - 15:13

RSA and then append to that x, the
encryption with the short message under
15:13 - 15:17

the symmetric encryption system. So even
if you have just to encrypt a 128 bit AES
15:17 - 15:21

key, with the ISO standard you would
get one RSA output plus a symmetric
15:21 - 15:27

cipher whereas with OAEP, you just get one
RSA output and nothing else so in that
15:27 - 15:31

sense, it's optimal, Optimal in terms of
the length of the ciphertext.
15:31 - 15:36

Interestingly, this theorem here really
rely on properties of RSA. And in fact,
15:36 - 15:41

the theorem is known to be false if
you use a general trapdoor permutation.
15:41 - 15:46

Some other permutation doesn't have the
algebraic properties of RSA. So that left
15:46 - 15:50

this question of if we have a general
trapdoor permutation, what is the correct
15:50 - 15:55

way to do OAEP? And it turns out, there's
a minor modification to OAEP which makes
15:55 - 16:00

the result more general. This is a result
due to Shoup back in 2001. Just shows that
16:00 - 16:05

if you give me a general trapdoor
permutation f. It turns out if you replace
16:05 - 16:10

the fixed pad that's using OAEP by a hash
function, this hash function w, which
16:10 - 16:14

happens to be a hash function of the
message m and the randomness r that
16:14 - 16:19

you're encrypting with, and then during
decryption, you actually check that this
16:19 - 16:24

hash function value is correct. So when
you decrypt, you actually check the w of
16:24 - 16:29

mr is really what's written in this
position in the plaintext. If you do that
16:29 - 16:34

then basically this OAEP called OAEP+.
Is in fact CCS secure, Chosen Ciphertext
16:34 - 16:38

Secure for any trapdoor permutation. You
don't have to rely on a specific
16:38 - 16:43

properties of RSA. There's another result
called Simple Asymmetric Encryption
16:43 - 16:47

Padding, SAEP+ which basically says
that if you are gonna rely on properties
16:47 - 16:52

of RSA, then in particular way with RSA
when the public exponent is equal to
16:52 - 16:56

3, it turns out you actually don't
need the second stage of encryption. The
16:56 - 17:01

simple padding scheme here again using the
function w is actually already enough to
17:01 - 17:06

guarantee chosen ciphertext security. So
these are variants OAEP but they're not
17:06 - 17:12

really used. I just wanted to mention and
to let you know they exist. These are not
17:12 - 17:16

really used mainly OAEP has been
standardized is what's used. Although
17:16 - 17:21

again in reality, the most common
application of RSA for public encryption
17:21 - 17:26

is in fact this PKCS1 that's
standardized in the HTTPS RFC. So just to
17:26 - 17:31

make sure it is clear in your mind how
decryption actually works, let me ask you,
17:31 - 17:36

how would you decrypt an SAEP ciphertext
ct. So here, you're given the ciphertext
17:36 - 17:41

ct here and the question is which of these
three methods is the correct way of
17:41 - 17:47

decrypting the ciphertext. So the correct
answer is the first one and let's see why.
17:47 - 17:52

Well, given the ciphertext the first thing
what we need to do is apply the RSA
17:52 - 17:57

inverse functions, the ciphertext and that
will give us the RSA plaintext which
17:57 - 18:02

happens to be in this case x and r. So we
get this x and r here. The next thing we
18:02 - 18:08

need to do is we need to hash r using the
function h and XOR the result with x and
18:08 - 18:12

that will give us m and wm, r. And the
last we need to do is to make sure that
18:12 - 18:17

the pad W(m,r) is correct so we check that
in fact w is equal to W(m,r) and if so we
18:17 - 18:23

output m and if not, we output bottom to
say that the ciphertext is invalid and
18:23 - 18:27

the decryption algorithm rejects it. And
by the way I'd like to emphasize that this
18:27 - 18:32

checking of the pad during decryption is
crucial in all of the schemes that we
18:32 - 18:37

just saw. So for example, in both OAEP+ and SAEP+, it's doing decryption.
18:37 - 18:43

It's very important to check that the pad
is in fact correct so that the value of w
18:43 - 18:49

that you get here during the encryption
really is the hash under the capital W of
18:49 - 18:54

m and r and similarly on OAEP, it's very
important to check it during decryption.
18:54 - 19:00

In fact, the value of the pad is the
constant 010000000. And of course if it
19:00 - 19:05

happens to be not 01000 then you output
bottom and you say the ciphertext is
19:05 - 19:09

invalid. The last thing I wanna point out
is that actually implementing OAEP can be
19:09 - 19:13

quite tricky and let's see why. So
supposed you have, you write an OAEP
19:13 - 19:17

decryption routine that takes the
ciphertext as input. The first thing you
19:17 - 19:22

would do is you would apply the RSA
inverse function to the ciphertext and you
19:22 - 19:28

would say well, you expect to get an n bit
value out, you know 2047 bits in the case
19:28 - 19:33

of 2048 bit RSA modulus and if you get
something that's bigger than two to the
19:33 - 19:37

2047, you say reject. We say error = one
and we go to Exit. The next thing we're
19:37 - 19:42

gonna do is we're gonna check whether the
pad is in fact the correct pad and again
19:42 - 19:47

if the pad is not the correct pad, then
again, we're gonna reject and go to Exit.
19:47 - 19:52

The problem with this implementation is
well by now I hope you kind of guessed it
19:52 - 19:56

is it's vulnerable to a timing attack,
right. So essentially by leaking timing
19:56 - 20:00

information the attacker can now figure
out what caused the error. Was it, that
20:00 - 20:05

was there an error the RSA decryption
happened to be too big or was there an
20:05 - 20:09

error because the pad happened to be too
large? And if the attacker can this
20:09 - 20:13

distinguish these two errors say by
timing. Then it turns out similar to the
20:13 - 20:18

Bleichenbacher attack, it's possible to
completely decrypt any ciphertext of your
20:18 - 20:22

choice. Just at very, very, very small
leak of information would completely allow
20:22 - 20:28

the attacker to decrypt completely any
ciphertext that he wants to. So this shows
20:28 - 20:33

that if you, even if you implement the
mathematics of OAEP decryption correctly,
20:33 - 20:37

you can very easily mess up and open
yourself up to a timing attack which would
20:37 - 20:41

make your implementation completely
insecure. So as usual, the lesson is,
20:41 - 20:45

don't implement crypto yourself in
particular, RSA OAEP is particularly
20:45 - 20:50

dangerous to implement yourself. Just use one
of the standard libraries for example,
20:50 - 20:54

OpenSSL has an OAEP implementation and of
course what they do is very careful to
20:54 - 20:59

make sure that the running time of OAEP
decrypt is always the same no matter what
20:59 - 21:03

causes the error. Okay, So let's stop here
and then in the next segment we'll talk
21:03 - 21:07

about the security of the RSA trapdoor
permutation.

Title:: PKCS 1 (23 min)
Video Language:: English

	rodolfojcj edited English subtitles for PKCS 1 (23 min)
	rodolfojcj edited English subtitles for PKCS 1 (23 min)
	rodolfojcj edited English subtitles for PKCS 1 (23 min)
	stanford-bot edited English subtitles for PKCS 1 (23 min)
	stanford-bot edited English subtitles for PKCS 1 (23 min)
	stanford-bot added a translation

English subtitles

Revisions

Revision 2

rodolfojcj

PKCS 1 (23 min)

Revisions

Our website uses cookies

Operating cookies (Required)