0:00

In this segment, we're gonna construct authenticated encryption systems. Since we

already have CPA secured encryption, and we have secure MACs, the natural question

is whether we can combine the two somehow, in order to get authenticated encryption.

And if, that's exactly what we're gonna do in this segment. Authenticated encryption

was introduced in the year 2000, in two independent papers that I point to at the

end of this module. But before then, many crytpolibraries provided an API that

separately supported CPA secure encryption, and MAC-ing. So there was one

function for implementing CPA secure encryption. For example, CBC with a random

IV. And another function for implementing a MAC. And then every developer that

wanted to implement encryption, had to, himself, call separately the CPA secure

encryption scheme and the MAC scheme. In particular, every developer had to invent

his own way of combining encryption and MAC-ing to provide some sort of

authenticated encryption. But since the goals of combining encryption and MAC-ing

wasn't well understood since authenticated encryption hasn't yet been defined, it

wasn't really clear which combinations of encryption and MAC-ing are correct and

which aren't. And so, every project as I said had to invent its own combination.

And in fact, not all combinations were correct. And I can tell you that the most

common mistake in software projects were basically incorrectly combining the

encryption and integrity mechanisms. So hopefully, by the end of this module, you

will know how to combine them correctly, and you won't be making these mistakes

yourself. So let's look at some combinations of CPA secure encryption and

MAC, that were introduced by different projects. So here are three examples. So,

first of all, in all three examples, there's a separate key for encryption, and

a separate key for MACing. These two keys are independent of one another, and

both are generated at session setup time. And we're gonna see how to generate these

two keys later on in the course. So the first example is the SSL protocol. So the

way SSL combines encryption and MAC in the hope of achieving authenticated encryption

is the following. Basically you take the plain text, m, and then you compute a MAC

on the plain text, m. So you use your MAC key, kI, to compute tag for this message

m. And then you can concatenate the tag to the message and then you encrypt the

concatenation of the message and the tag and what comes out is the actual final cipher text.

So that's option number one. The second option is what IPsec does. So

here, you take the message. The first thing you do is you encrypt the message.

And then, you compute a tag on the resulting cipher text. So you notice the

tag itself is computed on the resulting cipher text. A third option is what the

SSH protocol does. So here, the SSH takes the message, and encrypts it using a CPA

secure encryption scheme. And then, to it, it concatenates a tag of the message. The

difference between IPsec and SSH, is that in IPsec, the tag is computed over the

cipher text, whereas, in SSH, the tag is computed over the message. And so these

are three completely different ways of combining encryption and MAC. And the

question is, which one of these is secure? So, I will let you think about this for a

second, and then when we continue we'll do the analysis together.

3:13

Okay. So let's start with the SSH method. So in the SSH method you notice that the tag

is computed on the message and then concatenated in the clear to the cipher text.

Now this is actually quite a problem because MACs themselves are not designed

to provide confidentiality. MACs are only designed for integrity. And in fact, there's

nothing wrong with a MAC that as part of the tag outputs a few bits of the plain

text. Outputs a few bits of the message M. That would be a perfectly fine tag. And yet if

we did that, that would completely break CPA security here, because some bits of

the message are leaked in the cipher text. And so the SSH approach, even though the

specifics of SSH are fine and the protocol itself is not compromised by

this specific combination, generally it's advisable not to use this approach. Simply

because the output of the MAC signing algorithm might leak bits of the message. So

now let's look at SSL and IPsec. As it turns out, the recommended method actually

is the IPsec method. Because it turns out no matter what CPA secure system and MAC

key you use the combination is always gonna provide authenticated encryption.

Now let me very, very briefly explain why. Basically what happens is once we encrypt

the message well the message contents now is hidden inside the cipher text and now

when we compute a tag of the cipher text basically we're locking, this tag locks

the cipher text and makes sure no one can produce a different cipher text that would

look valid. And as a result this approach ensures that any modifications to the

cipher text will be detected by the decrypter simply because the MAC isn't

gonna verify. As it turns out, for the SSL approach, there actually are kind of

pathological examples, where you combine CPA secure encryption system with a secure

MAC. And the result is vulnerable to a chosen cipher text attack, so that it does

not actually provide authenticated encryption. And basically, the reason that

could happen, is that there's some sort of a bad interaction between the encryption

scheme and the MAC algorithm. Such that, in fact, there will be a chosen cipher

text attack. So if you're designing a new project the recommendation now is to

always use encrypt then MAC because that is secure no matter which CPA secure

encryption and secure MAC algorithm you're combining. Now, just to set the

terminology, the SSL method is sometimes called MAC-then-encrypt. And the

IPsec method is called encrypt-then-MAC. The SSH method even though you're

not supposed to use it, is called encrypt-and-MAC. Okay, so I'll often refer to

encrypt-then-MAC, and MAC-then-encrypt to differentiate SSL and IPsec. Okay, so

just to repeat what I've just said. The IPsec method encrypt-then-MAC always

provides authenticated encryption. If you start from a CPA secure cipher and a secure MAC

you will always get authenticated encryption. As I said, MAC-then-encrypt in

fact, there are pathological cases where the result is vulnerable to CCA attacks and

therefore does not provide authenticated encryption. However, the story's a little

bit more interesting than that, in that, it turns out, if you're actually using

randomized counter mode or randomized CBC, then it turns out, for those particular

CPA secure encryption schemes, MAC-then-encrypt actually does provide authenticated

encryption and therefore it is secure. In fact, there's even a more interesting

twist here in that if you're using randomized counter mode. Then, it's enough

that your MAC algorithm just be one time secure. It doesn't have to be a fully

secure MAC. It just has to be secure when a key is used to encrypt a single message,

okay? And when we talked about message integrity, we saw that there are actually

much faster MACs that are one time secure than MACs that are fully secure. As a

result, if you're using randomized counter mode MAC-then-encrypt could actually

result in a more efficient encryption mechanism. However, I'm going to repeat

this again. The recommendation is to use encrypt-then-MAC and we're going to see a

number of attacks on systems that didn't use encrypt-then-MAC. And so just to make

sure things are secure without you having to think too hard about this. Again, I am

going to recommend that you always use encrypt-then-MAC. Now, once the concept of

authenticated encryption became more popular, a number of standardized

approaches for combining encryption and MAC turned up. And those were even

standardized by the National Institute of Standards. So I'm just gonna mention three

of these standards. Two of these were standardized by NIST. And these are

called Galois counter mode and CBC counter mode. And so let me explain what they do.

Galois counter mode basically uses counter mode encryption, so a randomized counter

mode with a Carter-Wegman MAC, so a very fact Carter-Wegman MAC. And the way the

Carter-Wegman MAC works in GCM is it's basically a hash function of the message

that's being MACed. And then the result is encrypted using a PRF. Now this hash

function in GCM is already quite fast to the point where the bulk of the running

time of GCM is dominated by the counter mode encryption and it's even made more so

in that Intel introduces a special instruction PCLMULQDQ specifically

designed for the purpose of making the hash function in GCM run as fast as possible.

Now CCM counter mode is another NIST standard. It uses a CBC MAC and

then counter mode encryption. So this mechanism, you know, this uses MAC, then

encrypt, like SSL does. So this is actually not the recommended way of doing

things, but because counter mode encryption is used. This is actually a

perfectly fine encryption mechanism. One thing that I'd like to point out about

CCM, is that everything is based on AES. You notice, it's using AES for the CBC

MAC, and it's using AES for the counter mode encryption. And as a result, CCM can

be implemented with relatively little code. Cause all you need is an AES engine

and nothing else. And because of this, CCM actually was adopted by the Wi-Fi

alliance, and in fact, you're probably using CCM on a daily basis if you're using

encrypted Wi-Fi 802.11i then you're basically using CCM to encrypt traffic

between your laptop and the access point. There's another mode called a EAX that

uses counter mode encryption, and then CMAC. So, again you notice encrypt-then-MAC

and that's another fine mode to use. We'll do a comparison of all these

modes in just a minute. Now I wanted to mention that first of all, all these modes are

nonce-based. In other words, they don't use any randomness but they do take as

input a nonce and the nonce has to be unique per key. In other words, as you

remember, the pair (key, nonce) should never ever, ever repeat. But the

nonce itself need not be random, so it's perfectly fine to use a counter, for

example, as a nonce. And the other important point is that, in fact, all

these modes are what's called authenticated encryption with associated

data. This is an extension of authenticated encryption, that comes

up very often in networking protocols. So the idea between AEAD is that, in fact,

the message that's provided to the encryption mode is not intended to be fully

encrypted. Only part of the message is intended to be encrypted, but all of the

message is intended to be authenticated. A good example of this is a network packet.

Think of like a IP packet where there's a header and then there's a payload. And

typically the header is not gonna be encrypted. For example, the header might

contain the destination of the packet, but then the header had better not be

encrypted otherwise routers along the way wouldn't know where to route the packet.

And so, typically the header is sent in the clear, but the payload, of course, is

always encrypted, but what you'd like to do is have the header be authenticated.

Not encrypted but authenticated. So this is exactly what these AEAD modes do. They

will authenticate the header and then encrypt the payload. But the header and

the payload are bound together in the authentication so they can't

actually be separated. So this is not difficult to do. What happens is in these

three modes GCM, CCM, and EAX, basically the MAC is applied to the entire data. But

the encryption is only applied to the part of the data that needs to be encrypted.

So I wanted to show you what an API to these authenticated encryption with

associated data encryption schemes look like. So here's what it looks like in OpenSSL.

For example, this is, an API for GCM. So what you do is you call the

init function to initialize the encryption mode, and you notice you give it a key and

the nonce. The nonce again, doesn't have to be random, but it has to

be unique. And after initialization, you would call this encrypt function, where

you see that you give it the associated data that's gonna be authenticated, but

not encrypted. You give it the data, and it's gonna be both authenticated and

encrypted. And it gives you back the full cipher text, which is an encryption of the

data, but of course does not include the AEAD, because the AEAD is gonna be sent in

the clear. So now that we understand this mode of encrypt-then-MAC, we can go

back to the definition of MAC security and I can explain to you something that might

have been a little obscure when we looked at that definition. So if you remember,

one of the requirements that followed from our definition of secure MACs meant that

given a message-MAC pair on a message M, the attacker cannot produce another tag on

the same message M. In other words, even though the attacker already has a tag for

the message M, he shouldn't be able to produce a new tag for the same message M.

And it's really not clear, why does that matter? Who cares, if the adversary already

has a tag on the message M, who cares if he can produce another tag? Well, it turns

out if the MAC didn't have this property. In other words, given a message-MAC pair

you can produce another MAC on the same message, then that MAC would

result in an insecure encrypt-then-MAC mode. And so if we want our encrypt-then-MAC to

have cipher text integrity, it's crucial that our MAC security would imply this strong

notion of security, which, of course, it does because we defined it correctly.

So let's see what would go wrong, if, in fact, it was easy to produce this type of

forgery. So what I'll do is I'll show you a chosen cipher text attack on the

resulting encrypt-then-MAC system. And since the system has a chosen cipher text

attack on it, it necessarily means that it doesn't provide authenticated

encryption. So let's see. So the adversary's gonnna start by sending two

messages, M0 and M1. And he's gonna receive, as usual, the encryption of one

of them, either the encryption of M0 or the encryption of M1. And since we're

using encrypt-then-MAC, the adversary receives the cipher text we'll call it C0

and a MAC on the cipher text C0. Well now we said that given the MAC on

a message the adversary can produce another MAC on the same message. So what

he's gonna do is he's gonna produce another MAC on the message C0. Now he has

a new cipher text (C0,T'), which is a perfectly valid cipher text. T' is a

valid MAC of C0. Therefore, the adversary now can submit a chosen cipher text query

on C' and this is a valid chosen cipher text query because it's different

from C. It's a new cipher text. The poor challenger now is forced to decrypt this

cipher text C' so he's going to send back the decryption of C'. It's a

valid cipher text therefore the decryption of C prime is the message Mb but now the

attacker just learned the value of B because he can test whether Mb is equal to

M0 or MB is equal to M1. As a result he can just output B and he gets advantage

one in defeating the scheme. And so again if our MAC security did not imply

this property here. Then, there would be a chosen cipher text attack on encrypt-then-MAC.

And therefore, it would not be secure. So the fact that we define MAC security correctly

means that encrypt-then-MAC really does provide authenticated encryption. And

throughout all the MACs that we discussed actually do satisfy this strong notion of

unforgeability. So, interestingly, this is not the end of the story. So, as we said

before the concept of authenticated encryption was introduced everyone was

just combining MACs and encryption in various ways in the hope of achieving

some authenticated encryption. After the notion of authenticated encryption

became formalized and rigorous, people kind of started scratching their heads and said,

hey, wait a minute. Maybe we can achieve authenticated encryption more efficiently

than by combining a MAC and an encryption scheme. In fact, if you think about how

this combination of MAC and encryption works, let's say we combine counter mode

with CMAC, then for every block of plaintext, you first of all have to use

your block cipher for counter mode, and then you have to use to your block cipher

again, for the CBC-MAC. This means that if you're combining CPA secure encryption with a

MAC, for every block of plaintext, you have to evaluate your block cipher twice,

once for the MAC and once for the encryption scheme. So the natural question

was, can we construct an authenticated encryption scheme directly from a PRP,

such that we would have to only evaluate the PRP once per block? And it turns out

the answer is yes, and there's this beautiful construction called OCB, that

pretty much does everything you want, and is much faster than constructions that are

separately built from an encryption and a MAC. So I wrote down, kind of a schematic

of OCB. I don't want to explain it in detail. I'll just kind of explain it at a

high level. So here we have our input plain text, here at the top. And you

notice that, first of all, OCB is parallelizable, completely parallelizable.

So every block can be encrypted separately of every other block. The other thing to

notice is that as I promised, you only evaluate your block cipher once per plain

text block. And then you evaluate it one more time at the end to build your

authentication tag and then the overhead of OCB beyond just a block cipher is

minimal. All you have to do is evaluate a certain very simple function P. The

nonce goes into the P you notice, the key goes into this P and then there is a

block counter that goes into this P. So you just evaluate this function P, twice

for every block and you XOR the result before and after encryption using the

block cipher and that's it. That's all you have to do and then you get a very fast

and efficient authenticated encryption scheme built from a block cipher. So OCB

actually has a nice security theorem associated with it and I am going to point

to a paper on OCB when we get to end of this module where I list some further

reading papers that you can take a look at. So you might be wondering if OCB is so

much better than everything you've seen so far, all these three standards CCM, GCM and

EAX why isn't OCB being used or why isn't OCB the standard? And the answer is a

little sad. The primary answer that OCB is not being used is actually because

of various patents. And I'll just leave it at that. So to conclude this section I

wanted to show you some performance numbers. So here on the right I listed

performance numbers for modes that you shouldn't be using. So this is for

randomized counter mode, and this is for randomized CBC. And you can see also the

performance of CBC MAC is basically the same as the performance of CBC encryption.

Okay. Now here are the authenticated encryption modes, so these are the ones

that you're supposed to using, these you're not supposed to be using on their

own, right. These two, you should never ever use these two because they only

provide CPA security, they don't actually provide security against active

attacks. You're only supposed to use authenticated encryption for encryption.

And so I listed performance numbers for the three standards. And let me remind

you that GCM basically uses a very fast hash. And then it uses counter mode for

actual encryption. And you can see that the overhead of GCM over counter mode is

relatively small. CCM and EAX both use a block cipher based encryption and a

block cipher based MAC. And as a result they're about twice as slow as counter

modes. You see that OCB is actually the fastest of these, primarily because it

only use the block cipher once per message block. So based on these performance

numbers, you would think that GCM is exactly the right mode to always use. But

it turns out if you're on the space constrained hardware, GCM is not ideal.

Primarily because its implementation requires larger code than the other two

modes. However, as I said, Intel specifically added instructions to speed

up GCM mode. And as a result, implementing GCM on an Intel architecture takes

very little code. But on other hardware platforms, say in smart cards or other

constrained environments, the code size for implementing GCM would be considerably

larger than for the other two modes. But if code size is not a constraint then GCM

is the right mode to use. So to summarize this segment I want to say it one more

time that when you want to encrypt messages you have to use an authenticated

encryption mode and the recommended way to do it is to use one of the standards,

namely one of these three modes for providing authenticated encryption.

Don't implement the encryption scheme yourself. In other words don't implement

encrypt-then-MAC yourself. Just use one of these three standards. Many crypto libraries

now provide standard API's for these three modes and these are the one's you should

be using and nothing else. In the next segment we're going to see what else can

go wrong when you try to implement authenticated encryption by yourself.