Internet Research Task Force (IRTF)                               Y. Nir
Request for Comments: 8439                                      Dell EMC
Obsoletes: 7539                                               A. Langley
Category: Informational                                     Google, Inc.
ISSN: 2070-1721                                                June 2018
ChaCha20 and Poly1305 for IETF Protocols
Abstract
This document defines the ChaCha20 stream cipher as well as the use of the Poly1305 authenticator, both as stand-alone algorithms and as a "combined mode", or Authenticated Encryption with Associated Data (AEAD) algorithm.
RFC 7539, the predecessor of this document, was meant to serve as a stable reference and an implementation guide. It was a product of the Crypto Forum Research Group (CFRG). This document merges the errata filed against RFC 7539 and adds a little text to the Security Considerations section.
Status of This Memo
This document is not an Internet Standards Track specification; it is published for informational purposes.
This document is a product of the Internet Research Task Force (IRTF). The IRTF publishes the results of Internet-related research and development activities. These results might not be suitable for deployment. This RFC represents the consensus of the Crypto Forum Research Group of the Internet Research Task Force (IRTF). Documents approved for publication by the IRSG are not candidates for any level of Internet Standard; see Section 2 of RFC 7841.
Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc8439.
Nir & Langley Informational [Page 1]
RFC 8439 ChaCha20 & Poly1305 June 2018
Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
The Advanced Encryption Standard (AES -- [FIPS-197]) has become the gold standard in encryption. Its efficient design, widespread implementation, and hardware support allow for high performance in many areas. On most modern platforms, AES is anywhere from four to ten times as fast as the previous most-used cipher, Triple Data Encryption Standard (3DES -- [SP800-67]), which makes it not only the best choice, but the only practical choice.
There are several problems with this. If future advances in cryptanalysis reveal a weakness in AES, users will be in an unenviable position. With the only other widely supported cipher being the much slower 3DES, it is not feasible to reconfigure deployments to use 3DES. [Standby-Cipher] describes this issue and the need for a standby cipher in greater detail. Another problem is that while AES is very fast on dedicated hardware, its performance on platforms that lack such hardware is considerably lower. Yet another problem is that many AES implementations are vulnerable to cache-collision timing attacks ([Cache-Collisions]).
This document provides a definition and implementation guide for three algorithms:
1. The ChaCha20 cipher. This is a high-speed cipher first described in [ChaCha]. It is considerably faster than AES in software-only implementations, making it around three times as fast on platforms that lack specialized AES hardware. See Appendix B for some hard numbers. ChaCha20 is also not sensitive to timing attacks (see the security considerations in Section 4). This algorithm is described in Section 2.4.
2. The Poly1305 authenticator. This is a high-speed message authentication code. Implementation is also straightforward and easy to get right. The algorithm is described in Section 2.5.
3. The CHACHA20-POLY1305 Authenticated Encryption with Associated Data (AEAD) construction, described in Section 2.8.
This document and its predecessor do not introduce these new algorithms for the first time. They have been defined in scientific papers by D. J. Bernstein [ChaCha][Poly1305]. The purpose of this document is to serve as a stable reference for IETF documents making use of these algorithms.
These algorithms have undergone rigorous analysis. Several papers discuss the security of Salsa and ChaCha ([LatinDances], [LatinDances2], [Zhenqing2012]).
This document represents the consensus of the Crypto Forum Research Group (CFRG). It replaces [RFC7539].
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
The description of the ChaCha algorithm will at various times refer to the ChaCha state as a "vector" or as a "matrix". This follows the use of these terms in [ChaCha]. The matrix notation is more visually convenient and gives a better notion as to why some rounds are called "column rounds" while others are called "diagonal rounds". Here's a diagram of how the matrices relate to vectors (using the C language convention of zero being the index origin).

    0   1   2   3
    4   5   6   7
    8   9  10  11
   12  13  14  15
The elements in this vector or matrix are 32-bit unsigned integers.
The algorithm name is "ChaCha". "ChaCha20" is a specific instance where 20 "rounds" (or 80 quarter rounds -- see Section 2.1) are used. Other variations are defined, with 8 or 12 rounds, but in this document we only describe the 20-round ChaCha, so the names "ChaCha" and "ChaCha20" will be used interchangeably.
The basic operation of the ChaCha algorithm is the quarter round. It operates on four 32-bit unsigned integers, denoted a, b, c, and d. The operation is as follows (in C-like notation):
   a += b; d ^= a; d <<<= 16;
   c += d; b ^= c; b <<<= 12;
   a += b; d ^= a; d <<<= 8;
   c += d; b ^= c; b <<<= 7;
Where "+" denotes integer addition modulo 2^32, "^" denotes a bitwise Exclusive OR (XOR), and "<<< n" denotes an n-bit left roll (towards the high bits).
For example, let's see the add, XOR, and roll operations from the fourth line with sample numbers:
   a = 0x11111111
   b = 0x01020304
   c = 0x77777777
   d = 0x01234567
   c = c + d = 0x77777777 + 0x01234567 = 0x789abcde
   b = b ^ c = 0x01020304 ^ 0x789abcde = 0x7998bfda
   b = b <<< 7 = 0x7998bfda <<< 7 = 0xcc5fed3c
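As an illustration, the quarter round and the fourth-line example above can be sketched in Python. The names rotl32 and quarter_round are chosen for this sketch and do not come from [ChaCha]. Python integers are unbounded, so each addition is masked to 32 bits explicitly.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Roll the 32-bit value x left by n bits (the "<<<" operator), 0 < n < 32."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(a: int, b: int, c: int, d: int) -> tuple[int, int, int, int]:
    """Apply the four lines of the quarter round to four 32-bit words."""
    # Line 1: a += b; d ^= a; d <<<= 16
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    # Line 2: c += d; b ^= c; b <<<= 12
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    # Line 3: a += b; d ^= a; d <<<= 8
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    # Line 4: c += d; b ^= c; b <<<= 7
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


# The fourth line alone, with the sample numbers from the text.
b, c, d = 0x01020304, 0x77777777, 0x01234567
c = (c + d) & MASK32   # 0x789abcde
b ^= c                 # 0x7998bfda
b = rotl32(b, 7)       # 0xcc5fed3c
```

The mask after each addition is what implements "addition modulo 2^32"; in a language with native 32-bit unsigned integers it would not be needed.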
The ChaCha state does not have four integer numbers: it has 16. So the quarter-round operation works on only four of them -- hence the name. Each quarter round operates on four predetermined numbers in the ChaCha state. We will denote by QUARTERROUND(x, y, z, w) a quarter-round operation on the numbers at indices x, y, z, and w of the ChaCha state when viewed as a vector. For example, if we apply QUARTERROUND(1, 5, 9, 13) to a state, this means running the quarter-round operation on the elements marked with an asterisk, while leaving the others alone:

    0  *a   2   3
    4  *b   6   7
    8  *c  10  11
   12  *d  14  15
The ChaCha block function transforms a ChaCha state by running multiple quarter rounds.
The inputs to ChaCha20 are:
o A 256-bit key, treated as a concatenation of eight 32-bit little-endian integers.
o A 96-bit nonce, treated as a concatenation of three 32-bit little-endian integers.
o A 32-bit block count parameter, treated as a 32-bit little-endian integer.
The output is 64 random-looking bytes.
The ChaCha algorithm described here uses a 256-bit key. The original algorithm also specified 128-bit keys and 8- and 12-round variants, but these are out of scope for this document. In this section, we describe the ChaCha block function.
Note also that the original ChaCha had a 64-bit nonce and 64-bit block count. We have modified this here to be more consistent with recommendations in Section 3.2 of [RFC5116]. This limits the use of a single (key,nonce) combination to 2^32 blocks, or 256 GB, but that is enough for most uses. In cases where a single key is used by multiple senders, it is important to make sure that they don't use the same nonces. This can be assured by partitioning the nonce space so that the first 32 bits are unique per sender, while the other 64 bits come from a counter.
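One way to partition the nonce space as described above is sketched below. The function name build_nonce and the little-endian packing of both fields are assumptions of this sketch; a protocol using this scheme would need to fix the encoding itself.

```python
import struct


def build_nonce(sender_id: int, counter: int) -> bytes:
    """Return a 96-bit nonce: a 32-bit sender ID followed by a 64-bit counter."""
    if not 0 <= sender_id < 2**32:
        raise ValueError("sender_id must fit in 32 bits")
    if not 0 <= counter < 2**64:
        raise ValueError("counter exhausted; rekey instead of reusing a nonce")
    # "<IQ" packs a 4-byte and an 8-byte unsigned integer with no padding.
    # Little-endian byte order is an assumption made for this illustration.
    return struct.pack("<IQ", sender_id, counter)
```

As long as each sender has a distinct sender_id and never reuses a counter value under the same key, no two senders can produce the same nonce.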
The ChaCha20 state is initialized as follows:
o The first four words (0-3) are constants: 0x61707865, 0x3320646e, 0x79622d32, 0x6b206574.
o The next eight words (4-11) are taken from the 256-bit key by reading the bytes in little-endian order, in 4-byte chunks.
o Word 12 is a block counter. Since each block is 64 bytes, a 32-bit word is enough for 256 gigabytes of data.
o Words 13-15 are a nonce, which MUST NOT be repeated for the same key. The 13th word is the first 32 bits of the input nonce taken as a little-endian integer, while the 15th word is the last 32 bits.
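The state layout in the list above can be sketched as follows. The name init_state is chosen for this sketch; the constants are those listed for words 0-3.

```python
import struct

CONSTANTS = (0x61707865, 0x3320646e, 0x79622d32, 0x6b206574)


def init_state(key: bytes, counter: int, nonce: bytes) -> list[int]:
    """Build the 16-word ChaCha20 state from key, block counter, and nonce."""
    if len(key) != 32:
        raise ValueError("key must be 256 bits")
    if len(nonce) != 12:
        raise ValueError("nonce must be 96 bits")
    if not 0 <= counter < 2**32:
        raise ValueError("block counter must fit in 32 bits")
    return [
        *CONSTANTS,                   # words 0-3
        *struct.unpack("<8I", key),   # words 4-11, little-endian 4-byte chunks
        counter,                      # word 12
        *struct.unpack("<3I", nonce), # words 13-15
    ]
```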
ChaCha20 runs 20 rounds, alternating between "column rounds" and "diagonal rounds". Each round consists of four quarter-rounds, and they are run as follows. Quarter rounds 1-4 are part of a "column" round, while 5-8 are part of a "diagonal" round:
At the end of 20 rounds (or 10 iterations of the above list), we add the original input words to the output words, and serialize the result by sequencing the words one-by-one in little-endian order.
Note: "addition" in the above paragraph is done modulo 2^32. In some machine languages, this is called carryless addition on a 32-bit word.
Note: This section and a few others contain pseudocode for the algorithm explained in a previous section. Every effort was made for the pseudocode to accurately reflect the algorithm as described in the preceding section. If a conflict is still present, the textual explanation and the test vectors are normative.
   chacha20_block(key, counter, nonce):
      state = constants | key | counter | nonce
      initial_state = state
      for i=1 upto 10
         inner_block(state)
         end
      state += initial_state
      return serialize(state)
      end
Where the pipe character ("|") denotes concatenation.
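The block function can be sketched in Python as follows. This is an illustration only, not a replacement for the pseudocode; the names ROUND_INDICES, rotl32, and quarter_round are chosen for this sketch, and ROUND_INDICES assumes the usual order of four column rounds followed by four diagonal rounds.

```python
import struct

MASK32 = 0xFFFFFFFF
CONSTANTS = (0x61707865, 0x3320646e, 0x79622d32, 0x6b206574)
# Four column quarter rounds, then four diagonal quarter rounds.
ROUND_INDICES = (
    (0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15),
    (0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14),
)


def rotl32(x: int, n: int) -> int:
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(s: list[int], a: int, b: int, c: int, d: int) -> None:
    """Run one quarter round in place on state words s[a], s[b], s[c], s[d]."""
    # Each tuple is one line of the quarter round: x += y; z ^= x; z <<<= n
    for x, y, z, n in ((a, b, d, 16), (c, d, b, 12), (a, b, d, 8), (c, d, b, 7)):
        s[x] = (s[x] + s[y]) & MASK32
        s[z] = rotl32(s[z] ^ s[x], n)


def chacha20_block(key: bytes, counter: int, nonce: bytes) -> bytes:
    """Return one 64-byte keystream block."""
    initial = [*CONSTANTS, *struct.unpack("<8I", key),
               counter & MASK32, *struct.unpack("<3I", nonce)]
    state = list(initial)          # work on a copy, keep the input words
    for _ in range(10):            # 10 iterations of 8 quarter rounds = 20 rounds
        for indices in ROUND_INDICES:
            quarter_round(state, *indices)
    # Add the original input words modulo 2^32, then serialize little-endian.
    out = [(w + i) & MASK32 for w, i in zip(state, initial)]
    return struct.pack("<16I", *out)
```

Without the final addition of the input words, the rounds would be invertible and the key could be recovered from the output; the addition is what makes the block function one-way.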
2.3.2. Test Vector for the ChaCha20 Block Function
For a test vector, we will use the following inputs to the ChaCha20 block function:
o Key = 00:01:02:03:04:05:06:07:08:09:0a:0b:0c:0d:0e:0f:10:11:12:13: 14:15:16:17:18:19:1a:1b:1c:1d:1e:1f. The key is a sequence of octets with no particular structure before we copy it into the ChaCha state.
o Nonce = (00:00:00:09:00:00:00:4a:00:00:00:00)
o Block Count = 1.
After setting up the ChaCha state, it looks like this:
ChaCha20 is a stream cipher designed by D. J. Bernstein. It is a refinement of the Salsa20 algorithm, and it uses a 256-bit key.
ChaCha20 successively calls the ChaCha20 block function, with the same key and nonce, and with successively increasing block counter parameters. ChaCha20 then serializes the resulting state by writing the numbers in little-endian order, creating a keystream block. Concatenating the keystream blocks from the successive blocks forms a keystream. The ChaCha20 function then performs an XOR of this keystream with the plaintext. Alternatively, each keystream block can be XORed with a plaintext block before proceeding to create the next block, saving some memory. There is no requirement for the plaintext to be an integral multiple of 512 bits. If there is extra keystream from the last block, it is discarded. Specific protocols MAY require that the plaintext and ciphertext have a certain length. Such protocols need to specify how the plaintext is padded and how much padding it receives.
The inputs to ChaCha20 are:
o A 256-bit key
o A 32-bit initial counter. This can be set to any number, but will usually be zero or one. It makes sense to use one if we use the zero block for something else, such as generating a one-time authenticator key as part of an AEAD algorithm.
o A 96-bit nonce. In some protocols, this is known as the Initialization Vector.
o An arbitrary-length plaintext
The output is an encrypted message, or "ciphertext", of the same length.
Decryption is done in the same way. The ChaCha20 block function is used to expand the key into a keystream, which is XORed with the ciphertext giving back the plaintext.
2.4.1. The ChaCha20 Encryption Algorithm in Pseudocode
   chacha20_encrypt(key, counter, nonce, plaintext):
      for j = 0 upto floor(len(plaintext)/64)-1
         key_stream = chacha20_block(key, counter+j, nonce)
         block = plaintext[(j*64)..(j*64+63)]
         encrypted_message += block ^ key_stream
         end
      if ((len(plaintext) % 64) != 0)
         j = floor(len(plaintext)/64)
         key_stream = chacha20_block(key, counter+j, nonce)
         block = plaintext[(j*64)..len(plaintext)-1]
         encrypted_message += (block^key_stream)[0..len(plaintext)%64-1]
         end
      return encrypted_message
      end
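The loop above can be sketched in Python as shown below. To keep the sketch short, the block function is passed in as a parameter; block_fn is an assumed callable taking (key, counter, nonce) and returning 64 bytes, and the name chacha20_xor is chosen for this illustration.

```python
from typing import Callable

# Assumed signature: (key, counter, nonce) -> 64-byte keystream block.
BlockFn = Callable[[bytes, int, bytes], bytes]


def chacha20_xor(block_fn: BlockFn, key: bytes, counter: int,
                 nonce: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    n_blocks = -(-len(data) // 64)  # ceiling division
    if counter + n_blocks > 2**32:
        raise ValueError("block counter would exceed 32 bits")
    out = bytearray()
    for j in range(n_blocks):
        key_stream = block_fn(key, counter + j, nonce)
        chunk = data[j * 64:(j + 1) * 64]  # the last chunk may be shorter
        # zip stops at the shorter input, so extra keystream is discarded.
        out += bytes(p ^ k for p, k in zip(chunk, key_stream))
    return bytes(out)
```

Because XOR is its own inverse, applying the function to the ciphertext with the same key, initial counter, and nonce gives back the plaintext.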
2.4.2. Example and Test Vector for the ChaCha20 Cipher
For a test vector, we will use the following inputs to the ChaCha20 block function:
o Key = 00:01:02:03:04:05:06:07:08:09:0a:0b:0c:0d:0e:0f:10:11:12:13: 14:15:16:17:18:19:1a:1b:1c:1d:1e:1f.
o Nonce = (00:00:00:00:00:00:00:4a:00:00:00:00).
o Initial Counter = 1.
We use the following for the plaintext. It was chosen to be long enough to require more than one block, but not so long that it would make this example cumbersome (so, less than 3 blocks):
   Plaintext Sunscreen:
   000  4c 61 64 69 65 73 20 61 6e 64 20 47 65 6e 74 6c  Ladies and Gentl
   016  65 6d 65 6e 20 6f 66 20 74 68 65 20 63 6c 61 73  emen of the clas
   032  73 20 6f 66 20 27 39 39 3a 20 49 66 20 49 20 63  s of '99: If I c
   048  6f 75 6c 64 20 6f 66 66 65 72 20 79 6f 75 20 6f  ould offer you o
   064  6e 6c 79 20 6f 6e 65 20 74 69 70 20 66 6f 72 20  nly one tip for
   080  74 68 65 20 66 75 74 75 72 65 2c 20 73 75 6e 73  the future, suns
   096  63 72 65 65 6e 20 77 6f 75 6c 64 20 62 65 20 69  creen would be i
   112  74 2e                                            t.
The following figure shows four ChaCha state matrices:
1. First block as it is set up.
2. Second block as it is set up. Note that these blocks are only two bits apart -- only the counter in position 12 is different.
3. Third block is the first block after the ChaCha20 block operation was applied.
4. Final block is the second block after the ChaCha20 block operation was applied.
Poly1305 is a one-time authenticator designed by D. J. Bernstein. Poly1305 takes a 32-byte one-time key and a message and produces a 16-byte tag. This tag is used to authenticate the message.
The original article ([Poly1305]) is titled "The Poly1305-AES message-authentication code", and the MAC function there requires a 128-bit AES key, a 128-bit "additional key", and a 128-bit (non-secret) nonce. AES is used there for encrypting the nonce, so as to get a unique (and secret) 128-bit string, but as the paper states, "There is nothing special about AES here. One can replace AES with an arbitrary keyed function from an arbitrary set of nonces to 16-byte strings."
Regardless of how the key is generated, the key is partitioned into two parts, called "r" and "s". The pair (r,s) should be unique, and MUST be unpredictable for each invocation (that is why it was originally obtained by encrypting a nonce), while "r" MAY be constant, but needs to be modified as follows before being used ("r" is treated as a 16-octet little-endian number):
o r[3], r[7], r[11], and r[15] are required to have their top four bits clear (be smaller than 16)
o r[4], r[8], and r[12] are required to have their bottom two bits clear (be divisible by 4)
The following sample code clamps "r" to be appropriate:
/* Adapted from poly1305aes_test_clamp.c version 20050207 D. J. Bernstein Public domain. */
Where "&=" is the C language bitwise AND assignment operator.
The "s" should be unpredictable, but it is perfectly acceptable to generate both "r" and "s" uniquely each time. Because each of them is 128 bits, pseudorandomly generating them (see Section 2.6) is also acceptable.
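The clamping rules listed above translate directly into byte operations. The following Python sketch is an illustration written for this purpose (the function name clamp mirrors the pseudocode; it is not the adapted C code itself), and it returns "r" as a number rather than modifying the bytes in place.

```python
def clamp(r_bytes: bytes) -> int:
    """Clamp a 16-octet r and return it as a little-endian number."""
    if len(r_bytes) != 16:
        raise ValueError("r must be 16 octets")
    r = bytearray(r_bytes)
    for i in (3, 7, 11, 15):
        r[i] &= 15    # top four bits clear (value smaller than 16)
    for i in (4, 8, 12):
        r[i] &= 252   # bottom two bits clear (value divisible by 4)
    return int.from_bytes(r, "little")
```

The same result can be had by masking the little-endian number with 0x0ffffffc0ffffffc0ffffffc0fffffff.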
The inputs to Poly1305 are:
o A 256-bit one-time key
o An arbitrary length message
The output is a 128-bit tag.
First, the "r" value is clamped.
Next, set the constant prime "P" to 2^130-5: 3fffffffffffffffffffffffffffffffb. Also set a variable "accumulator" to zero.
Next, divide the message into 16-byte blocks. The last one might be shorter:
o Read the block as a little-endian number.
o Add one bit beyond the number of octets. For a 16-byte block, this is equivalent to adding 2^128 to the number. For the shorter block, it can be 2^120, 2^112, or any power of two whose exponent is evenly divisible by 8, all the way down to 2^8.
o If the block is not 17 bytes long (the last block), pad it with zeros. This is meaningless if you are treating the blocks as numbers.
o Add this number to the accumulator.
o Multiply by "r".
o Set the accumulator to the result modulo p. To summarize: Acc = ((Acc+block)*r) % p.
Finally, the value of the secret key "s" is added to the accumulator, and the 128 least significant bits are serialized in little-endian order to form the tag.
   clamp(r): r &= 0x0ffffffc0ffffffc0ffffffc0fffffff
   poly1305_mac(msg, key):
      r = le_bytes_to_num(key[0..15])
      clamp(r)
      s = le_bytes_to_num(key[16..31])
      a = 0  /* a is the accumulator */
      p = (1<<130)-5
      for i=1 upto ceil(msg length in bytes / 16)
         n = le_bytes_to_num(msg[((i-1)*16)..(i*16-1)] | [0x01])
         a += n
         a = (r * a) % p
         end
      a += s
      return num_to_16_le_bytes(a)
      end
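For checking intermediate values, the pseudocode can be transcribed into Python almost line by line, as in the sketch below. Python's arbitrary-precision integers are exactly the kind of generic big-number arithmetic that is not constant time, so this is suitable for illustration and test-vector checking only. The helper poly1305_accumulators is an addition of this sketch that records the accumulator after each block.

```python
P = (1 << 130) - 5
CLAMP_MASK = 0x0ffffffc0ffffffc0ffffffc0fffffff


def poly1305_accumulators(msg: bytes, key: bytes) -> list[int]:
    """Return the accumulator value after each 16-byte block of msg."""
    r = int.from_bytes(key[:16], "little") & CLAMP_MASK
    acc = 0
    trace = []
    for i in range(0, len(msg), 16):
        # Appending 0x01 adds one bit beyond the block's octets.
        n = int.from_bytes(msg[i:i + 16] + b"\x01", "little")
        acc = ((acc + n) * r) % P
        trace.append(acc)
    return trace


def poly1305_mac(msg: bytes, key: bytes) -> bytes:
    """Compute the 16-byte tag for msg under a 32-byte one-time key."""
    if len(key) != 32:
        raise ValueError("key must be 256 bits")
    s = int.from_bytes(key[16:32], "little")
    trace = poly1305_accumulators(msg, key)
    acc = trace[-1] if trace else 0
    # Keep only the 128 least significant bits of acc + s.
    return ((acc + s) % (1 << 128)).to_bytes(16, "little")
```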
For our example, we will dispense with generating the one-time key using AES, and assume that we got the following keying material:
o Key Material: 85:d6:be:78:57:55:6d:33:7f:44:52:fe:42:d5:06:a8:01:03:80:8a:fb:0d:b2:fd:4a:bf:f6:af:41:49:f5:1b
o s as an octet string: 01:03:80:8a:fb:0d:b2:fd:4a:bf:f6:af:41:49:f5:1b
o s as a 128-bit number: 1bf54941aff6bf4afdb20dfb8a800301
o r before clamping: 85:d6:be:78:57:55:6d:33:7f:44:52:fe:42:d5:06:a8
o Clamped r as a number: 806d5400e52447c036d555408bed685
For our message, we'll use a short text:
   Message to be Authenticated:
   000  43 72 79 70 74 6f 67 72 61 70 68 69 63 20 46 6f  Cryptographic Fo
   016  72 75 6d 20 52 65 73 65 61 72 63 68 20 47 72 6f  rum Research Gro
   032  75 70                                            up
Since Poly1305 works in 16-byte chunks, the 34-byte message divides into three blocks. In the following calculation, "Acc" denotes the accumulator and "Block" the current block:
Block #1
   Acc = 00
   Block = 6f4620636968706172676f7470797243
   Block with 0x01 byte = 016f4620636968706172676f7470797243
   Acc + block = 016f4620636968706172676f7470797243
   (Acc+Block) * r =
        b83fe991ca66800489155dcd69e8426ba2779453994ac90ed284034da565ecf
   Acc = ((Acc+Block)*r) % P = 2c88c77849d64ae9147ddeb88e69c83fc
Block #2
   Acc = 2c88c77849d64ae9147ddeb88e69c83fc
   Block = 6f7247206863726165736552206d7572
   Block with 0x01 byte = 016f7247206863726165736552206d7572
   Acc + block = 437febea505c820f2ad5150db0709f96e
   (Acc+Block) * r =
        21dcc992d0c659ba4036f65bb7f88562ae59b32c2b3b8f7efc8b00f78e548a26
   Acc = ((Acc+Block)*r) % P = 2d8adaf23b0337fa7cccfb4ea344b30de
As said in Section 2.5, it is acceptable to generate the one-time Poly1305 key pseudorandomly. This section defines such a method.
To generate such a key pair (r,s), we will use the ChaCha20 block function described in Section 2.3. This assumes that we have a 256-bit session key specifically for the Message Authentication Code (MAC) function. Any document that specifies the use of Poly1305 as a MAC algorithm for some protocol MUST specify that 256 bits are allocated for the integrity key. Note that in the AEAD construction defined in Section 2.8, the same key is used for encryption and key generation.
The method is to call the block function with the following parameters:
o The 256-bit session integrity key is used as the ChaCha20 key.
o The block counter is set to zero.
o The protocol will specify a 96-bit or 64-bit nonce. This MUST be unique per invocation with the same key, so it MUST NOT be randomly generated. A counter is a good way to implement this, but other methods, such as a Linear Feedback Shift Register (LFSR) are also acceptable. ChaCha20 as specified here requires a 96-bit nonce. So if the provided nonce is only 64-bit, then the first 32 bits of the nonce will be set to a constant number. This will usually be zero. For protocols with multiple senders it may be different for each sender, but it SHOULD be the same for all invocations of the function with the same key by a particular sender.
After running the block function, we have a 512-bit state. We take the first 256 bits of the serialized state, and use those as the one-time Poly1305 key: the first 128 bits are clamped and form "r", while the next 128 bits become "s". The other 256 bits are discarded.
Note that while many protocols have provisions for a nonce for encryption algorithms (often called Initialization Vectors, or IVs), they usually don't have such a provision for the MAC function. In that case, the per-invocation nonce will have to come from somewhere else, such as a message counter.
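The key generation method can be sketched as follows. As an assumption of this sketch, the ChaCha20 block function is passed in as block_fn, a callable taking (key, counter, nonce) and returning 64 bytes. The helper expand_nonce and its default all-zero prefix are illustrative names and choices; a protocol would define its own constant.

```python
from typing import Callable

# Assumed signature: (key, counter, nonce) -> 64-byte serialized block.
BlockFn = Callable[[bytes, int, bytes], bytes]


def expand_nonce(nonce: bytes, prefix: bytes = b"\x00" * 4) -> bytes:
    """Return a 96-bit nonce, prepending a constant to a 64-bit one."""
    if len(nonce) == 12:
        return nonce
    if len(nonce) == 8 and len(prefix) == 4:
        return prefix + nonce
    raise ValueError("nonce must be 64 or 96 bits, prefix 32 bits")


def poly1305_key_gen(block_fn: BlockFn, key: bytes, nonce: bytes) -> bytes:
    """Derive the 32-byte one-time Poly1305 key (r followed by s)."""
    block = block_fn(key, 0, expand_nonce(nonce))  # block counter is zero
    return block[:32]                              # other 256 bits discarded
```

Clamping of "r" is left to the Poly1305 function, which applies it when it splits the key.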
And that output is also the 32-byte one-time key used for Poly1305.
2.7. A Pseudorandom Function for Crypto Suites Based on ChaCha/Poly1305
Some protocols, such as IKEv2 ([RFC7296]), require a Pseudorandom Function (PRF), mostly for key derivation. In the IKEv2 definition, a PRF is a function that accepts a variable-length key and a variable-length input, and returns a fixed-length output. Most commonly, Hashed MAC (HMAC) constructions are used for this purpose, and often the same function is used for both message authentication and PRF.
Poly1305 is not a suitable choice for a PRF. Poly1305 prohibits using the same key twice, whereas the PRF in IKEv2 is used multiple times with the same key. Additionally, unlike HMAC, Poly1305 is biased, so using it for key derivation would reduce the security of the symmetric encryption.
ChaCha20 could be used as a key-derivation function, by generating an arbitrarily long keystream. However, that is not what protocols such as IKEv2 require.
For this reason, this document does not specify a PRF.
AEAD_CHACHA20_POLY1305 is an authenticated encryption with associated data algorithm. The inputs to AEAD_CHACHA20_POLY1305 are:
o A 256-bit key
o A 96-bit nonce -- different for each invocation with the same key
o An arbitrary length plaintext
o Arbitrary length additional authenticated data (AAD)
Some protocols may have unique per-invocation inputs that are not 96 bits in length. For example, IPsec may specify a 64-bit nonce. In such a case, it is up to the protocol document to define how to transform the protocol nonce into a 96-bit nonce, for example, by concatenating a constant value.
The ChaCha20 and Poly1305 primitives are combined into an AEAD that takes a 256-bit key and 96-bit nonce as follows:
o First, a Poly1305 one-time key is generated from the 256-bit key and nonce using the procedure described in Section 2.6.
o Next, the ChaCha20 encryption function is called to encrypt the plaintext, using the same key and nonce, and with the initial counter set to 1.
o Finally, the Poly1305 function is called with the Poly1305 key calculated above, and a message constructed as a concatenation of the following:
* The AAD
* padding1 -- the padding is up to 15 zero bytes, and it brings the total length so far to an integral multiple of 16. If the length of the AAD was already an integral multiple of 16 bytes, this field is zero-length.
* The ciphertext
* padding2 -- the padding is up to 15 zero bytes, and it brings the total length so far to an integral multiple of 16. If the length of the ciphertext was already an integral multiple of 16 bytes, this field is zero-length.
* The length of the additional data in octets (as a 64-bit little-endian integer).
* The length of the ciphertext in octets (as a 64-bit little-endian integer).
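The message that is fed to Poly1305 can be assembled as in the following sketch. The names pad16 and aead_mac_data are chosen for this illustration.

```python
import struct


def pad16(data: bytes) -> bytes:
    """Return the zero bytes needed to reach a multiple of 16 (0 to 15)."""
    return b"\x00" * (-len(data) % 16)


def aead_mac_data(aad: bytes, ciphertext: bytes) -> bytes:
    """Concatenate AAD, padding1, ciphertext, padding2, and both lengths."""
    return (
        aad + pad16(aad)
        + ciphertext + pad16(ciphertext)
        # Two 64-bit little-endian integers: len(AAD), then len(ciphertext).
        + struct.pack("<QQ", len(aad), len(ciphertext))
    )
```

The result is always a multiple of 16 bytes long, and the explicit lengths make the boundary between AAD and ciphertext unambiguous even though both are zero-padded.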
The output from the AEAD is the concatenation of:
o A ciphertext of the same length as the plaintext.
o A 128-bit tag, which is the output of the Poly1305 function.
Decryption is similar with the following differences:
o The roles of ciphertext and plaintext are reversed, so the ChaCha20 encryption function is applied to the ciphertext, producing the plaintext.
o The Poly1305 function is still run on the AAD and the ciphertext, not the plaintext.
o The calculated tag is bitwise compared to the received tag. The message is authenticated if and only if the tags match.
A few notes about this design:
1. The amount of encrypted data possible in a single invocation is 2^32-1 blocks of 64 bytes each, because of the size of the block counter field in the ChaCha20 block function. This gives a total of 274,877,906,880 bytes, or nearly 256 GB. This should be enough for traffic protocols such as IPsec and TLS, but may be too small for file and/or disk encryption. For such uses, we can return to the original design, reduce the nonce to 64 bits, and use the integer at position 13 as the top 32 bits of a 64-bit block counter, increasing the total message size to over a million petabytes (1,180,591,620,717,411,303,360 bytes to be exact).
2. Despite the previous item, the ciphertext length field in the construction of the buffer on which Poly1305 runs limits the ciphertext (and hence, the plaintext) size to 2^64 bytes, or sixteen thousand petabytes (18,446,744,073,709,551,616 bytes to be exact).
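The figures in these notes follow from simple arithmetic, which can be checked as below. The variable names are chosen for this sketch.

```python
BLOCK_BYTES = 64

# 32-bit block counter: 2^32-1 blocks of 64 bytes each.
max_bytes_32bit_counter = (2**32 - 1) * BLOCK_BYTES

# 64-bit block counter (original design): 2^64-1 blocks of 64 bytes each.
max_bytes_64bit_counter = (2**64 - 1) * BLOCK_BYTES

# Limit implied by the 64-bit ciphertext length field.
max_bytes_length_field = 2**64

print(f"{max_bytes_32bit_counter:,}")
print(f"{max_bytes_64bit_counter:,}")
print(f"{max_bytes_length_field:,}")
```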
The AEAD construction in this section is a novel composition of ChaCha20 and Poly1305. A security analysis of this composition is given in [Procter].
Here is a list of the parameters for this construction as defined in Section 4 of [RFC5116]:
o K_LEN (key length) is 32 octets.
o P_MAX (maximum size of the plaintext) is 274,877,906,880 bytes, or nearly 256 GB.
o A_MAX (maximum size of the associated data) is set to 2^64-1 octets by the length field for associated data.
o N_MIN = N_MAX = 12 octets.
o C_MAX = P_MAX + tag length = 274,877,906,896 octets.
Distinct AAD inputs (as described in Section 3.3 of [RFC5116]) shall be concatenated into a single input to AEAD_CHACHA20_POLY1305. It is up to the application to create a structure in the AAD input if it is needed.
   Ciphertext:
   000  d3 1a 8d 34 64 8e 60 db 7b 86 af bc 53 ef 7e c2  ...4d.`.{...S.~.
   016  a4 ad ed 51 29 6e 08 fe a9 e2 b5 a7 36 ee 62 d6  ...Q)n......6.b.
   032  3d be a4 5e 8c a9 67 12 82 fa fb 69 da 92 72 8b  =..^..g....i..r.
   048  1a 71 de 0a 9e 06 0b 29 05 d6 a5 b6 7e cd 3b 36  .q.....)....~.;6
   064  92 dd bd 7f 2d 77 8b 8c 98 03 ae e3 28 09 1b 58  ....-w......(..X
   080  fa b3 24 e4 fa d6 75 94 55 85 80 8b 48 31 d7 bc  ..$...u.U...H1..
   096  3f f4 de f0 8e 4b 7a 9d e5 76 d2 65 86 ce c6 4b  ?....Kz..v.e...K
   112  61 16                                            a.
   AEAD Construction for Poly1305:
   000  50 51 52 53 c0 c1 c2 c3 c4 c5 c6 c7 00 00 00 00  PQRS............
   016  d3 1a 8d 34 64 8e 60 db 7b 86 af bc 53 ef 7e c2  ...4d.`.{...S.~.
   032  a4 ad ed 51 29 6e 08 fe a9 e2 b5 a7 36 ee 62 d6  ...Q)n......6.b.
   048  3d be a4 5e 8c a9 67 12 82 fa fb 69 da 92 72 8b  =..^..g....i..r.
   064  1a 71 de 0a 9e 06 0b 29 05 d6 a5 b6 7e cd 3b 36  .q.....)....~.;6
   080  92 dd bd 7f 2d 77 8b 8c 98 03 ae e3 28 09 1b 58  ....-w......(..X
   096  fa b3 24 e4 fa d6 75 94 55 85 80 8b 48 31 d7 bc  ..$...u.U...H1..
   112  3f f4 de f0 8e 4b 7a 9d e5 76 d2 65 86 ce c6 4b  ?....Kz..v.e...K
   128  61 16 00 00 00 00 00 00 00 00 00 00 00 00 00 00  a...............
   144  0c 00 00 00 00 00 00 00 72 00 00 00 00 00 00 00  ........r.......
Note the four zero bytes in line 000 and the 14 zero bytes in line 128.
Each block of ChaCha20 involves 16 move operations and one increment operation for loading the state, 80 each of XOR, addition and roll operations for the rounds, 16 more add operations and 16 XOR operations for protecting the plaintext. Section 2.3 describes the ChaCha block function as "adding the original input words". This implies that before starting the rounds on the ChaCha state, we copy it aside, only to add it in later. This is correct, but we can save a few operations if we instead copy the state and do the work on the copy. This way, for the next block you don't need to recreate the state, but only to increment the block counter. This saves approximately 5.5% of the cycles.
It is not recommended to use a generic big number library such as the one in OpenSSL for the arithmetic operations in Poly1305. Such libraries use dynamic allocation to be able to handle an integer of any size, but that flexibility comes at the expense of performance as well as side-channel security. More efficient implementations that run in constant time are available, one of them in D. J. Bernstein's own library, NaCl ([NaCl]). A constant-time but not optimal approach would be to naively implement the arithmetic operations for 288-bit integers, because even a naive implementation will not exceed 2^288 in the multiplication of (acc+block) and r. An efficient constant-time implementation can be found in the public domain library poly1305-donna ([Poly1305_Donna]).
4. Security Considerations
The ChaCha20 cipher is designed to provide 256-bit security.
The Poly1305 authenticator is designed to ensure that forged messages are rejected with a probability of 1-(n/(2^102)) for a 16n-byte message, even after sending 2^64 legitimate messages, so it is SUF-CMA (strong unforgeability against chosen-message attacks) in the terminology of [AE].
Proving the security of either of these is beyond the scope of this document. Such proofs are available in the referenced academic papers ([ChaCha], [Poly1305], [LatinDances], [LatinDances2], and [Zhenqing2012]).
The most important security consideration in implementing this document is the uniqueness of the nonce used in ChaCha20. Counters and LFSRs are both acceptable ways of generating unique nonces, as is encrypting a counter using a block cipher with a 64-bit block size such as DES. Note that it is not acceptable to use a truncation of a counter encrypted with block ciphers with 128-bit or 256-bit blocks, because such a truncation may repeat after a short time.
Consequences of repeating a nonce: If a nonce is repeated, then both the one-time Poly1305 key and the keystream are identical between the messages. This reveals the XOR of the plaintexts, because the XOR of the plaintexts is equal to the XOR of the ciphertexts.
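The keystream half of this consequence can be demonstrated in a few lines. In the sketch below, a fixed byte string stands in for the ChaCha20 keystream that a repeated (key, nonce) pair would produce; the plaintexts and the helper name xor_bytes are made up for the illustration.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, stopping at the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


# Stand-in for the identical keystream produced by a repeated nonce.
keystream = bytes(range(32))

p1 = b"attack at dawn"
p2 = b"retreat at ten"
c1 = xor_bytes(p1, keystream)
c2 = xor_bytes(p2, keystream)

# The keystream cancels out: c1 ^ c2 == p1 ^ p2.
leaked = xor_bytes(c1, c2)
```

An attacker who knows or guesses one plaintext can then recover the other by XORing it with the leaked value, without ever learning the key.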
The Poly1305 key MUST be unpredictable to an attacker. Randomly generating the key would fulfill this requirement, except that Poly1305 is often used in communications protocols, so the receiver should know the key. Pseudorandom number generation such as by encrypting a counter is acceptable. Using ChaCha with a secret key and a nonce is also acceptable.
The algorithms presented here were designed to be easy to implement in constant time to avoid side-channel vulnerabilities. The operations used in ChaCha20 are all additions, XORs, and fixed rolls. All of these can and should be implemented in constant time. Access to offsets into the ChaCha state and the number of operations do not depend on any property of the key, eliminating the chance of information about the key leaking through the timing of cache misses.
For Poly1305, the operations are addition, multiplication, and modulus, all on numbers with greater than 128 bits. This can be done in constant time, but a naive implementation (such as using some generic big number library) will not be constant time. For example, if the multiplication is performed as a separate operation from the modulus, the result will sometimes be under 2^256 and sometimes be above 2^256. Implementers should be careful about timing side-channels for Poly1305 by using the appropriate implementation of these operations.
Validating the authenticity of a message involves a bitwise comparison of the calculated tag with the received tag. In most use cases, nonces and AAD contents are not "used up" until a valid message is received. This allows an attacker to send multiple identical messages with different tags until one passes the tag comparison. This is hard if the attacker has to try all 2^128 possible tags one by one. However, if the timing of the tag comparison operation reveals how long a prefix of the calculated and received tags is identical, the number of messages can be reduced significantly. For this reason, with online protocols, implementations MUST use a constant-time comparison function rather than relying on optimized but insecure library functions such as the C language's memcmp().
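A sketch of such a comparison is shown below. The first function relies on hmac.compare_digest() from the Python standard library, which is designed for this purpose. The second spells out the usual technique of accumulating the XOR of every byte pair so that the loop never exits early. The function names are illustrative, and the hand-written loop is not guaranteed to be constant time in an interpreted language.

```python
import hmac


def tags_match(calculated: bytes, received: bytes) -> bool:
    """Compare two tags using the standard library's timing-safe helper."""
    return hmac.compare_digest(calculated, received)


def tags_match_manual(calculated: bytes, received: bytes) -> bool:
    """Illustration of the technique: inspect every byte, never exit early."""
    if len(calculated) != len(received):
        return False  # the tag length is public, so this reveals nothing secret
    diff = 0
    for x, y in zip(calculated, received):
        diff |= x ^ y  # stays zero only if all byte pairs are equal
    return diff == 0


good = bytes(range(16))
bad = bytes(range(15)) + b"\xff"
ok = tags_match(good, bytes(range(16)))
rejected = not tags_match(good, bad)
```

Unlike memcmp(), neither function stops at the first differing byte, so the running time does not reveal the length of the matching prefix.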
Additionally, any protocol using this algorithm MUST include the complete tag to minimize the opportunity for forgery. Tag truncation MUST NOT be done.
IANA has updated the entry in the "Authenticated Encryption with Associated Data (AEAD) Parameters" registry with 29 as the Numeric ID and "AEAD_CHACHA20_POLY1305" as the name to point to this document as its reference.
[AE] Bellare, M. and C. Namprempre, "Authenticated Encryption: Relations among notions and analysis of the generic composition paradigm", DOI 10.1007/s00145-008-9026-x, September 2008, <http://dl.acm.org/citation.cfm?id=1410269>.
[FIPS-197] National Institute of Standards and Technology, "Advanced Encryption Standard (AES)", FIPS PUB 197, November 2001, <http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf>.
[LatinDances] Aumasson, J., Fischer, S., Khazaei, S., Meier, W., and C. Rechberger, "New Features of Latin Dances: Analysis of Salsa, ChaCha, and Rumba", December 2007, <http://cr.yp.to/rumba20/newfeatures-20071218.pdf>.
[LatinDances2] Ishiguro, T., Kiyomoto, S., and Y. Miyake, "Modified version of 'Latin Dances Revisited: New Analytic Results of Salsa20 and ChaCha'", February 2012, <https://eprint.iacr.org/2012/065.pdf>.
[NaCl] Bernstein, D., Lange, T., and P. Schwabe, "NaCl: Networking and Cryptography library", July 2012, <http://nacl.cr.yp.to>.
[SP800-67] National Institute of Standards and Technology, "Recommendation for the Triple Data Encryption Algorithm (TDEA) Block Cipher", NIST 800-67, Rev. 2, November 2017, <https://csrc.nist.gov/publications/detail/sp/800-67/rev-2/final>.
[Standby-Cipher] McGrew, D., Grieco, A., and Y. Sheffer, "Selection of Future Cryptographic Standards", Work in Progress, draft-mcgrew-standby-cipher-00, January 2013.
[Zhenqing2012] Zhenqing, S., Bin, Z., Dengguo, F., and W. Wenling, "Improved Key Recovery Attacks on Reduced-Round Salsa20 and ChaCha*", 2012.
Notice how, in test vector #2, r is equal to zero. The part of the Poly1305 algorithm where the accumulator is multiplied by r means that with r equal to zero, the tag will be equal to s regardless of the content of the text. Fortunately, all the proposed methods of generating r are such that getting this particular weak key is very unlikely.
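The effect of r = 0 can be reproduced with a short Poly1305 sketch using Python's arbitrary-precision integers. This is an illustration rather than a constant-time implementation, and the key and message values below are arbitrary choices for the example, not test vectors from this document.

```python
P = (1 << 130) - 5  # the Poly1305 prime, 2^130 - 5
CLAMP = 0x0FFFFFFC0FFFFFFC0FFFFFFC0FFFFFFF


def poly1305_mac(msg: bytes, key: bytes) -> bytes:
    """Compute a Poly1305 tag from a 32-byte one-time key (r || s)."""
    r = int.from_bytes(key[:16], "little") & CLAMP
    s = int.from_bytes(key[16:32], "little")
    acc = 0
    for i in range(0, len(msg), 16):
        # Each block gets an extra 0x01 byte appended before conversion.
        n = int.from_bytes(msg[i:i + 16] + b"\x01", "little")
        acc = ((acc + n) * r) % P  # with r == 0 this is always 0
    return ((acc + s) & ((1 << 128) - 1)).to_bytes(16, "little")


s_bytes = bytes(range(1, 17))   # arbitrary s for the example
weak_key = bytes(16) + s_bytes  # r is all zeros

tag_a = poly1305_mac(b"any message", weak_key)
tag_b = poly1305_mac(b"a completely different message", weak_key)
```

Both tags come out equal to s_bytes, because every multiplication by r resets the accumulator to zero before s is added.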
Below we see decrypting a message. We receive a ciphertext, a nonce, and a tag. We know the key. We will check the tag and then (assuming that it validates) decrypt the ciphertext. In this particular protocol, we'll assume that there is no padding of the plaintext.
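The order of operations (compute the tag over the AAD and ciphertext, compare it, and only then decrypt) can be sketched as follows. The function names and the convention of returning None on failure are assumptions made for this example, and the code favors brevity over constant-time behavior.

```python
import hmac
import struct

MASK32 = 0xFFFFFFFF
P = (1 << 130) - 5

def rotl32(v, n):
    return ((v << n) & MASK32) | (v >> (32 - n))

def quarter_round(st, a, b, c, d):
    st[a] = (st[a] + st[b]) & MASK32; st[d] = rotl32(st[d] ^ st[a], 16)
    st[c] = (st[c] + st[d]) & MASK32; st[b] = rotl32(st[b] ^ st[c], 12)
    st[a] = (st[a] + st[b]) & MASK32; st[d] = rotl32(st[d] ^ st[a], 8)
    st[c] = (st[c] + st[d]) & MASK32; st[b] = rotl32(st[b] ^ st[c], 7)

def chacha20_block(key, counter, nonce):
    """One 64-byte keystream block: constants, key, 32-bit counter, 96-bit nonce."""
    init = list(struct.unpack(
        "<16I", b"expand 32-byte k" + key + struct.pack("<I", counter) + nonce))
    st = init[:]
    for _ in range(10):  # 10 double rounds = 20 rounds
        quarter_round(st, 0, 4, 8, 12); quarter_round(st, 1, 5, 9, 13)
        quarter_round(st, 2, 6, 10, 14); quarter_round(st, 3, 7, 11, 15)
        quarter_round(st, 0, 5, 10, 15); quarter_round(st, 1, 6, 11, 12)
        quarter_round(st, 2, 7, 8, 13); quarter_round(st, 3, 4, 9, 14)
    return struct.pack("<16I", *((x + y) & MASK32 for x, y in zip(st, init)))

def chacha20_xor(key, counter, nonce, data):
    """XOR data with the keystream, starting at the given block counter."""
    out = bytearray()
    for i in range(0, len(data), 64):
        ks = chacha20_block(key, counter + i // 64, nonce)
        out += bytes(x ^ y for x, y in zip(data[i:i + 64], ks))
    return bytes(out)

def poly1305_mac(msg, key):
    r = int.from_bytes(key[:16], "little") & 0x0FFFFFFC0FFFFFFC0FFFFFFC0FFFFFFF
    s = int.from_bytes(key[16:32], "little")
    acc = 0
    for i in range(0, len(msg), 16):
        n = int.from_bytes(msg[i:i + 16] + b"\x01", "little")
        acc = ((acc + n) * r) % P
    return ((acc + s) & ((1 << 128) - 1)).to_bytes(16, "little")

def pad16(data):
    return b"\x00" * (-len(data) % 16)

def aead_tag(key, nonce, aad, ciphertext):
    # The one-time Poly1305 key is the first 32 bytes of block 0.
    otk = chacha20_block(key, 0, nonce)[:32]
    mac_data = (aad + pad16(aad) + ciphertext + pad16(ciphertext)
                + struct.pack("<QQ", len(aad), len(ciphertext)))
    return poly1305_mac(mac_data, otk)

def aead_decrypt(key, nonce, aad, ciphertext, tag):
    """Return the plaintext, or None if the tag does not validate."""
    if not hmac.compare_digest(aead_tag(key, nonce, aad, ciphertext), tag):
        return None  # never release plaintext for an unauthenticated message
    return chacha20_xor(key, 1, nonce, ciphertext)  # payload starts at block 1
```

A caller would treat a None result as an authentication failure and discard the message without examining the ciphertext further.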
ChaCha20 and Poly1305 were invented by Daniel J. Bernstein. The AEAD construction and the method of creating the one-time Poly1305 key were invented by Adam Langley.
Thanks to Robert Ransom, Watson Ladd, Stefan Buhler, Dan Harkins, and Kenny Paterson for their helpful comments and explanations. Thanks to Niels Moller for suggesting the more efficient AEAD construction in this document. Special thanks to Ilari Liusvaara for providing extra test vectors, helpful comments, and for being the first to attempt an implementation from this document. Thanks to Sean Parkinson for suggesting improvements to the examples and the pseudocode. Thanks to David Ireland for pointing out a bug in the pseudocode, and to Stephen Farrell and Alyssa Rowan for pointing out missing advice in the security considerations.
Special thanks go to Gordon Procter for performing a security analysis of the composition and publishing [Procter].
Jim Schaad and John Mattson provided feedback on tag truncation, and Russ Housley, Stanislav Smyshlyaev, and John Mattson each provided a review of this version.
Authors' Addresses
Yoav Nir Dell EMC 9 Andrei Sakharov St Haifa 3190500 Israel