This document describes the MD5 message-digest algorithm. The algorithm takes as input a message of arbitrary length and produces as output a 128-bit "fingerprint" or "message digest" of the input. It is conjectured that it is computationally infeasible to produce two messages having the same message digest, or to produce any message having a given prespecified target message digest. The MD5 algorithm is intended for digital signature applications, where a large file must be "compressed" in a secure manner before being encrypted with a private (secret) key under a public-key cryptosystem such as RSA.
Rivest [Page 1]
RFC 1321 MD5 Message-Digest Algorithm April 1992
The MD5 algorithm is designed to be quite fast on 32-bit machines. In addition, the MD5 algorithm does not require any large substitution tables; the algorithm can be coded quite compactly.
The MD5 algorithm is an extension of the MD4 message-digest algorithm 1,2]. MD5 is slightly slower than MD4, but is more "conservative" in design. MD5 was designed because it was felt that MD4 was perhaps being adopted for use more quickly than justified by the existing critical review; because MD4 was designed to be exceptionally fast, it is "at the edge" in terms of risking successful cryptanalytic attack. MD5 backs off a bit, giving up a little in speed for a much greater likelihood of ultimate security. It incorporates some suggestions made by various reviewers, and contains additional optimizations. The MD5 algorithm is being placed in the public domain for review and possible adoption as a standard.
For OSI-based applications, MD5's object identifier is
In this document a "word" is a 32-bit quantity and a "byte" is an eight-bit quantity. A sequence of bits can be interpreted in a natural manner as a sequence of bytes, where each consecutive group of eight bits is interpreted as a byte with the high-order (most significant) bit of each byte listed first. Similarly, a sequence of bytes can be interpreted as a sequence of 32-bit words, where each consecutive group of four bytes is interpreted as a word with the low-order (least significant) byte given first.
Let x_i denote "x sub i". If the subscript is an expression, we surround it in braces, as in x_{i+1}. Similarly, we use ^ for superscripts (exponentiation), so that x^i denotes x to the i-th power.
Let the symbol "+" denote addition of words (i.e., modulo-2^32 addition). Let X <<< s denote the 32-bit value obtained by circularly shifting (rotating) X left by s bit positions. Let not(X) denote the bit-wise complement of X, and let X v Y denote the bit-wise OR of X and Y. Let X xor Y denote the bit-wise XOR of X and Y, and let XY denote the bit-wise AND of X and Y.
We begin by supposing that we have a b-bit message as input, and that we wish to find its message digest. Here b is an arbitrary nonnegative integer; b may be zero, it need not be a multiple of eight, and it may be arbitrarily large. We imagine the bits of the message written down as follows:
m_0 m_1 ... m_{b-1}
The following five steps are performed to compute the message digest of the message.
The message is "padded" (extended) so that its length (in bits) is congruent to 448, modulo 512. That is, the message is extended so that it is just 64 bits shy of being a multiple of 512 bits long. Padding is always performed, even if the length of the message is already congruent to 448, modulo 512.
Padding is performed as follows: a single "1" bit is appended to the message, and then "0" bits are appended so that the length in bits of the padded message becomes congruent to 448, modulo 512. In all, at least one bit and at most 512 bits are appended.
A 64-bit representation of b (the length of the message before the padding bits were added) is appended to the result of the previous step. In the unlikely event that b is greater than 2^64, then only the low-order 64 bits of b are used. (These bits are appended as two 32-bit words and appended low-order word first in accordance with the previous conventions.)
At this point the resulting message (after padding with bits and with b) has a length that is an exact multiple of 512 bits. Equivalently, this message has a length that is an exact multiple of 16 (32-bit) words. Let M[0 ... N-1] denote the words of the resulting message, where N is a multiple of 16.
A four-word buffer (A,B,C,D) is used to compute the message digest. Here each of A, B, C, D is a 32-bit register. These registers are initialized to the following values in hexadecimal, low-order bytes first):
Rivest [Page 3]
RFC 1321 MD5 Message-Digest Algorithm April 1992
word A: 01 23 45 67 word B: 89 ab cd ef word C: fe dc ba 98 word D: 76 54 32 10
We first define four auxiliary functions that each take as input three 32-bit words and produce as output one 32-bit word.
F(X,Y,Z) = XY v not(X) Z G(X,Y,Z) = XZ v Y not(Z) H(X,Y,Z) = X xor Y xor Z I(X,Y,Z) = Y xor (X v not(Z))
In each bit position F acts as a conditional: if X then Y else Z. The function F could have been defined using + instead of v since XY and not(X)Z will never have 1's in the same bit position.) It is interesting to note that if the bits of X, Y, and Z are independent and unbiased, the each bit of F(X,Y,Z) will be independent and unbiased.
The functions G, H, and I are similar to the function F, in that they act in "bitwise parallel" to produce their output from the bits of X, Y, and Z, in such a manner that if the corresponding bits of X, Y, and Z are independent and unbiased, then each bit of G(X,Y,Z), H(X,Y,Z), and I(X,Y,Z) will be independent and unbiased. Note that the function H is the bit-wise "xor" or "parity" function of its inputs.
This step uses a 64-element table T[1 ... 64] constructed from the sine function. Let T[i] denote the i-th element of the table, which is equal to the integer part of 4294967296 times abs(sin(i)), where i is in radians. The elements of the table are given in the appendix.
Do the following:
/* Process each 16-word block. */ For i = 0 to N/16-1 do
/* Copy block i into X. */ For j = 0 to 15 do Set X[j] to M[i*16+j]. end /* of loop on j */
/* Save A as AA, B as BB, C as CC, and D as DD. */ AA = A BB = B
/* Then perform the following additions. (That is increment each of the four registers by the value it had before this block was started.) */ A = A + AA B = B + BB C = C + CC D = D + DD
The MD5 message-digest algorithm is simple to implement, and provides a "fingerprint" or message digest of a message of arbitrary length. It is conjectured that the difficulty of coming up with two messages having the same message digest is on the order of 2^64 operations, and that the difficulty of coming up with any message having a given message digest is on the order of 2^128 operations. The MD5 algorithm has been carefully scrutinized for weaknesses. It is, however, a relatively new algorithm and further security analysis is of course justified, as is the case with any new proposal of this sort.
3. The function g in round 2 was changed from (XY v XZ v YZ) to (XZ v Y not(Z)) to make g less symmetric.
4. Each step now adds in the result of the previous step. This promotes a faster "avalanche effect".
5. The order in which input words are accessed in rounds 2 and 3 is changed, to make these patterns less like each other.
6. The shift amounts in each round have been approximately optimized, to yield a faster "avalanche effect." The shifts in different rounds are distinct.
Rivest [Page 6]
RFC 1321 MD5 Message-Digest Algorithm April 1992
References
[1] Rivest, R., "The MD4 Message Digest Algorithm", RFC 1320, MIT and RSA Data Security, Inc., April 1992.
[2] Rivest, R., "The MD4 message digest algorithm", in A.J. Menezes and S.A. Vanstone, editors, Advances in Cryptology - CRYPTO '90 Proceedings, pages 303-311, Springer-Verlag, 1991.
[3] CCITT Recommendation X.509 (1988), "The Directory - Authentication Framework."
APPENDIX A - Reference Implementation
This appendix contains the following files taken from RSAREF: A Cryptographic Toolkit for Privacy-Enhanced Mail:
global.h -- global header file
md5.h -- header file for MD5
md5c.c -- source code for MD5
For more information on RSAREF, send email to <rsaref@rsa.com>.
The appendix also includes the following file:
mddriver.c -- test driver for MD2, MD4 and MD5
The driver compiles for MD5 by default but can compile for MD2 or MD4 if the symbol MD is defined on the C compiler command line as 2 or 4.
The implementation is portable and should work on many different plaforms. However, it is not difficult to optimize the implementation on particular platforms, an exercise left to the reader. For example, on "little-endian" platforms where the lowest-addressed byte in a 32- bit word is the least significant and there are no alignment restrictions, the call to Decode in MD5Transform can be replaced with a typecast.
/* PROTOTYPES should be set to one if and only if the compiler supports function argument prototyping. The following makes PROTOTYPES default to 0 if it has not already
Rivest [Page 7]
RFC 1321 MD5 Message-Digest Algorithm April 1992
been defined with C compiler flags. */ #ifndef PROTOTYPES #define PROTOTYPES 0 #endif
/* POINTER defines a generic pointer type */ typedef unsigned char *POINTER;
/* UINT2 defines a two byte word */ typedef unsigned short int UINT2;
/* UINT4 defines a four byte word */ typedef unsigned long int UINT4;
/* PROTO_LIST is defined depending on how PROTOTYPES is defined above. If using PROTOTYPES, then PROTO_LIST returns the list, otherwise it returns an empty list. */ #if PROTOTYPES #define PROTO_LIST(list) list #else #define PROTO_LIST(list) () #endif
/* Copyright (C) 1991-2, RSA Data Security, Inc. Created 1991. All rights reserved.
License to copy and use this software is granted provided that it is identified as the "RSA Data Security, Inc. MD5 Message-Digest Algorithm" in all material mentioning or referencing this software or this function.
License is also granted to make and use derivative works provided that such works are identified as "derived from the RSA Data Security, Inc. MD5 Message-Digest Algorithm" in all material mentioning or referencing the derived work.
RSA Data Security, Inc. makes no representations concerning either the merchantability of this software or the suitability of this software for any particular purpose. It is provided "as is" without express or implied warranty of any kind.
Rivest [Page 8]
RFC 1321 MD5 Message-Digest Algorithm April 1992
These notices must be retained in any copies of any part of this documentation and/or software. */
/* Copyright (C) 1991-2, RSA Data Security, Inc. Created 1991. All rights reserved.
License to copy and use this software is granted provided that it is identified as the "RSA Data Security, Inc. MD5 Message-Digest Algorithm" in all material mentioning or referencing this software or this function.
License is also granted to make and use derivative works provided that such works are identified as "derived from the RSA Data Security, Inc. MD5 Message-Digest Algorithm" in all material mentioning or referencing the derived work.
RSA Data Security, Inc. makes no representations concerning either the merchantability of this software or the suitability of this software for any particular purpose. It is provided "as is" without express or implied warranty of any kind.
These notices must be retained in any copies of any part of this documentation and/or software. */
/* Transform as many times as possible. */ if (inputLen >= partLen) { MD5_memcpy ((POINTER)&context->buffer[index], (POINTER)input, partLen); MD5Transform (context->state, context->buffer);
for (i = partLen; i + 63 < inputLen; i += 64) MD5Transform (context->state, &input[i]);
/* MD5 basic transformation. Transforms state based on block. */ static void MD5Transform (state, block) UINT4 state[4]; unsigned char block[64]; { UINT4 a = state[0], b = state[1], c = state[2], d = state[3], x[16];
/* Encodes input (UINT4) into output (unsigned char). Assumes len is a multiple of 4. */ static void Encode (output, input, len) unsigned char *output; UINT4 *input; unsigned int len; { unsigned int i, j;
/* Decodes input (unsigned char) into output (UINT4). Assumes len is a multiple of 4. */ static void Decode (output, input, len) UINT4 *output; unsigned char *input; unsigned int len; { unsigned int i, j;
/* Note: Replace "for loop" with standard memcpy if possible. */
static void MD5_memcpy (output, input, len) POINTER output; POINTER input; unsigned int len; { unsigned int i;
for (i = 0; i < len; i++)
Rivest [Page 15]
RFC 1321 MD5 Message-Digest Algorithm April 1992
output[i] = input[i]; }
/* Note: Replace "for loop" with standard memset if possible. */ static void MD5_memset (output, value, len) POINTER output; int value; unsigned int len; { unsigned int i;
for (i = 0; i < len; i++) ((char *)output)[i] = (char)value; }
/* MDDRIVER.C - test driver for MD2, MD4 and MD5 */
/* Copyright (C) 1990-2, RSA Data Security, Inc. Created 1990. All rights reserved.
RSA Data Security, Inc. makes no representations concerning either the merchantability of this software or the suitability of this software for any particular purpose. It is provided "as is" without express or implied warranty of any kind.
These notices must be retained in any copies of any part of this documentation and/or software. */
/* The following makes MD default to MD5 if it has not already been defined with C compiler flags. */ #ifndef MD #define MD MD5 #endif
Arguments (may be any combination): -sstring - digests string -t - runs time trial -x - runs test script filename - digests file (none) - digests standard input */ int main (argc, argv) int argc;
Rivest [Page 17]
RFC 1321 MD5 Message-Digest Algorithm April 1992
char *argv[]; { int i;
if (argc > 1) for (i = 1; i < argc; i++) if (argv[i][0] == '-' && argv[i][1] == 's') MDString (argv[i] + 2); else if (strcmp (argv[i], "-t") == 0) MDTimeTrial (); else if (strcmp (argv[i], "-x") == 0) MDTestSuite (); else MDFile (argv[i]); else MDFilter ();
return (0); }
/* Digests a string and prints the result. */ static void MDString (string) char *string; { MD_CTX context; unsigned char digest[16]; unsigned int len = strlen (string);
The level of security discussed in this memo is considered to be sufficient for implementing very high security hybrid digital- signature schemes based on MD5 and a public-key cryptosystem.
Author's Address
Ronald L. Rivest Massachusetts Institute of Technology Laboratory for Computer Science NE43-324 545 Technology Square Cambridge, MA 02139-1986