Network Working Group M. Luby Request for Comments: 5053 Digital Fountain Category: Standards Track A. Shokrollahi EPFL M. Watson Digital Fountain T. Stockhammer Nomor Research October 2007
Raptor Forward Error Correction Scheme for Object Delivery
Status of This Memo
This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.
Abstract
This document describes a Fully-Specified Forward Error Correction (FEC) scheme, corresponding to FEC Encoding ID 1, for the Raptor forward error correction code and its application to reliable delivery of data objects.
Raptor is a fountain code, i.e., as many encoding symbols as needed can be generated by the encoder on-the-fly from the source symbols of a source block of data. The decoder is able to recover the source block from any set of encoding symbols only slightly more in number than the number of source symbols.
The Raptor code described here is a systematic code, meaning that all the source symbols are among the encoding symbols that can be generated.
This document specifies an FEC Scheme for the Raptor forward error correction code for object delivery applications. The concept of an FEC Scheme is defined in [RFC5052] and this document follows the format prescribed there and uses the terminology of that document. Raptor Codes were introduced in [Raptor]. For an overview, see, for example, [CCNC].
The Raptor FEC Scheme is a Fully-Specified FEC Scheme corresponding to FEC Encoding ID 1.
Raptor is a fountain code, i.e., as many encoding symbols as needed can be generated by the encoder on-the-fly from the source symbols of a block. The decoder is able to recover the source block from any set of encoding symbols only slightly more in number than the number of source symbols.
The code described in this document is a systematic code, that is, the original source symbols can be sent unmodified from sender to receiver, as well as a number of repair symbols. For more background on the use of Forward Error Correction codes in reliable multicast, see [RFC3453].
The code described here is identical to that described in [MBMS].
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
Figure 2: Encoded Common FEC OTI for Raptor FEC Scheme
NOTE 1: The limit of 2^^45 on the transfer length is a consequence of the limitation on the symbol size to 2^^16-1, the limitation on the number of symbols in a source block to 2^^13, and the limitation on the number of source blocks to 2^^16. However, the Transfer Length is encoded as a 48-bit field for simplicity.
The following parameters are carried in the Scheme-Specific FEC Object Transmission Information element for this FEC Scheme:
- The number of source blocks (Z)
- The number of sub-blocks (N)
- A symbol alignment parameter (Al)
These parameters are all non-negative integers. The encoded Scheme-Specific Object Transmission Information is a 4-octet field consisting of the parameters Z (2 octets), N (1 octet), and Al (1 octet) as shown in Figure 3.
Figure 3: Encoded Scheme-Specific FEC Object Transmission Information
The encoded FEC Object Transmission Information is a 14-octet field consisting of the concatenation of the encoded Common FEC Object Transmission Information and the encoded Scheme-Specific FEC Object Transmission Information.
These three parameters define the source block partitioning as described in Section 5.3.1.2.
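To illustrate the 14-octet layout, the Python sketch below packs and unpacks the FEC OTI. The Scheme-Specific part follows the description of Figure 3 above (Z in 2 octets, N and Al in 1 octet each). Figure 2 is not reproduced in this text, so the Common FEC OTI layout used here (a 48-bit Transfer Length, a 16-bit reserved field sent as zero, and a 16-bit Encoding Symbol Length T) and the use of network byte order are assumptions. The function names encode_oti and decode_oti are hypothetical.

```python
import struct

# Sketch only. ASSUMPTIONS: network (big-endian) byte order, and a 10-octet
# Common FEC OTI laid out as 48-bit Transfer Length, 16-bit Reserved (zero),
# 16-bit Encoding Symbol Length.

def encode_oti(F: int, T: int, Z: int, N: int, Al: int) -> bytes:
    """Pack the 14-octet FEC OTI: Common (10 octets) + Scheme-Specific (4 octets)."""
    if not 0 <= F < 2**48:
        raise ValueError("transfer length F must fit in 48 bits")
    if not 0 <= T < 2**16:
        raise ValueError("symbol size T must fit in 16 bits")
    if not (0 <= Z < 2**16 and 0 <= N < 2**8 and 0 <= Al < 2**8):
        raise ValueError("Z, N or Al out of range for its field")
    common = F.to_bytes(6, "big") + struct.pack(">HH", 0, T)
    scheme_specific = struct.pack(">HBB", Z, N, Al)  # Z: 2 octets, N: 1, Al: 1
    return common + scheme_specific


def decode_oti(oti: bytes) -> tuple:
    """Inverse of encode_oti; returns (F, T, Z, N, Al)."""
    if len(oti) != 14:
        raise ValueError("expected a 14-octet FEC OTI")
    F = int.from_bytes(oti[0:6], "big")
    _reserved, T = struct.unpack(">HH", oti[6:10])
    Z, N, Al = struct.unpack(">HBB", oti[10:14])
    return F, T, Z, N, Al
```

A sender would call encode_oti(F, T, Z, N, Al) once per object, and a receiver would recover the same five values with decode_oti.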
This section describes the information exchange between the Raptor FEC Scheme and any Content Delivery Protocol (CDP) that makes use of the Raptor FEC Scheme for object delivery.
The Raptor encoder and decoder for object delivery require the following information from the CDP:
- The transfer length of the object, F, in bytes
- A symbol alignment parameter, Al
- The symbol size, T, in bytes, which MUST be a multiple of Al
- The number of source blocks, Z
- The number of sub-blocks in each source block, N
The Raptor encoder for object delivery additionally requires:
- the object to be encoded, F bytes
The Raptor encoder supplies the CDP with the following information for each packet to be sent:
- Source Block Number (SBN)
- Encoding Symbol ID (ESI)
- Encoding symbol(s)
The CDP MUST communicate this information to the receiver.
This section provides recommendations for the derivation of the three transport parameters, T, Z, and N. This recommendation is based on the following input parameters:
- F the transfer length of the object, in bytes
- W a target on the sub-block size, in bytes
- P the maximum packet payload size, in bytes, which is assumed to be a multiple of Al
- Al the symbol alignment parameter, in bytes
- Kmax the maximum number of source symbols per source block.
- Kmin a minimum target on the number of symbols per source block
- Gmax a maximum target number of symbols per packet
Based on the above inputs, the transport parameters T, Z, and N are calculated as follows:
Let
G = min{ceil(P*Kmin/F), P/Al, Gmax}
T = floor(P/(Al*G))*Al
Kt = ceil(F/T)
Z = ceil(Kt/Kmax)
N = min{ceil(ceil(Kt/Z)*T/W), T/Al}
The value G represents the maximum number of symbols to be transported in a single packet. The value Kt is the total number of symbols required to represent the source data of the object. The values of G and N derived above should be considered as lower bounds. It may be advantageous to increase these values, for example, to the nearest power of two. In particular, the above algorithm does not guarantee that the symbol size, T, divides the maximum packet size, P, and so it may not be possible to use packets of size exactly P. If, instead, G is chosen to be a value that divides P/Al, then the symbol size, T, will be a divisor of P and packets of size P can be used.
The algorithm above and that defined in Section 5.3.1.2 ensure that the sub-symbol sizes are a multiple of the symbol alignment parameter, Al. This is useful because the XOR operations used for encoding and decoding are generally performed several bytes at a time, for example, at least 4 bytes at a time on a 32-bit processor. Thus, the encoding and decoding can be performed faster if the sub-symbol sizes are a multiple of this number of bytes.
Recommended settings for the input parameters, Al, Kmin, and Gmax are as follows: Al = 4, Kmin = 1024, Gmax = 10.
The parameter W can be used to generate encoded data that can be decoded efficiently with limited working memory at the decoder. Note that the actual maximum decoder memory requirement for a given value of W depends on the implementation, but it is possible to implement decoding using working memory only slightly larger than W.
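The recommended derivation of T, Z, and N can be sketched in Python as follows, using the recommended defaults Al = 4, Kmin = 1024, Gmax = 10 and Kmax = 8192. Integer ceiling division replaces floating-point ceil() to avoid rounding surprises for large F. The function names are hypothetical, and the sketch does not apply the optional rounding of G and N up to a power of two.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceil(a/b) for non-negative a and positive b (no float rounding)."""
    return -(-a // b)


def derive_transport_params(F, W, P, Al=4, Kmax=8192, Kmin=1024, Gmax=10):
    """Return (G, T, Z, N) following the recommended derivation.

    F: transfer length (bytes), W: target sub-block size (bytes),
    P: maximum packet payload (bytes, assumed to be a multiple of Al).
    """
    G = min(ceil_div(P * Kmin, F), P // Al, Gmax)       # symbols per packet
    T = (P // (Al * G)) * Al                            # symbol size
    Kt = ceil_div(F, T)                                 # total symbols in the object
    Z = ceil_div(Kt, Kmax)                              # number of source blocks
    N = min(ceil_div(ceil_div(Kt, Z) * T, W), T // Al)  # number of sub-blocks
    return G, T, Z, N
```

For example, a 1,000,000-byte object with P = 1024 and W = 262144 yields G = 2, T = 512, Z = 1, and N = 4.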
For the purposes of this specification, the following terms and definitions apply.
Source block: a block of K source symbols that are considered together for Raptor encoding purposes.
Source symbol: the smallest unit of data used during the encoding process. All source symbols within a source block have the same size.
Encoding symbol: a symbol that is included in a data packet. The encoding symbols consist of the source symbols and the repair symbols. Repair symbols generated from a source block have the same size as the source symbols of that source block.
Systematic code: a code in which all the source symbols may be included as part of the encoding symbols sent for a source block.
Repair symbol: the encoding symbols sent for a source block that are not the source symbols. The repair symbols are generated based on the source symbols.
Intermediate symbols: symbols generated from the source symbols using an inverse encoding process. The repair symbols are then generated directly from the intermediate symbols. The encoding symbols do not include the intermediate symbols, i.e., intermediate symbols are not included in data packets.
Symbol: a unit of data. The size, in bytes, of a symbol is known as the symbol size.
Encoding symbol group: a group of encoding symbols that are sent together (i.e., within the same packet) and whose relationship to the source symbols can be derived from a single Encoding Symbol ID.
Encoding Symbol ID: information that defines the relationship between the symbols of an encoding symbol group and the source symbols.
Encoding packet: data packets that contain encoding symbols.
Sub-block: a source block is sometimes broken into sub-blocks, each of which is sufficiently small to be decoded in working memory. For a source block consisting of K source symbols, each sub-block consists of K sub-symbols, each symbol of the source block being composed of one sub-symbol from each sub-block.
Sub-symbol: part of a symbol. Each source symbol is composed of as many sub-symbols as there are sub-blocks in the source block.
Source packet: data packets that contain source symbols.
Repair packet: data packets that contain repair symbols.
i, j, x, h, a, b, d, v, m represent positive integers.
ceil(x) denotes the smallest integer that is greater than or equal to x.
choose(i,j) denotes the number of ways j objects can be chosen from among i objects without repetition.
floor(x) denotes the largest integer that is less than or equal to x.
i % j denotes i modulo j.
X ^ Y denotes, for equal-length bit strings X and Y, the bitwise exclusive-or of X and Y.
Al denotes a symbol alignment parameter. Symbol and sub-symbol sizes are restricted to be multiples of Al.
A denotes a matrix over GF(2).
Transpose[A] denotes the transposed matrix of matrix A.
A^^-1 denotes the inverse matrix of matrix A.
K denotes the number of symbols in a single source block.
Kmax denotes the maximum number of source symbols that can be in a single source block. Set to 8192.
L denotes the number of pre-coding symbols for a single source block.
S denotes the number of LDPC symbols for a single source block.
H denotes the number of Half symbols for a single source block.
C denotes an array of intermediate symbols, C[0], C[1], C[2],..., C[L-1].
C' denotes an array of source symbols, C'[0], C'[1], C'[2],..., C'[K-1].
X a non-negative integer value
V0, V1 two arrays of 4-byte integers, V0[0], V0[1],..., V0[255] and V1[0], V1[1],..., V1[255]
Rand[X, i, m] a pseudo-random number generator
Deg[v] a degree generator
LTEnc[K, C, (d, a, b)] an LT encoding symbol generator
Trip[K, X] a triple generator function
G the number of symbols within an encoding symbol group
GF(n) the Galois field with n elements.
N the number of sub-blocks within a source block
T the symbol size in bytes. If the source block is partitioned into sub-blocks, then T = T'*N.
T' the sub-symbol size, in bytes. If the source block is not partitioned into sub-blocks, then T' is not relevant.
F the transfer length of an object, in bytes
I the sub-block size in bytes
P for object delivery, the payload size of each packet, in bytes, that is used in the recommended derivation of the object delivery transport parameters.
Q Q = 65521, i.e., Q is the largest prime smaller than 2^^16
Z the number of source blocks, for object delivery
The principal component of the systematic Raptor code is the basic encoder described in Section 5.4. First, it is described how to derive values for a set of intermediate symbols from the original source symbols such that knowledge of the intermediate symbols is sufficient to reconstruct the source symbols. Secondly, the encoder produces repair symbols, which are each the exclusive OR of a number of the intermediate symbols. The encoding symbols are the combination of the source and repair symbols. The repair symbols are produced in such a way that the intermediate symbols, and therefore also the source symbols, can be recovered from any sufficiently large set of encoding symbols.
This document specifies the systematic Raptor code encoder. Several decoding algorithms are possible; an efficient one is provided in Section 5.5.
The construction of the intermediate and repair symbols is based in part on a pseudo-random number generator described in Section 5.4.4.1. This generator is based on a fixed set of 512 random numbers that MUST be available to both sender and receiver. These are provided in Section 5.6.
Finally, the construction of the intermediate symbols from the source symbols is governed by a 'systematic index', values of which are provided in Section 5.7 for source block sizes from 4 source symbols to Kmax = 8192 source symbols.
In order to apply the Raptor encoder to a source object, the object may be broken into Z >= 1 blocks, known as source blocks. The Raptor encoder is applied independently to each source block. Each source block is identified by a unique integer Source Block Number (SBN), where the first source block has SBN zero, the second has SBN one, etc. Each source block is divided into a number, K, of source symbols of size T bytes each. Each source symbol is identified by a unique integer Encoding Symbol Identifier (ESI), where the first source symbol of a source block has ESI zero, the second has ESI one, etc.
Each source block with K source symbols is divided into N >= 1 sub-blocks, which are small enough to be decoded in the working memory. Each sub-block is divided into K sub-symbols of size T'.
Note that the value of K is not necessarily the same for each source block of an object and the value of T' may not necessarily be the same for each sub-block of a source block. However, the symbol size T is the same for all source blocks of an object and the number of symbols, K, is the same for every sub-block of a source block. Exact partitioning of the object into source blocks and sub-blocks is described in Section 5.3.1.2 below.
The construction of source blocks and sub-blocks is determined based on five input parameters, F, Al, T, Z, and N, and a function Partition[]. The five input parameters are defined as follows:
- F the transfer length of the object, in bytes
- Al a symbol alignment parameter, in bytes
- T the symbol size, in bytes, which MUST be a multiple of Al
- Z the number of source blocks
- N the number of sub-blocks in each source block
These parameters MUST be set so that ceil(ceil(F/T)/Z) <= Kmax. Recommendations for derivation of these parameters are provided in Section 4.2.
The function Partition[] takes a pair of integers (I, J) as input and derives four integers (IL, IS, JL, JS) as output. Specifically, the value of Partition[I, J] is a sequence of four integers (IL, IS, JL, JS), where IL = ceil(I/J), IS = floor(I/J), JL = I - IS * J, and JS = J - JL. Partition[] derives parameters for partitioning a block of size I into J approximately equal-sized blocks. Specifically, JL blocks of length IL and JS blocks of length IS.
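A direct transcription of Partition[] in Python is shown below as an illustrative sketch (the function name is hypothetical). For example, Partition[10, 3] = (4, 3, 1, 2): one block of length 4 and two blocks of length 3.

```python
def partition(I: int, J: int) -> tuple:
    """Partition[I, J] -> (IL, IS, JL, JS): JL pieces of size IL, JS of size IS."""
    IL = -(-I // J)      # ceil(I/J)
    IS = I // J          # floor(I/J)
    JL = I - IS * J
    JS = J - JL
    return IL, IS, JL, JS
```

The pieces always add back up to I, since IL*JL + IS*JS = I for every I >= 0 and J >= 1.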
The source object MUST be partitioned into source blocks and sub-blocks as follows:
Let
Kt = ceil(F/T)
(KL, KS, ZL, ZS) = Partition[Kt, Z]
(TL, TS, NL, NS) = Partition[T/Al, N]
Then, the object MUST be partitioned into Z = ZL + ZS contiguous source blocks, the first ZL source blocks each having length KL*T bytes, and the remaining ZS source blocks each having KS*T bytes.
If Kt*T > F, then for encoding purposes, the last symbol MUST be padded at the end with Kt*T - F zero bytes.
Next, each source block MUST be divided into N = NL + NS contiguous sub-blocks, the first NL sub-blocks each consisting of K contiguous sub-symbols of size TL*Al and the remaining NS sub-blocks each consisting of K contiguous sub-symbols of size TS*Al. The symbol alignment parameter Al ensures that sub-symbols are always a multiple of Al bytes.
Finally, the m-th symbol of a source block consists of the concatenation of the m-th sub-symbol from each of the N sub-blocks. Note that this implies that when N > 1, then a symbol is NOT a contiguous portion of the object.
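The partitioning rules above can be sketched in Python as below. The helper source_blocks is hypothetical (it is not defined by this specification) and returns each source block as a list of its K symbols. It assumes T is a multiple of Al, Kt >= Z, and T/Al >= N, so that no block or sub-symbol is empty.

```python
def partition(I, J):
    IS = I // J
    JL = I - IS * J
    return -(-I // J), IS, JL, J - JL    # (IL, IS, JL, JS)


def source_blocks(obj: bytes, T: int, Z: int, N: int, Al: int) -> list:
    """Split an object into Z source blocks, each returned as a list of K symbols."""
    F = len(obj)
    Kt = -(-F // T)                               # ceil(F/T)
    KL, KS, ZL, ZS = partition(Kt, Z)
    TL, TS, NL, NS = partition(T // Al, N)
    padded = obj + bytes(Kt * T - F)              # zero-pad the last symbol
    sub_sizes = [TL * Al] * NL + [TS * Al] * NS   # sub-symbol size per sub-block
    blocks, offset = [], 0
    for z in range(ZL + ZS):
        K = KL if z < ZL else KS
        block = padded[offset:offset + K * T]
        offset += K * T
        # N contiguous sub-blocks, each holding K sub-symbols of one size
        subblocks, pos = [], 0
        for size in sub_sizes:
            subblocks.append(block[pos:pos + K * size])
            pos += K * size
        # The m-th symbol joins the m-th sub-symbol of every sub-block
        symbols = [b"".join(sb[m * size:(m + 1) * size]
                            for sb, size in zip(subblocks, sub_sizes))
                   for m in range(K)]
        blocks.append(symbols)
    return blocks
```

For instance, a 10-byte object 00 01 ... 09 with T = 4, Z = 1, N = 2, and Al = 2 gives the three symbols 00 01 06 07, 02 03 08 09, and 04 05 00 00 (the last two bytes being padding), which shows that with N > 1 a symbol is not a contiguous portion of the object.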
Each encoding packet contains the following information:
- Source Block Number (SBN)
- Encoding Symbol ID (ESI)
- encoding symbol(s)
Each source block is encoded independently of the others. Source blocks are numbered consecutively from zero.
Encoding Symbol ID values from 0 to K-1 identify the source symbols of a source block in sequential order, where K is the number of symbols in the source block. Encoding Symbol IDs from K onwards identify repair symbols.
Each encoding packet either consists entirely of source symbols (source packet) or entirely of repair symbols (repair packet). A packet may contain any number of symbols from the same source block. In the case that the last source symbol in a source packet includes padding bytes added for FEC encoding purposes, then these bytes need not be included in the packet. Otherwise, only whole symbols MUST be included.
The Encoding Symbol ID, X, carried in each source packet is the Encoding Symbol ID of the first source symbol carried in that packet. The subsequent source symbols in the packet have Encoding Symbol IDs, X+1 to X+G-1, in sequential order, where G is the number of symbols in the packet.
Similarly, the Encoding Symbol ID, X, placed into a repair packet is the Encoding Symbol ID of the first repair symbol in the repair packet and the subsequent repair symbols in the packet have Encoding Symbol IDs X+1 to X+G-1 in sequential order, where G is the number of symbols in the packet.
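The ESI convention can be illustrated with the following Python sketch, which groups the source symbols of one source block into source packets of at most G symbols. The EncodingPacket type and the function names are hypothetical in-memory constructs, not a wire format. The sketch does not implement the optional omission of padding bytes from the last source symbol.

```python
from typing import List, NamedTuple


class EncodingPacket(NamedTuple):
    """Hypothetical in-memory form of an encoding packet (not a wire format)."""
    sbn: int              # Source Block Number
    esi: int              # ESI of the first symbol carried
    symbols: List[bytes]  # G consecutive encoding symbols


def source_packets(sbn: int, symbols: List[bytes], G: int) -> List[EncodingPacket]:
    """Group the K source symbols of a block into packets of up to G symbols.

    Each packet carries only the ESI X of its first symbol; the others
    implicitly have ESIs X+1, ..., X+G-1.
    """
    return [EncodingPacket(sbn, x, symbols[x:x + G])
            for x in range(0, len(symbols), G)]


def symbol_esis(packet: EncodingPacket) -> List[int]:
    """Recover the ESI of every symbol in a packet from the carried ESI X."""
    return [packet.esi + k for k in range(len(packet.symbols))]
```

The same grouping applies to repair packets, with X starting at K or above.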
Note that it is not necessary for the receiver to know the total number of repair packets.
Associated with each symbol is a triple of integers (d, a, b).
The G repair symbol triples (d[0], a[0], b[0]),..., (d[G-1], a[G-1], b[G-1]) for the repair symbols placed into a repair packet with ESI X are computed using the Triple generator defined in Section 5.4.4.4 as follows:
For each i = 0, ..., G-1, (d[i], a[i], b[i]) = Trip[K,X+i]
The G repair symbols to be placed in repair packet with ESI X are calculated based on the repair symbol triples, as described in Section 5.4, using the intermediate symbols C and the LT encoder LTEnc[K, C, (d[i], a[i], b[i])].
The systematic Raptor encoder is used to generate repair symbols from a source block that consists of K source symbols.
Symbols are the fundamental data units of the encoding and decoding process. For each source block (sub-block), all symbols (sub-symbols) are the same size. The atomic operation performed on symbols (sub-symbols) for both encoding and decoding is the exclusive-or operation.
Let C'[0],..., C'[K-1] denote the K source symbols.
Let C[0],..., C[L-1] denote L intermediate symbols.
The first step of encoding is to generate a number, L > K, of intermediate symbols from the K source symbols. In this step, K source symbol triples (d[0], a[0], b[0]), ..., (d[K-1], a[K-1], b[K-1]) are generated using the Trip[] generator as described in Section 5.4.2.2. The K source symbol triples are associated with the K source symbols and are then used to determine the L intermediate symbols C[0],..., C[L-1] from the source symbols using an inverse encoding process. This process can be realized by a Raptor decoding process.
Certain "pre-coding relationships" MUST hold within the L intermediate symbols. Section 5.4.2.3 describes these relationships and how the intermediate symbols are generated from the source symbols.
Once the intermediate symbols have been generated, repair symbols are produced and one or more repair symbols are placed as a group into a single data packet. Each repair symbol group is associated with an Encoding Symbol ID (ESI) and a number, G, of repair symbols. The ESI is used to generate a triple of integers, (d, a, b), for each repair symbol, again using the Trip[] generator as described in Section 5.4.4.4. Then, each (d,a,b)-triple is used to generate the corresponding repair symbol from the intermediate symbols using the LTEnc[K, C[0],..., C[L-1], (d,a,b)] generator described in Section 5.4.4.3.
5.4.2. First Encoding Step: Intermediate Symbol Generation
The first encoding step is a pre-coding step to generate the L intermediate symbols C[0], ..., C[L-1] from the source symbols C'[0], ..., C'[K-1]. The intermediate symbols are uniquely defined by two sets of constraints:
1. The intermediate symbols are related to the source symbols by a set of source symbol triples. The generation of the source symbol triples is defined in Section 5.4.2.2 using the Trip[] generator described in Section 5.4.4.4.
2. A set of pre-coding relationships hold within the intermediate symbols themselves. These are defined in Section 5.4.2.3.
The generation of the L intermediate symbols is then defined in Section 5.4.2.4.
Each of the K source symbols is associated with a triple (d[i], a[i], b[i]) for 0 <= i < K. The source symbol triples are determined using the Triple generator defined in Section 5.4.4.4 as:
For each i = 0, ..., K-1, (d[i], a[i], b[i]) = Trip[K,i]
The pre-coding relationships amongst the L intermediate symbols are defined by expressing the last L-K intermediate symbols in terms of the first K intermediate symbols.
The last L-K intermediate symbols C[K],...,C[L-1] consist of S LDPC symbols and H Half symbols. The values of S and H are determined from K as described below. Then L = K+S+H.
Let
X be the smallest positive integer such that X*(X-1) >= 2*K.
S be the smallest prime integer such that S >= ceil(0.01*K) + X
H be the smallest integer such that choose(H,ceil(H/2)) >= K + S
H' = ceil(H/2)
L = K+S+H
C[0],...,C[K-1] denote the first K intermediate symbols
C[K],...,C[K+S-1] denote the S LDPC symbols, initialised to zero
C[K+S],...,C[L-1] denote the H Half symbols, initialised to zero
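The definitions of S, H, and L above can be computed with the following Python sketch (Python 3.8 or later, for math.comb). ceil(0.01*K) is evaluated as the integer ceiling of K/100 to avoid floating-point error, and the function names are hypothetical. For K = 4 the sketch gives S = 5, H = 5, L = 14; for K = 1024 it gives S = 59, H = 13, L = 1096.

```python
from math import comb


def is_prime(n: int) -> bool:
    """Trial-division primality test (adequate for the small values involved)."""
    if n < 2:
        return False
    f = 2
    while f * f <= n:
        if n % f == 0:
            return False
        f += 1
    return True


def precode_params(K: int) -> tuple:
    """Return (S, H, L) for a source block of K symbols."""
    X = 1
    while X * (X - 1) < 2 * K:           # smallest positive X with X*(X-1) >= 2K
        X += 1
    S = -(-K // 100) + X                 # ceil(0.01*K) + X, in integer arithmetic
    while not is_prime(S):               # smallest prime >= that value
        S += 1
    H = 1
    while comb(H, -(-H // 2)) < K + S:   # smallest H: choose(H, ceil(H/2)) >= K+S
        H += 1
    return S, H, K + S + H
```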
The S LDPC symbols are defined to be the values of C[K],...,C[K+S-1] at the end of the following process:
For i = 0,...,K-1 do
a = 1 + (floor(i/S) % (S-1))
b = i % S
C[K + b] = C[K + b] ^ C[i]
b = (b + a) % S
C[K + b] = C[K + b] ^ C[i]
b = (b + a) % S
C[K + b] = C[K + b] ^ C[i]
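The LDPC process above translates directly into Python. In this illustrative sketch each symbol is modelled as a Python int, so that ^ performs the bitwise exclusive-or of equal-length bit strings; a real implementation would XOR byte buffers of T (or T') bytes. The function name is hypothetical, and it returns the S LDPC symbols rather than updating C in place.

```python
def ldpc_symbols(C: list, K: int, S: int) -> list:
    """Return the S LDPC symbols computed from intermediate symbols C[0..K-1].

    Symbols are modelled as ints so that ^ is bitwise XOR. Each C[i] is
    XORed into three LDPC symbols; they are distinct because S is prime
    and the step a lies in 1..S-1.
    """
    ldpc = [0] * S                       # C[K..K+S-1], initialised to zero
    for i in range(K):
        a = 1 + ((i // S) % (S - 1))
        b = i % S
        ldpc[b] ^= C[i]
        b = (b + a) % S
        ldpc[b] ^= C[i]
        b = (b + a) % S
        ldpc[b] ^= C[i]
    return ldpc
```

For K = 4 and S = 5, each of the four input symbols lands in three different LDPC symbols.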
The H Half symbols are defined as follows:
Let
g[i] = i ^ (floor(i/2)) for all positive integers i
Note: g[i] is the Gray sequence, in which each element differs from the previous one in a single bit position
m[k] denote the subsequence of g[.] whose elements have exactly k non-zero bits in their binary representation.
m[j,k] denote the jth element of the sequence m[k], where j=0, 1, 2, ...
Then, the Half symbols are defined as the values of C[K+S],...,C[L-1] after the following process:
For h = 0,...,H-1 do
For j = 0,...,K+S-1 do
If bit h of m[j,H'] is equal to 1 then C[h+K+S] = C[h+K+S] ^ C[j].
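The Half-symbol construction can be sketched as follows. The helper enumerates m[H'] by walking the Gray sequence g[i] = i ^ floor(i/2) and keeping the elements with exactly H' one-bits. Because the first 2^^H Gray codes are a permutation of 0, ..., 2^^H-1 and choose(H, H') >= K+S, at least K+S such elements exist, so the loop terminates. Symbols are again modelled as Python ints, and all names are hypothetical.

```python
def gray_with_k_bits(k: int, count: int) -> list:
    """First `count` elements of m[k]: Gray codes g[i] = i ^ floor(i/2) with k one-bits.

    Starting at i = 0 is harmless: g[0] = 0 has no one-bits and k >= 1.
    """
    out, i = [], 0
    while len(out) < count:
        g = i ^ (i >> 1)
        if bin(g).count("1") == k:
            out.append(g)
        i += 1
    return out


def half_symbols(C: list, K: int, S: int, H: int) -> list:
    """Return the H Half symbols computed from C[0..K+S-1] (source + LDPC symbols)."""
    Hp = -(-H // 2)                      # H' = ceil(H/2)
    m = gray_with_k_bits(Hp, K + S)      # m[j, H'] for j = 0..K+S-1
    half = [0] * H                       # C[K+S..L-1], initialised to zero
    for h in range(H):
        for j in range(K + S):
            if (m[j] >> h) & 1:          # bit h of m[j, H'] is 1
                half[h] ^= C[j]
    return half
```

Each of the K+S inputs contributes to exactly H' Half symbols, since every m[j, H'] has exactly H' one-bits.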
Given the K source symbols C'[0], C'[1],..., C'[K-1] the L intermediate symbols C[0], C[1],..., C[L-1] are the uniquely defined symbol values that satisfy the following conditions:
1. The K source symbols C'[0], C'[1],..., C'[K-1] satisfy the K constraints
C'[i] = LTEnc[K, (C[0],..., C[L-1]), (d[i], a[i], b[i])], for all i, 0 <= i < K.
2. The L intermediate symbols C[0], C[1],..., C[L-1] satisfy the pre-coding relationships defined in Section 5.4.2.3.
5.4.2.4.2. Example Method for Calculation of Intermediate Symbols
This subsection describes a possible method for calculation of the L intermediate symbols C[0], C[1],..., C[L-1] satisfying the constraints in Section 5.4.2.4.1.
The 'generator matrix' for a code that generates N output symbols from K input symbols is an NxK matrix over GF(2), where each row corresponds to one of the output symbols and each column to one of the input symbols and where the ith output symbol is equal to the sum of those input symbols whose column contains a non-zero entry in row i.
Then, the L intermediate symbols can be calculated as follows:
Let
C denote the column vector of the L intermediate symbols, C[0], C[1],..., C[L-1].
D denote the column vector consisting of S+H zero symbols followed by the K source symbols C'[0], C'[1], ..., C'[K-1]
Then the above constraints define an LxL matrix over GF(2), A, such that:
A*C = D
The matrix A can be constructed as follows:
Let:
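G_LDPC be the S x K generator matrix of the LDPC symbols. So,
G_LDPC * Transpose[(C[0],..., C[K-1])] = Transpose[(C[K],..., C[K+S-1])]
G_Half be the H x (K+S) generator matrix of the Half symbols. So,
G_Half * Transpose[(C[0],..., C[K+S-1])] = Transpose[(C[K+S],..., C[K+S+H-1])]
I_S be the S x S identity matrix
I_H be the H x H identity matrix
0_SxH be the S x H zero matrix
G_LT be the K x L generator matrix of the encoding symbols generated by the LT Encoder. So,
G_LT * Transpose[(C[0],..., C[L-1])] = Transpose[(C'[0], C'[1],..., C'[K-1])]
i.e., G_LT(i,j) = 1 if and only if C[j] is included in the symbols that are XORed to produce LTEnc[K, (C[0], ..., C[L-1]), (d[i], a[i], b[i])].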
Then:
The first S rows of A are equal to G_LDPC | I_S | 0_SxH.
The next H rows of A are equal to G_Half | I_H.
The remaining K rows of A are equal to G_LT.
The matrix A is depicted in Figure 4 below:
                  K               S       H
      +-----------------------+-------+-------+
      |                       |       |       |
    S |        G_LDPC         |  I_S  | 0_SxH |
      |                       |       |       |
      +-----------------------+-------+-------+
      |                               |       |
    H |            G_Half             |  I_H  |
      |                               |       |
      +-------------------------------+-------+
      |                                       |
      |                                       |
    K |                 G_LT                  |
      |                                       |
      |                                       |
      +---------------------------------------+
Figure 4: The matrix A
The intermediate symbols can then be calculated as:
C = (A^^-1)*D
The source symbol triples are generated such that, for any value of K, the matrix A has full rank and is therefore invertible. This calculation can be realized by applying a Raptor decoding process to the K source symbols C'[0], C'[1],..., C'[K-1] to produce the L intermediate symbols C[0], C[1],..., C[L-1].
To efficiently generate the intermediate symbols from the source symbols, it is recommended that an efficient decoder implementation such as that described in Section 5.5 be used. The source symbol triples are designed to facilitate efficient decoding of the source symbols using that algorithm.
In the second encoding step, the repair symbol with ESI X is generated by applying the generator LTEnc[K, (C[0], C[1],..., C[L-1]), (d, a, b)] defined in Section 5.4.4.3 to the L intermediate symbols C[0], C[1],..., C[L-1] using the triple (d, a, b) = Trip[K,X] generated according to Section 5.3.2.
The random number generator Rand[X, i, m] is defined as follows, where X is a non-negative integer, i is a non-negative integer, and m is a positive integer and the value produced is an integer between 0 and m-1. Let V0 and V1 be arrays of 256 entries each, where each entry is a 4-byte unsigned integer. These arrays are provided in Section 5.6.
Then,
Rand[X, i, m] = (V0[(X + i) % 256] ^ V1[(floor(X/256)+ i) % 256]) % m
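Rand[] fits in one line of Python. The real V0 and V1 tables are those of Section 5.6 and are not reproduced here, so this sketch takes them as parameters; any other tables are placeholders only and do not produce the values mandated by this specification. The function name is hypothetical.

```python
def rand(X: int, i: int, m: int, V0: list, V1: list) -> int:
    """Rand[X, i, m], an integer in 0..m-1.

    V0 and V1 MUST be the 256-entry tables of 4-byte unsigned integers
    from Section 5.6 for interoperable results.
    """
    return (V0[(X + i) % 256] ^ V1[(X // 256 + i) % 256]) % m
```

Note that the low byte of X (plus i) selects the V0 entry while floor(X/256) (plus i) selects the V1 entry.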
The encoding symbol generator LTEnc[K, (C[0], C[1],..., C[L-1]), (d, a, b)] takes the following inputs:
K is the number of source symbols (or sub-symbols) for the source block (sub-block). Let L be derived from K as described in Section 5.4.2.3, and let L' be the smallest prime integer greater than or equal to L.
(C[0], C[1],..., C[L-1]) is the array of L intermediate symbols (sub-symbols) generated as described in Section 5.4.2.4.
(d, a, b) is a source triple determined using the Triple generator defined in Section 5.4.4.4, whereby
d is an integer denoting an encoding symbol degree
a is an integer between 1 and L'-1 inclusive
b is an integer between 0 and L'-1 inclusive
The encoding symbol generator produces a single encoding symbol as output, according to the following algorithm:
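The algorithm steps themselves are not reproduced in this excerpt. The following Python sketch shows one implementation consistent with the inputs listed above and modelled on the LTEnc procedure of the full specification: starting from index b, it steps by a modulo L', skips indices that are L or greater, and XORs together min(d, L) distinct intermediate symbols. It should be checked against Section 5.4.4.3 before being relied upon. L is taken as len(C), symbols are modelled as Python ints, and the function names are hypothetical.

```python
from math import isqrt


def smallest_prime_at_least(n: int) -> int:
    """L' = smallest prime >= n (trial division)."""
    def is_prime(k: int) -> bool:
        return k >= 2 and all(k % f for f in range(2, isqrt(k) + 1))
    while not is_prime(n):
        n += 1
    return n


def lt_enc(C: list, triple: tuple) -> int:
    """One LT encoding symbol from the L intermediate symbols C (ints, XOR = ^)."""
    d, a, b = triple
    L = len(C)
    Lp = smallest_prime_at_least(L)      # L'
    while b >= L:                        # skip indices outside 0..L-1
        b = (b + a) % Lp
    result = C[b]
    for _ in range(min(d - 1, L - 1)):   # up to d-1 further symbols
        b = (b + a) % Lp
        while b >= L:
            b = (b + a) % Lp
        result ^= C[b]
    return result
```

Because L' is prime and 1 <= a <= L'-1, the walk b, b+a, b+2a, ... (mod L') visits every residue before repeating, so the inner loops terminate and no intermediate symbol is used twice.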
This section describes an efficient decoding algorithm for the Raptor codes described in this specification. Note that each received encoding symbol can be considered as the value of an equation amongst the intermediate symbols. From these simultaneous equations, and the known pre-coding relationships amongst the intermediate symbols, any algorithm for solving simultaneous equations can successfully decode the intermediate symbols and hence the source symbols. However, the algorithm chosen has a major effect on the computational efficiency of the decoding.
It is assumed that the decoder knows the structure of the source block it is to decode, including the symbol size, T, and the number K of symbols in the source block.
From the algorithms described in Section 5.4, the Raptor decoder can calculate the total number L = K+S+H of pre-coding symbols and determine how they were generated from the source block to be decoded. In this description, it is assumed that the received encoding symbols for the source block to be decoded are passed to the decoder. Note that, as described in Section 5.3.2, the last source symbol of a source packet may have included padding bytes added for FEC encoding purposes. These padding bytes may not actually be included in the packet sent and so must be reinserted at the receiver before passing the symbol to the decoder.
For each such encoding symbol, it is assumed that the number and set of intermediate symbols whose exclusive-or is equal to the encoding symbol is also passed to the decoder. In the case of source symbols, the source symbol triples described in Section 5.4.2.2 indicate the number and set of intermediate symbols that sum to give each source symbol.
Let N >= K be the number of received encoding symbols for a source block and let M = S+H+N. The following M by L bit matrix A can be derived from the information passed to the decoder for the source block to be decoded. Let C be the column vector of the L intermediate symbols, and let D be the column vector of M symbols with values known to the receiver, where the first S+H of the M symbols are zero-valued symbols that correspond to LDPC and Half symbols (these are check symbols for the LDPC and Half symbols, and not the LDPC and Half symbols themselves), and the remaining N of the M symbols are the received encoding symbols for the source block. Then, A is the bit matrix that satisfies A*C = D, where here * denotes matrix multiplication over GF(2). In particular, A[i,j] = 1 if the intermediate symbol corresponding to index j is exclusive-ORed into the LDPC, Half, or encoding symbol corresponding to index i in the encoding, or if index i corresponds to an LDPC or Half symbol and index j corresponds to the same LDPC or Half symbol. For all other i and j, A[i,j] = 0.
Decoding a source block is equivalent to decoding C from known A and D. It is clear that C can be decoded if and only if the rank of A over GF(2) is L. Once C has been decoded, missing source symbols can be obtained by using the source symbol triples to determine the number and set of intermediate symbols that MUST be exclusive-ORed to obtain each missing source symbol.
The first step in decoding C is to form a decoding schedule. In this step A is converted, using Gaussian elimination (using row operations and row and column reorderings) and after discarding M - L rows, into the L by L identity matrix. The decoding schedule consists of the sequence of row operations and row and column reorderings during the Gaussian elimination process, and only depends on A and not on D. The decoding of C from D can take place concurrently with the forming of the decoding schedule, or the decoding can take place afterwards based on the decoding schedule.
The correspondence between the decoding schedule and the decoding of C is as follows. Let c[0] = 0, c[1] = 1,...,c[L-1] = L-1 and d[0] = 0, d[1] = 1,...,d[M-1] = M-1 initially.
- Each time row i of A is exclusive-ORed into row i' in the decoding schedule, then in the decoding process, symbol D[d[i]] is exclusive-ORed into symbol D[d[i']].
- Each time row i is exchanged with row i' in the decoding schedule, then in the decoding process, the value of d[i] is exchanged with the value of d[i'].
- Each time column j is exchanged with column j' in the decoding schedule, then in the decoding process, the value of c[j] is exchanged with the value of c[j'].
From this correspondence, it is clear that the total number of exclusive-ORs of symbols in the decoding of the source block is the number of row operations (not exchanges) in the Gaussian elimination. Since A is the L by L identity matrix after the Gaussian elimination and after discarding the last M - L rows, it is clear at the end of successful decoding that the L symbols D[d[0]], D[d[1]],..., D[d[L-1]] are the values of the L symbols C[c[0]], C[c[1]],..., C[c[L-1]].
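The correspondence above can be sketched in Python as two functions: form_schedule runs Gauss-Jordan elimination on A and records every row exchange, column exchange, and row exclusive-OR, and apply_schedule replays that record on the symbols D using the arrays d and c. Both names, the tuple encoding of the schedule, and the naive pivot search (the first one found in the remaining submatrix) are illustrative assumptions. The pivot order does not affect success, but this naive order is not the efficient order described in the remainder of this section.

```python
from typing import List, Optional, Tuple

Op = Tuple[str, int, int]  # ("xor", src, dst), ("swap_rows", a, b), or ("swap_cols", a, b)


def form_schedule(A: List[List[int]], L: int) -> Optional[List[Op]]:
    """Reduce the M by L bit matrix A to the L by L identity (plus M - L zero
    rows), recording each operation. Returns None if the rank of A is below L."""
    work = [row[:] for row in A]
    M = len(work)
    ops: List[Op] = []
    for k in range(L):
        # Naive pivot search: first one in rows >= k and columns >= k.
        piv = next(((r, c) for r in range(k, M) for c in range(k, L) if work[r][c]), None)
        if piv is None:
            return None  # rank of A is less than L: decoding failure
        r, c = piv
        if r != k:
            work[k], work[r] = work[r], work[k]
            ops.append(("swap_rows", k, r))
        if c != k:
            for row in work:
                row[k], row[c] = row[c], row[k]
            ops.append(("swap_cols", k, c))
        for i in range(M):
            if i != k and work[i][k]:
                work[i] = [a ^ b for a, b in zip(work[i], work[k])]
                ops.append(("xor", k, i))  # row k exclusive-ORed into row i
    return ops


def apply_schedule(ops: List[Op], D: List[bytes], L: int) -> List[bytes]:
    """Replay the schedule on the symbols D and return the intermediate symbols C."""
    syms = [bytearray(s) for s in D]
    c = list(range(L))
    d = list(range(len(D)))
    for kind, x, y in ops:
        if kind == "xor":
            src, dst = syms[d[x]], syms[d[y]]
            for t in range(len(dst)):
                dst[t] ^= src[t]  # D[d[x]] exclusive-ORed into D[d[y]]
        elif kind == "swap_rows":
            d[x], d[y] = d[y], d[x]
        else:  # "swap_cols"
            c[x], c[y] = c[y], c[x]
    C: List[bytes] = [b""] * L
    for k in range(L):
        C[c[k]] = bytes(syms[d[k]])  # C[c[k]] = D[d[k]] once A is the identity
    return C
```

The number of "xor" entries in the returned schedule equals the number of symbol exclusive-ORs that apply_schedule performs, which is the count stated above; exchanges only permute d and c.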
The order in which Gaussian elimination is performed to form the decoding schedule has no bearing on whether or not the decoding is successful. However, the speed of the decoding depends heavily on the order in which Gaussian elimination is performed. (Furthermore, maintaining a sparse representation of A is crucial, although this is not described here). The remainder of this section describes an order in which Gaussian elimination could be performed that is relatively efficient.
In the first phase of the Gaussian elimination, the matrix A is conceptually partitioned into submatrices. The submatrix sizes are parameterized by non-negative integers i and u, which are initialized to 0. The submatrices of A are:
(1) The submatrix I defined by the intersection of the first i rows and first i columns. This is the identity matrix at the end of each step in the phase.
(2) The submatrix defined by the intersection of the first i rows and all but the first i columns and last u columns. All entries of this submatrix are zero.
(3) The submatrix defined by the intersection of the first i columns and all but the first i rows. All entries of this submatrix are zero.
(4) The submatrix U defined by the intersection of all the rows and the last u columns.
(5) The submatrix V formed by the intersection of all but the first i columns and the last u columns and all but the first i rows.
Figure 5 illustrates the submatrices of A. At the beginning of the first phase, V = A. In each step, a row of A is chosen.
      +-----------+-----------------+---------+
      |           |                 |         |
      |     I     |    All Zeros    |         |
      |           |                 |         |
      +-----------+-----------------+    U    |
      |           |                 |         |
      |           |                 |         |
      | All Zeros |        V        |         |
      |           |                 |         |
      |           |                 |         |
      +-----------+-----------------+---------+
Figure 5: Submatrices of A in the first phase
The following graph defined by the structure of V is used in determining which row of A is chosen. The columns that intersect V are the nodes in the graph, and the rows that have exactly 2 ones in V are the edges of the graph that connect the two columns (nodes) in the positions of the two ones. A component in this graph is a maximal set of nodes (columns) and edges (rows) such that there is a path between each pair of nodes/edges in the graph. The size of a component is the number of nodes (columns) in the component.
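The graph defined by V can be tracked with a union-find (disjoint-set) structure; this is a standard technique chosen here for illustration, not something the text prescribes. In this hypothetical sketch, each row's intersection with V is given as the set of column indices where the row has a one, and the function returns, for every column of V, the size of the component containing it.

```python
from typing import Dict, List, Set


def component_sizes(v_rows: List[Set[int]], v_cols: List[int]) -> Dict[int, int]:
    """Map each column (node) of V to the size of its component in the graph
    whose edges are the rows with exactly two ones in V."""
    parent = {col: col for col in v_cols}

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for ones in v_rows:
        if len(ones) == 2:  # only rows with exactly two ones are edges
            x, y = tuple(ones)
            a, b = find(x), find(y)
            if a != b:
                parent[a] = b  # merge the two components
    sizes: Dict[int, int] = {}
    for col in v_cols:
        root = find(col)
        sizes[root] = sizes.get(root, 0) + 1
    return {col: sizes[find(col)] for col in v_cols}
```

A column that appears in no two-one row forms a component of size 1 on its own.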
There are at most L steps in the first phase. The phase ends successfully when i + u = L, i.e., when V and the all-zeros submatrix above V have disappeared and A consists of I, the all-zeros submatrix below I, and U. The phase ends unsuccessfully in decoding failure if, at some step before V disappears, there is no non-zero row in V to choose in that step. Whenever there are non-zero rows in V, the next step starts by choosing a row of A as follows:
o Let r be the minimum integer such that at least one row of A has exactly r ones in V.
* If r != 2, then choose a row with exactly r ones in V with minimum original degree among all such rows.
* If r = 2, then choose any row with exactly 2 ones in V that is part of a maximum size component in the graph defined by V.
After the row is chosen in this step the first row of A that intersects V is exchanged with the chosen row so that the chosen row is the first row that intersects V. The columns of A among those that intersect V are reordered so that one of the r ones in the chosen row appears in the first column of V and so that the remaining r-1 ones appear in the last columns of V. Then, the chosen row is exclusive-ORed into all the other rows of A below the chosen row that have a one in the first column of V. Finally, i is incremented by 1 and u is incremented by r-1, which completes the step.
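The row-selection rule above can be sketched as follows. This is a hypothetical helper: it assumes the caller supplies each row's ones in V as a set of columns, the original degree of each row (recorded before elimination begins), and a mapping from each column of V to the size of its component in the graph defined by V. It covers only the choice of row; the exchanges, the exclusive-ORs, and the updates to i and u are not shown.

```python
from typing import Dict, List, Optional, Set


def choose_row(v_rows: List[Set[int]], orig_degree: List[int],
               comp_size: Dict[int, int]) -> Optional[int]:
    """Pick the row for one step of the first phase, or return None when no
    row has a one in V (decoding failure).

    v_rows[k]: columns of V in which row k has a one
    orig_degree[k]: original degree of row k, supplied by the caller
    comp_size[col]: size of the component containing column col
    """
    nonzero = [k for k, ones in enumerate(v_rows) if ones]
    if not nonzero:
        return None
    r = min(len(v_rows[k]) for k in nonzero)
    candidates = [k for k in nonzero if len(v_rows[k]) == r]
    if r != 2:
        # Exactly r ones in V, ties broken by minimum original degree.
        return min(candidates, key=lambda k: orig_degree[k])

    # r == 2: each candidate is an edge whose two endpoints share a component,
    # so either endpoint gives the component size.
    def size(k: int) -> int:
        return comp_size[next(iter(v_rows[k]))]

    return max(candidates, key=size)
```

When r = 2, any edge in a maximum-size component is acceptable; this sketch simply returns the first one found.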
The submatrix U is further partitioned into the first i rows, U_upper, and the remaining M - i rows, U_lower. In the second phase, Gaussian elimination is performed on U_lower either to determine that its rank is less than u (decoding failure) or to convert it into a matrix whose first u rows form the identity matrix (success of the second phase). Call this u by u identity matrix I_u. The M - L rows of A that intersect U_lower - I_u are discarded. After this phase, A has L rows and L columns.
After the second phase, the only portion of A that needs to be zeroed out to finish converting A into the L by L identity matrix is U_upper. The number of rows i of the submatrix U_upper is generally much larger than the number of columns u of U_upper. To zero out U_upper efficiently, a precomputation matrix U' is computed from I_u in the third phase, and U' is then used in the fourth phase to zero out U_upper. The u rows of I_u are partitioned into ceil(u/8) groups of 8 rows each. Then, for each group of 8 rows, all non-zero combinations of the 8 rows are computed, resulting in 2^^8 - 1 = 255 rows (this can be done with 2^^8 - 8 - 1 = 247 exclusive-ORs of rows per group, since the combinations of Hamming weight one that appear in I_u do not need to be recomputed). Thus, the resulting precomputation matrix U' has ceil(u/8)*255 rows and u columns. Note that U' is not formally a part of matrix A, but will be used in the fourth phase to zero out U_upper.
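For one group, the third-phase precomputation can be sketched as follows, assuming each row is packed into a Python integer. Each combination m is derived from an already computed combination with one fewer row, so only the combinations of weight two or more cost an exclusive-OR, giving 255 - 8 = 247 per full group. The function name and the integer packing are hypothetical. Following the correspondence between row operations and symbol exclusive-ORs, a decoder would perform the same combinations on the associated symbols.

```python
from typing import List, Tuple


def precompute_group(rows: List[int]) -> Tuple[List[int], int]:
    """Build every non-zero XOR combination of up to 8 rows of I_u.

    rows[t] is the t-th row of the group, packed as an integer. Entry m of the
    returned table (1 <= m < 2**len(rows)) is the XOR of the rows whose bit t
    is set in m. Also returns the number of row XORs performed.
    """
    g = len(rows)
    table = [0] * (1 << g)
    xors = 0
    for m in range(1, 1 << g):
        low = m & -m                  # lowest set bit of m
        t = low.bit_length() - 1      # index of that row within the group
        if m == low:
            table[m] = rows[t]        # weight one: the row of I_u itself, no XOR
        else:
            table[m] = table[m ^ low] ^ rows[t]  # reuse an earlier entry
            xors += 1
    return table, xors
```

With the 8 unit rows of a full group of I_u, entry m of the table is simply the pattern m itself, which is why the fourth phase can index the table directly by the 8-bit pattern it needs to cancel.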
In the fourth phase, for each of the first i rows of A and for each group of 8 columns in the U_upper submatrix of this row, if the 8 entries of the row in those columns are not all zero, then the row of the precomputation matrix U' that matches the pattern in those 8 columns is exclusive-ORed into the row, thus zeroing out those 8 columns in the row at the cost of exclusive-ORing one row of U' into the row.
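Because the rows of A that intersect I_u are zero in the first i columns (submatrix (3) above), exclusive-ORing a row of U' into one of the first i rows changes only its U_upper entries. The hypothetical function below therefore represents each U_upper row as a bitmask over the u columns of U, rebuilds the per-group tables, and clears each row with at most ceil(u/8) row exclusive-ORs. Only the matrix side is shown; per the correspondence between the decoding schedule and the decoding, each of these row operations would be mirrored by one symbol exclusive-OR.

```python
from typing import List


def zero_out_u_upper(u_upper: List[int], u: int) -> int:
    """Clear U_upper in place, 8 columns at a time.

    u_upper[k] holds the u entries of U_upper in row k as a bitmask (bit j is
    column j of U). Returns the number of row XORs used.
    """
    # Third phase: per-group tables of all XOR combinations of I_u rows.
    identity = [1 << j for j in range(u)]  # rows of I_u restricted to U
    tables = []
    for g0 in range(0, u, 8):
        group = identity[g0:g0 + 8]
        table = [0] * (1 << len(group))
        for m in range(1, len(table)):
            low = m & -m
            table[m] = table[m ^ low] ^ group[low.bit_length() - 1]
        tables.append(table)

    # Fourth phase: one table lookup and one XOR per non-zero column group.
    xors = 0
    for k, row in enumerate(u_upper):
        for gi, g0 in enumerate(range(0, u, 8)):
            pattern = (row >> g0) & 0xFF  # the entries in this group of columns
            if pattern:
                row ^= tables[gi][pattern]  # one row of U' clears the whole group
                xors += 1
        u_upper[k] = row
    return xors
```

The saving comes from replacing up to 8 separate row exclusive-ORs per group with a single one, which matters because i is usually much larger than u.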
After this phase, A is the L by L identity matrix and a complete decoding schedule has been successfully formed. Then, as explained in Section 5.5.2.1, the corresponding decoding consisting of exclusive-ORing known encoding symbols can be executed to recover the intermediate symbols based on the decoding schedule. The triples associated with all source symbols are computed according to Section 5.4.2.2. The triples for received source symbols are used in the decoding. The triples for missing source symbols are used to determine which intermediate symbols need to be exclusive-ORed to recover the missing source symbols.
For each value of K, the systematic index J(K) is designed to have the property that the set of source symbol triples (d[0], a[0], b[0]), ..., (d[L-1], a[L-1], b[L-1]) is such that the L intermediate symbols are uniquely defined, i.e., the matrix A in Section 5.4.2.4.2 has full rank and is therefore invertible.
The following is the list of the systematic indices for values of K between 4 and 8192 inclusive.
Data delivery can be subject to denial-of-service attacks by attackers that send corrupted packets that are accepted as legitimate by receivers. This is particularly a concern for multicast delivery because a corrupted packet may be injected into the session close to the root of the multicast tree, in which case the corrupted packet will arrive at many receivers. It is also a concern when the code described in this document is used, because even one corrupted packet containing encoding data may result in the decoding of an object that is completely corrupted and unusable. It is thus RECOMMENDED that source authentication and integrity checking be applied to decoded objects before delivering objects to an application. For example, a SHA-1 hash [SHA1] of an object may be appended before transmission, and the SHA-1 hash is computed and checked after the object is decoded but before it is delivered to an application. Source authentication SHOULD be provided, for example, by including a digital signature verifiable by the receiver computed on top of the hash value. It is also RECOMMENDED that a packet authentication protocol, such as TESLA [RFC4082], be used to detect and discard corrupted packets upon arrival. This method may also be used to provide source authentication. Furthermore, it is RECOMMENDED that Reverse Path Forwarding checks be enabled in all network routers and switches along the path from the sender to receivers to limit the possibility of a bad agent successfully injecting a corrupted packet into the multicast tree data path.
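The append-and-verify integrity check described above can be sketched with Python's standard hashlib and hmac modules. The helper names are hypothetical, and the sketch covers only the hash step; it does not provide source authentication, which the text says SHOULD come from a digital signature computed over the hash value.

```python
import hashlib
import hmac

DIGEST_LEN = hashlib.sha1().digest_size  # 20 bytes for SHA-1


def append_digest(obj: bytes) -> bytes:
    """Sender: append the SHA-1 hash of the object before FEC encoding."""
    return obj + hashlib.sha1(obj).digest()


def check_and_strip(decoded: bytes) -> bytes:
    """Receiver: after FEC decoding, recompute the hash and deliver the object
    only if it matches; otherwise raise ValueError."""
    if len(decoded) < DIGEST_LEN:
        raise ValueError("decoded object too short to carry a digest")
    obj, digest = decoded[:-DIGEST_LEN], decoded[-DIGEST_LEN:]
    if not hmac.compare_digest(hashlib.sha1(obj).digest(), digest):
        raise ValueError("integrity check failed; object not delivered")
    return obj
```

hmac.compare_digest is used so that the comparison time does not depend on where the digests first differ.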
Another security concern is that some FEC information may be obtained by receivers out-of-band in a session description, and if the session description is forged or corrupted, then the receivers will not use the correct protocol for decoding content from received packets. To avoid these problems, it is RECOMMENDED that measures be taken to prevent receivers from accepting incorrect session descriptions, e.g., by using source authentication to ensure that receivers only accept legitimate session descriptions from authorized senders.
Values of FEC Encoding IDs and FEC Instance IDs are subject to IANA registration. For general guidelines on IANA considerations as they apply to this document, see [RFC5052]. This document assigns the Fully-Specified FEC Encoding ID 1 under the ietf:rmt:fec:encoding name-space to "Raptor Code".
Numerous editorial improvements and clarifications were made to this specification during the review process within 3GPP. Thanks are due to the members of 3GPP Technical Specification Group SA, Working Group 4, for these.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC4082] Perrig, A., Song, D., Canetti, R., Tygar, J., and B. Briscoe, "Timed Efficient Stream Loss-Tolerant Authentication (TESLA): Multicast Source Authentication Transform Introduction", RFC 4082, June 2005.
[RFC5052] Watson, M., Luby, M., and L. Vicisano, "Forward Error Correction (FEC) Building Block", RFC 5052, August 2007.
[CCNC] Luby, M., Watson, M., Gasiba, T., Stockhammer, T., and W. Xu, "Raptor Codes for Reliable Download Delivery in Wireless Broadcast Systems", CCNC 2006, Las Vegas, NV, January 2006.
[MBMS] 3GPP, "Multimedia Broadcast/Multicast Service (MBMS); Protocols and codecs", 3GPP TS 26.346 6.1.0, June 2005.
[RFC3453] Luby, M., Vicisano, L., Gemmell, J., Rizzo, L., Handley, M., and J. Crowcroft, "The Use of Forward Error Correction (FEC) in Reliable Multicast", RFC 3453, December 2002.
[Raptor] Shokrollahi, A., "Raptor Codes", IEEE Transactions on Information Theory, no. 6, June 2006.
[SHA1] "Secure Hash Standard", Federal Information Processing Standards Publication (FIPS PUB) 180-1, April 2005.
Authors' Addresses
Michael Luby Digital Fountain 39141 Civic Center Drive Suite 300 Fremont, CA 94538 U.S.A.
Mark Watson Digital Fountain 39141 Civic Center Drive Suite 300 Fremont, CA 94538 U.S.A.
EMail: mark@digitalfountain.com
Thomas Stockhammer Nomor Research Brecherspitzstrasse 8 Munich 81541 Germany
EMail: stockhammer@nomor.de
Full Copyright Statement
Copyright (C) The IETF Trust (2007).
This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.
This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.