Network Working Group                                         A. Sollaud
Request for Comments: 5391                                France Telecom
Category: Standards Track                                  November 2008
RTP Payload Format for ITU-T Recommendation G.711.1
Status of This Memo
This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.
Copyright Notice
Copyright (c) 2008 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
Abstract
This document specifies a Real-time Transport Protocol (RTP) payload format to be used for the ITU Telecommunication Standardization Sector (ITU-T) G.711.1 audio codec. Two media type registrations are also included.
Sollaud Standards Track [Page 1]
RFC 5391 RTP Payload Format for G.711.1 November 2008
Table of Contents
1. Introduction
2. Background
3. RTP Header Usage
4. Payload Format
   4.1. Payload Header
   4.2. Audio Data
5. Payload Format Parameters
   5.1. PCMA-WB Media Type Registration
   5.2. PCMU-WB Media Type Registration
   5.3. Mapping to SDP Parameters
        5.3.1. Offer-Answer Model Considerations
        5.3.2. Declarative SDP Considerations
6. G.711 Interoperability
7. Congestion Control
8. Security Considerations
9. IANA Considerations
10. References
    10.1. Normative References
    10.2. Informative References
The ITU Telecommunication Standardization Sector (ITU-T) Recommendation G.711.1 [ITU-G.711.1] is an embedded wideband extension of the Recommendation G.711 [ITU-G.711] audio codec. This document specifies a payload format for packetization of G.711.1 encoded audio signals into the Real-time Transport Protocol (RTP).
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
G.711.1 is a G.711 embedded wideband speech and audio coding algorithm operating at 64, 80, and 96 kbps. At 64 kbps, G.711.1 is fully interoperable with G.711. Hence, an efficient deployment in existing G.711-based Voice over IP (VoIP) infrastructures is foreseen.
The codec operates on 5-ms frames, and the default sampling rate is 16 kHz. Input and output at 8 kHz are also supported for narrowband modes.
The encoder produces an embedded bitstream structured in three layers corresponding to three available bit rates: 64, 80, and 96 kbps. The bitstream can be truncated at the decoder side or by any component of the communication system to adjust, "on the fly", the bit rate to the desired value.
The following table gives more details on these layers.
The format of the RTP header is specified in [RFC3550]. The payload format defined in this document uses the fields of the header in a manner consistent with that specification.
marker (M): G.711.1 does not define anything specific regarding Discontinuous Transmission (DTX), a.k.a. silence suppression. Codec-independent mechanisms may be used, like the generic comfort-noise payload format defined in [RFC3389].
For applications that send either no packets or occasional comfort-noise packets during silence, the first packet of a talkspurt -- that is, the first packet after a silence period during which packets have not been transmitted contiguously -- SHOULD be distinguished by setting the marker bit in the RTP data header to one. The marker bit in all other packets is zero. The beginning of a talkspurt MAY be used to adjust the playout delay to reflect changing network delays. Applications without silence suppression MUST set the marker bit to zero.
payload type (PT): The assignment of an RTP payload type for this packet format is outside the scope of this document, and will not be specified here. It is expected that the RTP profile under which this payload format is being used will assign a payload type for this codec or specify that the payload type is to be bound dynamically (see Section 5.3).
timestamp: The RTP timestamp clock frequency is the same as the default sampling frequency: 16 kHz.
G.711.1 also has the capability to operate with 8-kHz sampled input/output signals. This does not affect the bitstream, and the decoder does not require a priori knowledge about the sampling rate of the original signal at the input of the encoder. Therefore, depending on the implementation and the audio acoustic capabilities of the devices, the input of the encoder and/or the output of the decoder can be configured at 8 kHz; however, a 16-kHz RTP clock rate MUST always be used.
The duration of one frame is 5 ms, corresponding to 80 samples at 16 kHz. Thus, the timestamp is increased by 80 for each consecutive frame.
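As an illustration of this arithmetic, the following Python sketch computes the timestamp of the next packet from the number of frames carried in the current one. The function name next_timestamp is hypothetical and not part of this specification; the 32-bit wrap-around reflects the size of the RTP timestamp field in [RFC3550].

```python
RTP_CLOCK_RATE_HZ = 16000  # RTP clock rate used for G.711.1
FRAME_DURATION_MS = 5      # duration of one G.711.1 frame

# 16000 ticks per second * 5 ms = 80 ticks per frame
TICKS_PER_FRAME = RTP_CLOCK_RATE_HZ * FRAME_DURATION_MS // 1000


def next_timestamp(current: int, frames_in_packet: int) -> int:
    """Return the RTP timestamp of the packet following the current one.

    'current' is the timestamp of the current packet and
    'frames_in_packet' is the number of frames it carries.
    """
    if frames_in_packet < 1:
        raise ValueError("a packet carries at least one frame")
    # The RTP timestamp is a 32-bit field, so it wraps modulo 2**32.
    return (current + frames_in_packet * TICKS_PER_FRAME) % 2**32
```

For example, a packet carrying four frames (20 ms of audio) advances the timestamp by 320.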
After this payload header, the consecutive audio frames are packed in order of time, that is, oldest first. All frames MUST be of the same mode, indicated by the MI field of the payload header.
Within a frame, layers are always packed in the same order: L0 then L1 for mode R2a, L0 then L2 for mode R2b, L0 then L1 then L2 for mode R3. This is illustrated below.
Only full frames must be considered. So if dividing the size of the audio data by the size of one frame leaves a remainder, the corresponding remaining bytes in the received payload MUST be ignored.
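A receiver could apply this rule along the lines of the Python sketch below. It rests on assumptions that are not restated in the text at this point: that the payload header is a single octet whose three low-order bits carry the Mode Index (MI), that the Mode Index values 1 to 4 denote modes R1, R2a, R2b, and R3, and that the frame sizes follow from the bit rates (64, 80, 80, and 96 kbps over 5 ms give 40, 50, 50, and 60 octets). The names FRAME_SIZE and split_frames are hypothetical.

```python
# Frame size in octets for each Mode Index.
# Assumed mapping: 1 = R1, 2 = R2a, 3 = R2b, 4 = R3.
FRAME_SIZE = {1: 40, 2: 50, 3: 50, 4: 60}


def split_frames(payload: bytes) -> tuple[int, list[bytes]]:
    """Return (mode_index, frames) for one RTP payload, oldest frame first."""
    if not payload:
        raise ValueError("empty payload")
    # Assumption: one-octet payload header, MI in the 3 low-order bits.
    mode_index = payload[0] & 0x07
    if mode_index not in FRAME_SIZE:
        raise ValueError(f"unsupported mode index {mode_index}")
    size = FRAME_SIZE[mode_index]
    audio = payload[1:]
    # Integer division keeps full frames only; trailing bytes are ignored.
    nb_frames = len(audio) // size
    frames = [audio[i * size:(i + 1) * size] for i in range(nb_frames)]
    return mode_index, frames
```

With a mode R3 payload of 1 + 127 octets, this sketch returns two 60-octet frames and silently drops the last 7 octets.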
This section defines the parameters that may be used to configure optional features in the G.711.1 RTP transmission.
Both A-law and mu-law G.711 are supported in the core layer L0, but there is no interoperability between A-law and mu-law. So two media types with the same parameters will be defined: audio/PCMA-WB for A-law core, and audio/PCMU-WB for mu-law core. This is consistent with the audio/PCMA and audio/PCMU media types separation for G.711 audio.
The parameters are defined here as part of the media subtype registrations for the G.711.1 codec. A mapping of the parameters into the Session Description Protocol (SDP) [RFC4566] is also provided for those applications that use SDP. In control protocols that do not use MIME or SDP, the media type parameters must be mapped to the appropriate format used with that control protocol.
This registration is done using the template defined in [RFC4288] and following [RFC4855].
Type name: audio
Subtype name: PCMA-WB
Required parameters: none
Optional parameters:
mode-set: restricts the active codec mode set to a subset of all modes. Possible values are a comma-separated list of modes from the set: 1, 2, 3, 4 (see Mode Index in Table 3 of RFC 5391). The modes are listed in order of preference; first is preferred. If mode-set is specified, frames encoded with modes outside of the subset MUST NOT be sent in any RTP payload. If not present, all codec modes are allowed.
ptime: the recommended length of time (in milliseconds) represented by the media in a packet. It should be an integer multiple of 5 ms (the frame size). See Section 6 of RFC 4566.
maxptime: the maximum length of time (in milliseconds) that can be encapsulated in a packet. It should be an integer multiple of 5 ms (the frame size). See Section 6 of RFC 4566.
Encoding considerations: This media type is framed and contains binary data. See Section 4.8 of RFC 4288.
This registration is done using the template defined in [RFC4288] and following [RFC4855].
Type name: audio
Subtype name: PCMU-WB
Required parameters: none
Optional parameters:
mode-set: restricts the active codec mode set to a subset of all modes. Possible values are a comma-separated list of modes from the set: 1, 2, 3, 4 (see Mode Index in Table 3 of RFC 5391). The modes are listed in order of preference; first is preferred. If mode-set is specified, frames encoded with modes outside of the subset MUST NOT be sent in any RTP payload. If not present, all codec modes are allowed.
ptime: the recommended length of time (in milliseconds) represented by the media in a packet. It should be an integer multiple of 5 ms (the frame size). See Section 6 of RFC 4566.
maxptime: the maximum length of time (in milliseconds) that can be encapsulated in a packet. It should be an integer multiple of 5 ms (the frame size). See Section 6 of RFC 4566.
Encoding considerations: This media type is framed and contains binary data. See Section 4.8 of RFC 4288.
The information carried in the media type specification has a specific mapping to fields in the Session Description Protocol (SDP) [RFC4566], which is commonly used to describe RTP sessions. When SDP is used to specify sessions employing the G.711.1 codec, the mapping is as follows:
o The media type ("audio") goes in SDP "m=" as the media name.
o The media subtype ("PCMA-WB" or "PCMU-WB") goes in SDP "a=rtpmap" as the encoding name. The RTP clock rate in "a=rtpmap" MUST be 16000 for G.711.1.
o The parameter "mode-set" goes in the SDP "a=fmtp" attribute by copying it as a "mode-set=<value>" string.
o The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and "a=maxptime" attributes, respectively.
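The mapping above can be sketched in Python as follows. This is a minimal illustration, not a complete SDP generator: the function name g7111_sdp_attributes and the payload type number used in the example are hypothetical, and the sketch chooses to reject "ptime" and "maxptime" values that are not multiples of 5 ms, which is stricter than the "should" used in the parameter definitions.

```python
VALID_SUBTYPES = ("PCMA-WB", "PCMU-WB")
VALID_MODES = (1, 2, 3, 4)


def g7111_sdp_attributes(payload_type, subtype, mode_set=None,
                         ptime=None, maxptime=None):
    """Build the SDP attribute lines for one G.711.1 payload type."""
    if subtype not in VALID_SUBTYPES:
        raise ValueError(f"unknown subtype {subtype!r}")
    # The RTP clock rate in "a=rtpmap" is always 16000 for G.711.1.
    lines = [f"a=rtpmap:{payload_type} {subtype}/16000"]
    if mode_set is not None:
        if not mode_set or any(m not in VALID_MODES for m in mode_set):
            raise ValueError("mode-set must list modes from 1, 2, 3, 4")
        # Modes are listed in order of preference, first is preferred.
        modes = ",".join(str(m) for m in mode_set)
        lines.append(f"a=fmtp:{payload_type} mode-set={modes}")
    for name, value in (("ptime", ptime), ("maxptime", maxptime)):
        if value is not None:
            # Choice made by this sketch: enforce the 5-ms frame size.
            if value <= 0 or value % 5 != 0:
                raise ValueError(f"{name} should be a multiple of 5 ms")
            lines.append(f"a={name}:{value}")
    return lines
```

Calling g7111_sdp_attributes(96, "PCMA-WB", mode_set=[4, 3], ptime=20) yields the three lines "a=rtpmap:96 PCMA-WB/16000", "a=fmtp:96 mode-set=4,3", and "a=ptime:20".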
The following considerations apply when using SDP offer-answer procedures [RFC3264] to negotiate the use of G.711.1 payload in RTP:
o Since G.711.1 is an extension of G.711, the offerer SHOULD announce G.711 support in its "m=audio" line, with G.711.1 preferred. This will allow interoperability with both G.711.1 and G.711-only capable parties. This is done by offering the PCMA media subtype in addition to PCMA-WB, and/or PCMU in addition to PCMU-WB.
Below is an example part of such an offer, for A-law:
As a reminder, the payload format for G.711 is defined in Section 4.5.14 of [RFC3551].
o The "mode-set" parameter is bi-directional; i.e., the restricted mode-set applies to media both to be received and sent by the declaring entity. If a mode-set was supplied in the offer, the answerer MUST return either the same mode-set or a subset of this mode-set. The answerer MAY change the preference order. If no mode-set was supplied in the offer, the answerer MAY return a mode-set to restrict the possible modes. In any case, the mode-set in the answer then applies for both offerer and answerer. The offerer MUST NOT send frames of a mode that has been removed by the answerer.
For multicast sessions, if "mode-set" is supplied in the offer, the answerer SHALL only participate in the session if it supports the offered mode-set.
o The parameters "ptime" and "maxptime" will in most cases not affect interoperability. The SDP offer-answer handling of the "ptime" parameter is described in [RFC3264]. The "maxptime" parameter MUST be handled in the same way.
o Any unknown parameter in an offer MUST be ignored by the receiver and MUST NOT be included in the answer.
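The "mode-set" rule for unicast sessions can be illustrated with the Python sketch below. It is one possible reading of the rule, under stated assumptions: the function name answer_mode_set is hypothetical, the answerer is assumed to order the result by its own preference, and raising an error when no common mode exists is a choice made by this sketch rather than a behavior defined here.

```python
ALL_MODES = [1, 2, 3, 4]


def answer_mode_set(offered, supported):
    """Return the mode-set to place in the answer, or None to omit it.

    'offered' is the list of modes from the offer, or None if the offer
    carried no mode-set. 'supported' lists the answerer's modes in its
    own order of preference.
    """
    allowed = ALL_MODES if offered is None else offered
    # Keep only modes allowed by the offer; the answerer's order is used,
    # since the answerer may change the preference order.
    result = [m for m in supported if m in allowed]
    if not result:
        # Choice made by this sketch: no common mode is treated as an error.
        raise ValueError("no common mode between offer and answerer")
    if offered is None and set(result) == set(ALL_MODES):
        # Nothing to restrict, so the parameter can be left out.
        return None
    return result
```

For an offer with mode-set=4,3,1 and an answerer that supports modes 1 and 4 (in that order of preference), the sketch returns [1, 4], which is a subset of the offered mode-set.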
Below are some example parts of SDP offer-answer exchanges.
For declarative use of SDP, nothing specific is defined for this payload format. The configuration given by the SDP MUST be used when sending and/or receiving media in the session.
The L0 layer of G.711.1 is fully interoperable with G.711, and is embedded in all modes of G.711.1. This provides an easy G.711.1 - G.711 transcoding process.
A gateway or any other network device receiving a G.711.1 packet can easily extract a G.711-compatible payload, without the need to decode and re-encode the audio signal. It simply has to take the audio data of the payload, and strip the upper layers (L1 and/or L2), if any.
If a G.711.1 packet contains several frames, the concatenation of the L0 layers of each frame will form a G.711-compatible payload.
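A minimal Python sketch of this extraction is given below. It assumes a one-octet payload header carrying the Mode Index in its three low-order bits, a Mode Index mapping of 1 = R1, 2 = R2a, 3 = R2b, 4 = R3, and layer sizes derived from the bit rates (40 octets for the 64-kbps L0 layer, with frames of 40, 50, 50, and 60 octets). The name extract_g711 is hypothetical.

```python
L0_SIZE = 40  # octets of core layer per 5-ms frame (64 kbps)
# Frame size in octets for each Mode Index (assumed mapping, see lead-in).
FRAME_SIZE = {1: 40, 2: 50, 3: 50, 4: 60}


def extract_g711(payload: bytes) -> bytes:
    """Return a G.711-compatible payload built from the L0 layers."""
    if not payload:
        raise ValueError("empty payload")
    # Assumption: one-octet payload header, MI in the 3 low-order bits.
    mode_index = payload[0] & 0x07
    if mode_index not in FRAME_SIZE:
        raise ValueError(f"unsupported mode index {mode_index}")
    size = FRAME_SIZE[mode_index]
    audio = payload[1:]
    nb_frames = len(audio) // size  # incomplete trailing frame is ignored
    # L0 is always packed first in a frame, so the upper layers (L1
    # and/or L2) are stripped by keeping the first 40 octets of each frame.
    return b"".join(audio[i * size:i * size + L0_SIZE]
                    for i in range(nb_frames))
```

The result contains 40 octets per frame and no payload header, and no decoding or re-encoding of the audio signal is involved.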
Congestion control for RTP SHALL be used in accordance with [RFC3550] and any appropriate profile (for example, [RFC3551]).
The embedded nature of G.711.1 audio data can be helpful for congestion control, since a coding mode with a lower bit rate can be selected when needed. This property is usable only when multiple modes have been negotiated (either no "mode-set" parameter in the SDP or a "mode-set" with at least two modes).
The number of frames encapsulated in each RTP payload influences the overall bandwidth of the RTP stream, due to the header overhead. Packing more frames in each RTP payload can reduce the number of packets sent and hence the header overhead, at the expense of increased delay and reduced error robustness.
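The trade-off can be quantified with a short Python sketch. The figures below are assumptions for illustration: a 12-octet RTP header without CSRC entries or extensions, 28 octets of UDP and IPv4 headers without options, a one-octet payload header, and frame sizes of 40, 50, 50, and 60 octets for Mode Index values 1 to 4. The function name stream_kbps is hypothetical.

```python
FRAME_MS = 5
# Frame size in octets for each Mode Index (assumed: 1=R1, 2=R2a, 3=R2b, 4=R3).
FRAME_SIZE = {1: 40, 2: 50, 3: 50, 4: 60}
RTP_HEADER = 12         # fixed RTP header, no CSRC list, no extension
UDP_IPV4_HEADERS = 28   # 8 (UDP) + 20 (IPv4 without options)
PAYLOAD_HEADER = 1      # assumption: one-octet G.711.1 payload header


def stream_kbps(mode_index: int, frames_per_packet: int) -> float:
    """Return the IP-level bit rate in kbps for the given packing."""
    if frames_per_packet < 1:
        raise ValueError("a packet carries at least one frame")
    packet_octets = (UDP_IPV4_HEADERS + RTP_HEADER + PAYLOAD_HEADER
                     + frames_per_packet * FRAME_SIZE[mode_index])
    packet_ms = frames_per_packet * FRAME_MS
    # Bits per millisecond is numerically equal to kbps.
    return packet_octets * 8 / packet_ms
```

Under these assumptions, mode R3 with one frame per packet needs about 161.6 kbps, while packing four frames per packet (20 ms) brings this down to about 112.4 kbps, at the cost of 15 ms of additional packetization delay.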
RTP packets using the payload format defined in this specification are subject to the general security considerations discussed in the RTP specification [RFC3550] and any appropriate profile (for example, [RFC3551]).
As this format transports encoded speech/audio, the main security issues include confidentiality, integrity protection, and authentication of the speech/audio itself. The payload format itself does not have any built-in security mechanisms. Any suitable external mechanisms, such as the Secure Real-time Transport Protocol (SRTP) [RFC3711], MAY be used.
This payload format and the G.711.1 encoding do not exhibit any significant non-uniformity in the receiver-end computational load, and thus they are unlikely to pose a denial-of-service threat due to the receipt of pathological datagrams. In addition, they do not contain any type of active content such as scripts.
[ITU-G.711.1] International Telecommunications Union, "Wideband embedded extension for G.711 pulse code modulation", ITU-T Recommendation G.711.1, March 2008.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, June 2002.
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.
[RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video Conferences with Minimal Control", STD 65, RFC 3551, July 2003.
[RFC4288] Freed, N. and J. Klensin, "Media Type Specifications and Registration Procedures", BCP 13, RFC 4288, December 2005.
[RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session Description Protocol", RFC 4566, July 2006.
[RFC4855] Casner, S., "Media Type Registration of RTP Payload Formats", RFC 4855, February 2007.