Network Working Group G. Klyne Request for Comments: 3862 Nine by Nine Category: Standards Track D. Atkins IHTFP Consulting August 2004
Common Presence and Instant Messaging (CPIM): Message Format
Status of this Memo
This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.
Copyright (C) The Internet Society (2004).
This memo defines the MIME content type 'Message/CPIM', a message format for protocols that conform to the Common Profile for Instant Messaging (CPIM) specification.
This is a common message format for CPIM-compliant messaging protocols.
Although prepared for CPIM, this format is quite general and may be reused by other applications with similar requirements. Application specifications that adopt this as a base format should address the questions raised in section 6 of this document.
The Common Profile for Instant Messaging (CPIM) specification defines a number of operations to be supported and criteria to be satisfied for interworking between diverse instant messaging protocols. The intent is to allow a variety of different protocols, interworking through gateways, to support cross-protocol messaging that meets the requirements of RFC 2779.
To adequately meet the security requirements of RFC 2779, a common message format is needed so that end-to-end signatures and encryption may be applied. This document describes a common canonical message format that must be used by any CPIM-compliant message transfer protocol, and over which signatures are calculated for end-to-end security.
The design of this message format is intended to enable security to be applied, while itself remaining agnostic about the specific security mechanisms that may be appropriate for a given application. For CPIM instant messaging and presence, specific security protocols are specified by the CPIM instant messaging and CPIM presence specifications.
Also note that the message format described here is not itself a MIME data format, although it may be contained within a MIME object, and may contain MIME objects. See section 2 for more details.
RFC 2779 requires that an instant message can carry a MIME payload; thus some level of support for MIME will be a common element of any CPIM-compliant protocol. Therefore it seems reasonable that a common message format should use an RFC2822/MIME-like syntax, as protocol implementations must already contain code to parse this.
Unfortunately, using pure RFC2822/MIME can be problematic:
Klyne & Atkins Standards Track [Page 3]
RFC 3862 CPIM: Message Format August 2004
o Irregular lexical structure -- RFC2822/MIME allows a number of optional encodings and multiple ways to encode a particular value. For example, RFC2822/MIME comments may be encoded in multiple ways. For security purposes, a single encoding method must be defined as a basis for computing message digest values. Protocols that transmit data in a different format would otherwise lose information needed to verify a signature.
o Weak internationalization -- RFC2822/MIME requires header values to use 7-bit ASCII, which is problematic for encoding international character sets. Mechanisms for language tagging in RFC2822/MIME headers are awkward to use and have limited applicability.
o Mutability -- addition, modification or removal of header information. Because it is not explicitly forbidden, many applications that process MIME content (e.g., MIME gateways) rebuild or restructure messages in transit. This obliterates most attempts at achieving security (e.g., signatures), leaving receiving applications unable to verify the data received.
o Message and payload separation -- there is not a clear syntactic distinction between message metadata and message content.
o Limited extensibility. (X-headers are problematic because they may not be standardized; this leads to situations where a header starts out as experimental but then finds widespread application, resulting in a common usage that cannot be standardized.)
o No support for structured information (text string values only).
o Some processors impose line length limitations.
The message format defined by this memo overcomes some of these difficulties by having a simplified, stricter syntax that is generally compatible with the format accepted by RFC2822/MIME parsers. It also defines mechanisms to support some desired features not covered by the RFC2822/MIME format specifications.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14, RFC 2119.
NOTE: Comments like this provide additional nonessential information about the rationale behind this document. Such information is not needed for building a conformant implementation, but may help those who wish to understand the design in greater depth.
The CPIM message format encapsulates arbitrary MIME message content, together with message- and content-related metadata. This can optionally be signed or encrypted using MIME security multiparts in conjunction with an appropriate security scheme.
A Message/CPIM object is a two-part entity, where the first part contains the message metadata and the second part is the message content. The two parts are separated from the enclosing MIME header fields and also from each other by blank lines. The message metadata header information obeys more stringent syntax rules than the MIME message content headers that may be carried within the message.
The end of the message body is defined by the framing mechanism of the protocol used. The tags 'm:', 's:', 'h:', 'e:', and 'x:' are not part of the message format and are used here to indicate the different parts of the message, thus:
   m: MIME headers for the overall message
   s: a blank separator line
   h: message headers
   e: encapsulated MIME object containing the message content
   x: MIME security multipart message wrapper
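As a rough illustration of this layout, the sketch below assembles the 'm:', 's:', 'h:', 's:' and 'e:' parts in order, using CRLF line endings and UTF-8 throughout. It is a minimal sketch only: the function name build_cpim and its parameters are hypothetical, it does not add the optional 'x:' security wrapper, and it assumes the caller supplies header values that already obey the header rules given below.

```python
CRLF = "\r\n"


def build_cpim(message_headers: list[tuple[str, str]],
               content_headers: list[tuple[str, str]],
               body: str) -> bytes:
    """Assemble a Message/CPIM object as UTF-8 bytes (illustrative sketch)."""
    lines = ["Content-type: Message/CPIM", ""]             # m: then s:
    lines += [f"{k}: {v}" for k, v in message_headers]     # h: message headers
    lines.append("")                                       # s: separator
    lines += [f"{k}: {v}" for k, v in content_headers]     # e: MIME headers
    # Joining with CRLF and then adding two more CRLFs leaves one blank line
    # between the encapsulated MIME headers and the encapsulated body.
    return (CRLF.join(lines) + CRLF + CRLF + body).encode("utf-8")
```

Keeping the message headers and the encapsulated MIME headers in separate arguments mirrors the separation between message metadata and message content.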
Message headers carry information relevant to the end-to-end transfer of the message from sender to receiver. Message headers MUST NOT be modified, reformatted or reordered in transit, but in some circumstances they MAY be examined by a CPIM message transfer protocol.
The message headers serve a similar purpose to RFC 2822 message headers in email, and have a similar but restricted allowable syntax.
The basic header syntax is:

   Key: Value
where "Key" is a header name and "Value" is the corresponding header value.
The following considerations apply:
o The entire header MUST be contained on a single line. The line terminator is not considered part of the header value.
o Only one header per line. Multiple headers MUST NOT be included on a single line.
o Processors SHOULD NOT impose any line-length limitations.
o There MUST NOT be any whitespace at the beginning or end of a line.
o UTF-8 character encoding MUST be used throughout.
o The character sequence CR,LF (13,10) MUST be used to terminate each line.
o The header name contains only US-ASCII characters (see section 3.1 and section 3.6 for the specific syntax).
o The header MUST NOT contain any control characters (0-31). If a header value needs to represent control characters then the escape mechanism described below MUST be used.
o There MUST be a single space character (32) following the header name and colon.
o Multiple headers using the same key (header name) are allowed. (Specific header semantics may dictate only one occurrence of any particular header.)
o Header names MUST match exactly (i.e., "From:" and "from:" are different headers).
o If a header name is not recognized or not understood, the header should be ignored. But see also the "Require:" header (section 4.7).
o Interpretation (e.g., equivalence) of header values is dependent on the particular header definition. Message processors MUST preserve all octets of all headers (both name and value) exactly.
o Message processors MUST NOT change the order of message headers.
Examples:

   To: Pooh Bear <im:email@example.com>
   From: <im:firstname.lastname@example.org>
   DateTime: 2001-02-02T10:48:54-05:00
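The per-line rules above lend themselves to a mechanical check. The Python sketch below tests one header line (without its CRLF terminator) against several of them: a single line, no leading or trailing whitespace, no raw control characters, and exactly one space after the colon. The function name check_header_line is hypothetical, the header-name character set is an assumption modeled on HTTP tokens rather than a copy of this memo's ABNF, and ';'-parameters are simplified to runs of non-space characters.

```python
import re

# Assumed header-name characters (HTTP-token-like, without '.'); the ABNF
# later in this memo is the normative definition.
NAME = r"[A-Za-z0-9!#$%&'*+\-^_`|~]+"
# [prefix "."] name ":" *(";" param) SP value   (parameters simplified)
HEADER_RE = re.compile(rf"{NAME}(?:\.{NAME})?:(?:;[^;\s]+)* \S.*")


def check_header_line(line: str) -> list[str]:
    """Return the rule violations found in one header line (CRLF excluded)."""
    problems = []
    if "\r" in line or "\n" in line:
        problems.append("header must be contained on a single line")
    if line != line.strip(" \t"):
        problems.append("no whitespace allowed at the start or end of a line")
    if any(ord(c) < 32 or ord(c) == 127 for c in line):
        problems.append("control characters must be escaped")
    if not HEADER_RE.fullmatch(line):
        problems.append("expected 'Name: value' with one space after the colon")
    return problems
```

The check reports problems rather than repairing them, since message processors must not reformat headers in transit.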
The following escape mechanism MUST be used to code control characters in a header, having Unicode code points in the range U+0000 to U+001f or U+007f. (Rather than invent something completely new, the escape mechanism has been adopted from that used by the Java programming language.)
Note that the escape mechanism is applied to a UCS-2 character, NOT to the octets of its UTF-8 coding. Mapping from/to UTF-8 coding is performed without regard for escape sequences or character coding. (The header syntax is defined so that octets corresponding to control characters other than CR and LF do not appear in the output.)
An arbitrary UCS-2 character is escaped using the form:

   \uxxxx
   \    is U+005c (backslash)
   u    is U+0075 (lowercase letter u)
   xxxx is a sequence of exactly four hexadecimal digits (0-9, a-f or A-F), i.e., characters U+0030-U+0039, U+0041-U+0046, or U+0061-U+0066
The hexadecimal number 'xxxx' is the UCS code-point value of the escaped character.
Further, the following special sequences introduced by "\" are used:
   \\ for \ (backslash, U+005c)
   \" for " (double quote, U+0022)
   \' for ' (single quote, U+0027)
   \b for backspace (U+0008)
   \t for tab (U+0009)
   \n for linefeed (U+000a)
   \r for carriage return (U+000d)
When generating messages conformant with this specification:
o The special sequences listed above MUST be used to encode any occurrence of the following characters that appear anywhere in a header: backslash (U+005c), backspace (U+0008), tab (U+0009), linefeed (U+000a) or carriage return (U+000d).
o The special sequence \" MUST be used for any occurrence of a double quote (U+0022) that appears within a string delimited by double quotes.
o The special sequence \' MUST be used for any occurrence of a single quote (U+0027) that appears within a string delimited by single quotes.
o Single- or double-quote characters that delimit a string value MUST NOT be escaped.
o The general escape sequence \uxxxx MUST be used for any other control character (U+0000 to U+0007, U+000b to U+000c, U+000e to U+001f or U+007f) that appears anywhere in a header.
o All other characters MUST NOT be represented using an escape sequence.
When processing a message based on this specification, the escape sequence usage described above MUST be recognized.
Further, any other occurrence of an escape sequence described above SHOULD be recognized and treated as an occurrence of the corresponding Unicode character.
Any backslash ('\') character SHOULD be interpreted as introducing an escape sequence. Any unrecognized escape sequence SHOULD be treated as an instance of the character following the backslash character. An isolated backslash that is the last character of a header SHOULD be ignored.
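The generation and processing rules above can be sketched as a pair of Python functions. escape() applies the generation rules to text destined for a header; its optional quote argument names the delimiter of the enclosing quoted string, if any, so that only that quote character is escaped (the delimiters themselves are added by the caller and are never escaped). unescape() follows the processing rules, including the treatment of unrecognized sequences and of an isolated trailing backslash. Both names are hypothetical, and the sketch assumes escaping operates on characters (code points), not on UTF-8 octets, in line with the note on UCS-2 characters above.

```python
# Generation: characters that always use a special two-character sequence.
SPECIAL = {"\\": "\\\\", "\b": "\\b", "\t": "\\t", "\n": "\\n", "\r": "\\r"}
# Processing: character following a backslash -> decoded character.
DECODE = {"\\": "\\", '"': '"', "'": "'", "b": "\b", "t": "\t", "n": "\n", "r": "\r"}
HEX_DIGITS = set("0123456789abcdefABCDEF")


def escape(text: str, quote: str = "") -> str:
    """Escape header text; `quote` is the enclosing string delimiter, if any."""
    out = []
    for c in text:
        if c in SPECIAL:
            out.append(SPECIAL[c])
        elif quote and c == quote:
            out.append("\\" + c)               # \" or \' inside a quoted string
        elif ord(c) < 0x20 or ord(c) == 0x7F:
            out.append(f"\\u{ord(c):04x}")      # any other control character
        else:
            out.append(c)                      # all other characters stay literal
    return "".join(out)


def unescape(text: str) -> str:
    """Decode escape sequences following the processing rules."""
    out, i = [], 0
    while i < len(text):
        c = text[i]
        if c != "\\":
            out.append(c)
            i += 1
            continue
        if i + 1 == len(text):                 # isolated trailing backslash: ignored
            break
        nxt, hex4 = text[i + 1], text[i + 2:i + 6]
        if nxt == "u" and len(hex4) == 4 and all(h in HEX_DIGITS for h in hex4):
            out.append(chr(int(hex4, 16)))
            i += 6
        else:
            # Known two-character sequence, or unrecognized: keep the next char.
            out.append(DECODE.get(nxt, nxt))
            i += 2
    return "".join(out)
```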
A header contains two parts, a name and a value, separated by a colon character (':') and single space (32). It is terminated by the sequence CR,LF (13,10).
Headers use UTF-8 character encoding throughout, per RFC 3629.
NOTE: in the descriptions that follow, header field names and other specified text values MUST be used exactly as given, using exactly the indicated upper- and lower-case letters. In this respect, the ABNF usage differs from RFC 2234.
NOTE: The range of allowed characters was determined by examination of HTTP and RFC 2822 header name formats and choosing the more restrictive. The intent is to allow CPIM headers to follow a syntax that is compatible with the allowed syntax for both RFC 2822 and HTTP (including HTTP-derived protocols such as SIP).
A header value has a structure defined by the corresponding header specification. Implementations that use a particular header must adhere to the format and usage rules thus defined when creating or processing a message containing that header.
The other general constraints on header formats MUST also be followed (one line, UTF-8 character encoding, no control characters, etc.)
NOTE: This section defines a framework for header extensibility whose use is optional. If no header extensions are allowed by an application then these structures may never be used.
An application that uses this message format is expected to define the set of headers that are required and allowed for that application. This section defines a header extensibility framework that can be used with any application.
The extensibility framework is based on that provided for XML by XML namespaces. All headers are associated with a "namespace", which is in turn associated with a globally unique URI.
Within a particular message instance, header names are associated with a particular namespace through the presence or absence of a namespace prefix, which is a leading part of the header name followed by a period ("."); e.g.,

   prefix.header-name: header-value

Here, 'prefix' is the header name prefix, 'header-name' is the header name within the namespace associated with 'prefix', and 'header-value' is the value for this header.

   header-name: header-value

In this case, the header name prefix is absent, and the given 'header-name' is associated with a default namespace.
The Message/CPIM media type registration designates a default namespace for any headers that are not more explicitly associated with any namespace. In most cases, this default namespace is all that is needed.
A namespace is identified by a URI. In this usage, the URI is used simply as a globally unique identifier, and there is no requirement that it can be used for any other purpose. Any legal globally unique URI MAY be used to identify a namespace. (By "globally unique", we mean constructed according to some set of rules so that it is reasonable to expect that nobody else will use the same URI for a different purpose.) A URI used as an identifier MUST be a full absolute-URI, per RFC 2396. (Relative URIs and URI-references containing fragment identifiers MUST NOT be used for this purpose.)
Within a specific message, an 'NS' header is used to declare a namespace prefix and associate it with a URI that identifies a namespace. Following that declaration, within the scope of that message, the combination of namespace prefix and header name indicates a globally unique identifier for the header (consisting of the namespace URI and header name).
For example:

   NS: MyFeatures <mid:MessageFeatures@id.foo.com>
   MyFeatures.WackyMessageOption: Use-silly-font

This defines a namespace prefix 'MyFeatures' associated with the namespace identifier 'mid:MessageFeatures@id.foo.com'. Subsequently, the prefix indicates that the WackyMessageOption header name referenced is associated with the identified namespace.
A namespace prefix declaration MUST precede any use of that prefix.
With the exception of any application-specific predefined namespace prefixes (see section 6), a namespace prefix is strictly local to the message in which it occurs. The actual prefix used has no global significance. This means that the headers:
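   xxx.name: value
   yyy.name: value

in two different messages may have exactly the same effect if namespace prefixes 'xxx' and 'yyy' are associated with the same namespace URI. Thus the following have exactly the same meaning:

   NS: xxx <mid:MessageFeatures@id.foo.com>
   xxx.name: value

and

   NS: yyy <mid:MessageFeatures@id.foo.com>
   yyy.name: value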
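The prefix-independence rule can be illustrated by resolving each header name to a (namespace URI, local name) pair. The Python sketch below is an illustration under assumptions, not a normative algorithm: the function name resolve is hypothetical, it assumes an 'NS' value has the shape "[prefix SP] <URI>" (an NS header with no prefix is treated as an explicit default-namespace declaration), and it drops the NS headers themselves from its output so that messages differing only in their choice of prefix compare equal.

```python
DEFAULT_NS = "urn:ietf:params:cpim-headers:"


def resolve(headers: list[tuple[str, str]]) -> list[tuple[str, str, str]]:
    """Map (name, value) pairs to (namespace URI, local name, value)."""
    default = DEFAULT_NS
    prefixes: dict[str, str] = {}
    resolved = []
    for name, value in headers:
        if name == "NS":
            # Assumed NS value shape: "[Prefix SP] <URI>".
            decl, _, uri = value.rpartition("<")
            uri = uri.rstrip(">")
            prefix = decl.strip()
            if prefix:
                prefixes[prefix] = uri
            else:
                default = uri                 # explicit default namespace
            continue                          # NS headers are not returned
        prefix, dot, local = name.partition(".")
        if not dot:
            resolved.append((default, name, value))
        elif prefix in prefixes:
            resolved.append((prefixes[prefix], local, value))
        else:
            # A prefix declaration must precede any use of that prefix.
            raise ValueError(f"namespace prefix {prefix!r} used before declaration")
    return resolved
```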
Sometimes it is necessary for the sender of a message to insist that some functionality is understood by the recipient. By using the mandatory-to-recognize indicator, a sender is notifying the recipient that it MUST understand the named header or feature in order to properly understand the message.
A header or feature is indicated as being mandatory-to-recognize by a 'Require:' header. For example:

   Require: MyFeatures.VitalMessageOption
Multiple required header names may be listed in a single 'Require' header, separated by commas.
NOTE: Indiscriminate use of 'Require:' headers could harm interoperability. It is suggested that any implementer who defines required headers also publish the header specifications so other implementations can successfully interoperate.
The 'Require:' header MAY also be used to indicate that some non-header semantics must be implemented by the recipient, even when it does not appear as a header. For example:
might be used to indicate that message content includes characters from the Kanji repertoire, which must be rendered for proper understanding of the message. In this case, the header name is just a token (using header name syntax and namespace association) that indicates some desired behaviour.
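A recipient might check 'Require:' headers along the lines of the Python sketch below, which splits each header value on commas and reports every required name the recipient does not understand. The function name, the prefix map (prefix to namespace URI, as declared by earlier 'NS' headers) and the set of understood (namespace URI, name) pairs are all hypothetical; what a recipient does when the result is non-empty is left to the application in this sketch.

```python
DEFAULT_NS = "urn:ietf:params:cpim-headers:"


def unmet_requirements(require_values: list[str],
                       prefixes: dict[str, str],
                       understood: set[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the required (namespace URI, name) pairs not in `understood`."""
    missing = []
    for value in require_values:              # one entry per 'Require:' header
        for item in value.split(","):         # names are comma-separated
            prefix, dot, local = item.strip().partition(".")
            # Prefixed names use the declared namespace (an undeclared prefix
            # raises KeyError); bare names use the default namespace.
            key = (prefixes[prefix], local) if dot else (DEFAULT_NS, prefix)
            if key not in understood:
                missing.append(key)
    return missing
```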
The following description of message header syntax uses ABNF, per RFC 2234. Most of this syntax can be interpreted as defining UCS character sequences or UTF-8 octet sequences. Alternate productions at the end allow for either interpretation.
NOTE: Specified text values MUST be used as given, using exactly the indicated upper- and lower-case letters. In this respect, the ABNF usage here differs from RFC 2234.
NOTE: the above syntax comes from an older version of UTF-8, and is included for compatibility with UTF-8 software based on the earlier specifications. Applications generating this message format SHOULD generate UTF-8 that matches the more restricted specification in RFC 3629 .
This specification defines a core set of headers that are available for use by applications: an application specification must indicate the headers that may be used, those that must be recognized and those that must appear in any message (see section 6).
The header definitions that follow fall into two categories:
a) those that are part of the CPIM format extensibility framework, and
b) those that have been based on similar headers in RFC 2822, specified here with corresponding semantics.
Header names and syntax are described without a namespace qualification, and the associated namespace URI is listed as part of the header specification. Any of the namespace associations already mentioned (implied default namespace, explicit default namespace or implied namespace prefix or explicit namespace prefix declaration) may be used to identify the namespace.
All headers defined here are associated with the namespace URI <urn:ietf:params:cpim-headers:>.
NOTE: Header names and other text MUST be used as given, using exactly the indicated upper- and lower-case letters. In this respect, the ABNF usage here differs from RFC 2234.
DateTime-header = "DateTime" ": " date-time ; "DateTime" is case-sensitive
(where the syntax of 'date-time' is a profile of ISO 8601 defined in "Date and Time on the Internet: Timestamps")
Description: The 'DateTime' header supplies the date and time at which the sender sent the message.
One purpose of this header is to provide protection against replay attacks, by allowing the recipient to know when the message was intended to be sent. The value of the 'DateTime' header is the sender's current time when the message was transmitted, using ISO 8601 date and time format as profiled in "Date and Time on the Internet: Timestamps".
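A sender could produce the 'DateTime' value with Python's standard datetime module, whose isoformat() output for timezone-aware values matches the form used in the examples here (e.g., 2001-02-02T10:48:54-05:00). The function names are hypothetical, and the recipient-side within_window() check is a deliberately crude illustration of the replay-protection purpose: the five-minute window is an arbitrary assumption, and a timestamp comparison alone cannot detect a replay that arrives inside the window.

```python
from datetime import datetime, timedelta, timezone


def datetime_header(sent: datetime) -> str:
    """Render an aware datetime as a 'DateTime' header line (no CRLF)."""
    if sent.tzinfo is None:
        raise ValueError("DateTime needs an explicit UTC offset")
    return "DateTime: " + sent.isoformat(timespec="seconds")


def now_header() -> str:
    """Header carrying the sender's current time, expressed in UTC."""
    return datetime_header(datetime.now(timezone.utc))


def within_window(value: str, received: datetime,
                  window: timedelta = timedelta(minutes=5)) -> bool:
    """Crude freshness check: was the message sent within `window` of `received`?"""
    if value[-1:] in ("Z", "z"):       # fromisoformat() accepts 'Z' only from Python 3.11
        value = value[:-1] + "+00:00"
    sent = datetime.fromisoformat(value)
    return abs(received - sent) <= window
```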
The following example shows a Message/CPIM message:
   m: Content-type: Message/CPIM
   s:
   h: From: MR SANDERS <im:email@example.com>
   h: To: Depressed Donkey <im:firstname.lastname@example.org>
   h: DateTime: 2000-12-13T13:40:00-08:00
   h: Subject: the weather will be fine today
   h: Subject:;lang=fr beau temps prevu pour aujourd'hui
   h: NS: MyFeatures <mid:MessageFeatures@id.foo.com>
   h: Require: MyFeatures.VitalMessageOption
   h: MyFeatures.VitalMessageOption: Confirmation-requested
   h: MyFeatures.WackyMessageOption: Use-silly-font
   s:
   e: Content-type: text/xml; charset=utf-8
   e: Content-ID: <email@example.com>
   e:
   e: <body>
   e: Here is the text of my message.
   e: </body>
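A message like the one above can be taken apart with a few string operations. The Python sketch below assumes the object starts with its own MIME headers (the 'm:' lines) and uses CRLF line endings; it splits at the first two blank lines, then breaks each message header into name, ';'-parameters and value. The name parse_cpim is hypothetical, and the sketch does not handle parameter values containing spaces, escape sequences, or a message with no message headers at all.

```python
def parse_cpim(data: bytes) -> tuple[str, list[tuple[str, str, str]], str]:
    """Split a Message/CPIM object into MIME headers, message headers, content."""
    text = data.decode("utf-8")
    mime_part, sep1, rest = text.partition("\r\n\r\n")       # m: | s:
    header_part, sep2, content = rest.partition("\r\n\r\n")  # h: | s: | e:
    if not (sep1 and sep2):
        raise ValueError("missing blank separator line")
    headers = []
    for line in header_part.split("\r\n"):
        # "Name:;param=x value" -> ("Name", ";param=x", "value")
        name_params, sp, value = line.partition(" ")
        name, colon, params = name_params.partition(":")
        if not (sp and colon):
            raise ValueError(f"malformed header line: {line!r}")
        headers.append((name, params, value))
    return mime_part, headers, content
```

The encapsulated content is returned untouched, MIME headers included, so that any signature computed over it can still be checked.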
As defined, the 'Message/CPIM' content type uses a default namespace URI 'urn:ietf:params:cpim-headers:', and does not define any other implicit namespace prefixes. Applications that have different requirements should define and register a different MIME media type, specify the required default namespace URI and define any implied namespace prefixes as part of the media type specification.
Applications using this specification must also specify:
o all headers that must be recognized by implementations of the application
o any headers that must be present in all messages created by that application.
o any headers that may appear more than once in a message, and how they are to be interpreted (e.g., how to interpret multiple 'Subject:' headers with different language parameter values).
o Security mechanisms and cryptography schemes to be used with the application, including any mandatory-to-implement security provisions.
The goal of providing a definitive message format to which security mechanisms can be applied places some constraints on the design of applications that use this message format:
o Within a network of message transfer agents, an intermediate gateway MUST NOT change the Message/CPIM content in any way. This implies that headers cannot be changed or reordered, transfer encoding cannot be changed, languages cannot be changed, etc.
o Because Message/CPIM messages are immutable, any transfer agent that wants to modify the message should create a new Message/CPIM message with the modified header and with the original message as its content. (This approach is similar to real-world bill-of-lading handling, where each person in the chain attaches a new sheet to the message. Then anyone can validate the original message and see what has changed and who changed it by following the trail of amendments. Another metaphor is including the old message in a new envelope.)
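The "new envelope" approach can be sketched as follows: the original Message/CPIM object, including its own 'Content-type: Message/CPIM' line, becomes the encapsulated content of a new object whose message headers carry the transfer agent's additions. The function name rewrap and the gateway address used in testing are hypothetical; the point illustrated is only that the original octets are carried through unchanged.

```python
def rewrap(original: bytes, new_headers: list[tuple[str, str]]) -> bytes:
    """Carry an existing Message/CPIM object, byte for byte, inside a new one."""
    outer = "Content-type: Message/CPIM\r\n\r\n"                 # m: and s:
    outer += "".join(f"{k}: {v}\r\n" for k, v in new_headers)   # h: new headers
    outer += "\r\n"                                              # s:
    # e: the original object, including its own MIME header line, unmodified
    return outer.encode("utf-8") + original
```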
In choosing security mechanisms for an application, the following IAB survey documents may be helpful:
o Security Mechanisms for the Internet (RFC 3631)
o A Survey of Authentication Mechanisms
Subject: Registration of MIME media type Message/CPIM
MIME media type name: message
MIME subtype name: CPIM
Required parameters: (None)
Optional parameters: (None)
Encoding considerations: Intended to be used in 8-bit clean environments, with non-transformative encoding (8-bit or binary, according to the content contained within the message; the CPIM message headers can be handled in an 8-bit text environment).
This content type could be used with a 7-bit transfer environment if appropriate transfer encoding is used. NOTE that for this purpose, enclosed MIME content MUST BE treated as opaque data and encoded accordingly. Any encoding must be reversed before any enclosed MIME content can be accessed.
Security considerations: The content may contain signed data, so any transfer encoding MUST BE exactly reversed before the content is processed.
See also the security considerations for email messages (RFC 2822 ).
Interoperability considerations: This content format is intended to be used to exchange possibly-secured messages between different instant messaging protocols. Very strict adherence to the message format (including whitespace usage) may be needed to achieve interoperability.
Index value: The index value is a CPIM message header name, which may consist of a sequence from a restricted set of US-ASCII characters, as defined above.
URN Formation: The URI for a header is formed from its name by:
a) replacing any non-URN characters (as defined by RFC 2141 ) with the corresponding '%hh' escape sequence (per RFC 2396 ); and
b) prepending the resulting string with 'urn:ietf:params:cpim-headers:'.
Thus, the URI corresponding to the CPIM message header 'From:' would be 'urn:ietf:params:cpim-headers:From'. The URI corresponding to the (putative) CPIM message header 'Top&Tail' would be 'urn:ietf:params:cpim-headers:Top%26Tail'.
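The two URN formation steps might be coded as in the Python sketch below. The set of characters left unescaped is taken from RFC 2141 (letters, digits and its "other" characters); RFC 2141's reserved characters ('%', '/', '?', '#') and anything else are percent-escaped with uppercase hex digits, matching the 'Top%26Tail' example. Encoding non-ASCII characters as UTF-8 octets is an assumption that does not arise for valid header names, which are US-ASCII only. The function name header_urn is hypothetical.

```python
import string

# Characters left unescaped: RFC 2141 letters, digits and "other" characters.
# RFC 2141 reserved characters (% / ? #) and everything else get escaped.
URN_SAFE = set(string.ascii_letters + string.digits + "()+,-.:=@;$_!*'")
PREFIX = "urn:ietf:params:cpim-headers:"


def header_urn(name: str) -> str:
    """Form the namespace URN for a CPIM header name (without its colon)."""
    out = []
    for ch in name:
        if ch in URN_SAFE:
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return PREFIX + "".join(out)
```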
The Message/CPIM format is designed with security in mind. In particular it is designed to be used with MIME security multiparts for signatures and encryption. To this end, Message/CPIM messages must be considered immutable once created.
Because Message/CPIM messages are binary messages (due to UTF-8 encoding), if they are transmitted across non-8-bit-clean transports then the transfer agent must tunnel the entire message. Changing the message data encoding is not an option. This implies that the Message/CPIM object must be encapsulated by the message transfer system and unencapsulated at the receiving end of the tunnel.
The resulting message must not suffer data loss due to the encoding and decoding of the message. For example, an application may choose to apply the MIME base64 content-transfer-encoding to the Message/CPIM object to meet this requirement.
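A minimal Python sketch of such a tunnel, using the standard base64 module: the encoded form contains only 7-bit ASCII, and decoding restores the exact original octets, so a signature computed over the Message/CPIM object still verifies at the far end. The function names are hypothetical, and how the encoded data is labelled on the wire depends on the transfer protocol in use.

```python
import base64


def tunnel_encode(cpim_object: bytes) -> bytes:
    """Wrap the whole Message/CPIM object in base64 for a 7-bit transport."""
    return base64.encodebytes(cpim_object)   # ASCII only, lines of at most 76 chars


def tunnel_decode(encoded: bytes) -> bytes:
    """Reverse tunnel_encode exactly, recovering the original octets."""
    return base64.decodebytes(encoded)
```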
 Galvin, J., Murphy, S., Crocker, S., and N. Freed, "Security Multiparts for MIME: Multipart/Signed and Multipart/Encrypted", RFC 1847, October 1995.
 Weider, C., Preston, C., Simonsen, K., Alvestrand, H., Atkinson, R., Crispin, M., and P. Svanberg, "The Report of the IAB Character Set Workshop held 29 February - 1 March, 1996", RFC 2130, April 1997.
 Freed, N. and K. Moore, "MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations", RFC 2231, November 1997.
 Callas, J., Donnerhacke, L., Finney, H., and R. Thayer, "OpenPGP Message Format", RFC 2440, November 1998.
 Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
 Ramsdell, B., Ed., "S/MIME Version 3 Message Specification", RFC 2633, June 1999.
 Day, M., Aggarwal, S., Mohr, G., and J. Vincent, "Instant Messaging / Presence Protocol Requirements", RFC 2779, February 2000.
 Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.
 International Organization for Standardization, "Data elements and interchange formats - Information interchange - Representation of dates and times", ISO Standard 8601, June 1988.
 International Organization for Standardization, "Information Technology - Universal Multiple-octet coded Character Set (UCS) - Part 1: Architecture and Basic Multilingual Plane", ISO Standard 10646-1, May 1993.
 Peterson, J., "Common Profile for Instant Messaging (CPIM)", RFC 3860, August 2004.
 Peterson, J., "Common Profile for Presence (CPP)", RFC 3859, August 2004.
 Bellovin, S., Kaufman, C., and J. Schiller, "Security Mechanisms for the Internet", RFC 3631, December 2003.
 Rescorla, E., "A Survey of Authentication Mechanisms", Work in Progress, March 2004.
Copyright (C) The Internet Society (2004). All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.
The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assignees.
This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Funding for the RFC Editor function is currently provided by the Internet Society.