Network Working Group                                   M. Garcia-Martin
Request for Comments: 5112                        Nokia Siemens Networks
Category: Standards Track                                   January 2008
The Presence-Specific Static Dictionary for Signaling Compression (Sigcomp)
Status of This Memo
This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.
Abstract
The Session Initiation Protocol (SIP) is a text-based protocol for initiating and managing communication sessions. The protocol is extended by the SIP-events notification framework to provide subscriptions and notifications of SIP events. One example of such an event-notification mechanism is presence, which is expressed in XML documents called presence documents. SIP can be compressed by using Signaling Compression (SigComp), which is enhanced by using the SIP/Session Description Protocol (SDP) dictionary to achieve better compression rates. However, the SIP/SDP dictionary is not able to increase the compression factor of (typically lengthy) presence documents. This memo defines the presence-specific static dictionary that SigComp can use in order to compress presence documents to achieve higher efficiency. The dictionary is compression-algorithm independent.
Garcia-Martin Standards Track [Page 1]
RFC 5112 Presence Dictionary for SIGCOMP January 2008
Table of Contents
1. Introduction
2. Terminology
3. Design Considerations
4. Binary Representation of the Presence-Specific Static Dictionary
5. Security Considerations
6. Acknowledgements
Appendix A. Input Strings to the Presence-Specific Static Dictionary
References
   Normative References
   Informative References
1. Introduction
The Session Initiation Protocol (SIP) [4] is extended by the SIP-events framework [5] to provide subscriptions and notifications of SIP events. One example of such an event-notification mechanism is presence. The presence information is typically carried in Extensible Markup Language (XML) [22] documents that are compliant with a given XML schema [23]. The Presence Information Data Format (PIDF) [8] defines the format for the basic presence document that supplies presence information. Typically, PIDF is used in combination with extensions that provide a richer user experience, among others: the Presence Data Model [10], Rich Presence Extensions to PIDF (RPID) [11], Contact Information in PIDF (CIPID) [12], the SIP Event Notification Extension for Resource Lists [19], the SIP User Agent Capability Extension to PIDF [20], and the Location Object in PIDF [16].
Typically, presence documents can contain large amounts of data. The size of this data depends on the number of presentities that a watcher is subscribed to and the amount of information supplied by each presentity. This can pose a problem in environments where resources are scarce (e.g., low-bandwidth links with high latency) and the presence service is offered at low or no cost. This is the case, for example, in some wireless networks and devices. It is reasonable to try to minimize the impact of bringing the presence service to wireless networks under these circumstances.
Work has been done to mitigate the impact of transferring large amounts of presence documents between endpoints. For example, the Partial PIDF [15] reduces the amount of data transferred between the endpoints.
On the other hand, the signaling compression mechanisms specified in the SigComp framework (RFC 3320) [2] provide a multiple compression/decompression algorithm framework to compress and decompress text-based protocols, such as SIP. When compression is used in SIP, the compression achieves its maximum rate once a few message exchanges have taken place. This is because the first message the compressor sends to the decompressor is only partially compressed, as there is no previously stored state to compress against. As the goal is to compress as much as possible, it seems sensible to investigate a mechanism to boost the compression rate from the first message.
RFC 3485 [7] defines a static dictionary for SIP [4] and SDP [9]. The dictionary is to be used in conjunction with SIP [4], SDP [9], and SigComp [2]. The static SIP/SDP dictionary constitutes a SigComp state that can be referenced in the first SIP message that the compressor sends out. The dictionary boosts the compression of SIP and SDP, but unfortunately does not have any effect in XML-based presence documents.
It therefore seems reasonable to define a presence-specific static dictionary that can be used in conjunction with SIP and SigComp. This dictionary can coexist with the static SIP/SDP dictionary defined in RFC 3485 [7]. SigComp endpoints will initially announce the availability of one or both dictionaries until the other end acknowledges that it has received the announcement.
Our initial simulations when developing this dictionary revealed that once the current mitigation mechanisms are applied (e.g., SigComp, partial notification, partial publication), a further compression factor of 10% can be achieved when SigComp uses the presence-specific static dictionary.
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14, RFC 2119 [1] and indicate requirement levels for compliant implementations.
3. Design Considerations
The presence-specific static dictionary is a collection of well-known strings that appear in most of the presence documents used by SIP. The dictionary is not a comprehensive list of reserved words, but it includes many of the strings that appear in presence documents.
The presence static dictionary is unique and MAY be available in SigComp implementations for SIP that support the presence service. The dictionary is not intended to evolve as presence evolves. It is defined once, and it stays as is forever. This solves the problems of updating, upgrading, and finding out the dictionary that is supported at the remote end when several versions of the same dictionary coexist.
Appendix A contains the collection of strings that were contributed to the presence static dictionary. The appendix also includes references to the documents that define those strings.
While this appendix is of an informative nature, Section 4 gives the normative binary form of the presence-specific static dictionary. This is the dictionary that is included in the SigComp implementation. This dictionary has been formed from the collection of individual dictionaries given in Appendix A.
The input set is a collection of UTF-8 [6] encoded character strings. The appendix provides a table where each row represents an entry. Each entry contains the string that actually occurs in the dictionary, its priority (see below), its offset from the first octet and its length (both in hexadecimal), and one or more references that elucidate why this string is expected to occur in presence documents.
Note: Length in this document always refers to octets.
The columns in the table are described as follows:
String: represents the UTF-8 string that is inserted into the dictionary. Note that the quotes (") are not part of the string itself.
Pr: indicates the priority of this string within the dictionary. Some compression algorithms, such as DEFLATE [3], offer an increased efficiency when the most commonly used strings are located at the bottom of the dictionary. To facilitate generating a dictionary that has the most frequently occurring strings farther down at the bottom, we have decided to allocate a priority to each string in the dictionary. Priorities range from 1 to 5. A low value in the priority column (e.g., 1) indicates that we believe there is a high probability of finding the string in a presence document. A high value in the priority column (e.g., 5) indicates lower probability of finding the string in a presence document. This is typically the case for less frequent extensions or optional, infrequent XML elements or attributes.
Off: indicates the hexadecimal offset of the entry with respect to the first octet in the dictionary. Note that several strings in the collections can share space in the dictionary if they exhibit suitable common substrings.
Len: the length of the string in octets in hexadecimal.
References: contains one or more references to the specification and the section within the specification where the string is defined. Note that the strings stored in the dictionary are case sensitive. (Again, the strings do not include the quotes ("); they are shown here only to improve readability.)
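As an illustration of how one row of the table relates to the octets of the dictionary, the following Python sketch models an entry and checks it against a string subset. The class name DictionaryEntry, the helpers parse_row and entry_matches, and the sample strings, priorities, and offsets are all hypothetical; they are not taken from Appendix A.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    """One row of the input-string table (hypothetical model)."""
    string: str      # the String column, without the surrounding quotes
    priority: int    # the Pr column, 1 (most likely) to 5 (least likely)
    offset: int      # the Off column, counted from the first octet
    length: int      # the Len column, in octets


def parse_row(string: str, pr: str, off_hex: str, len_hex: str) -> DictionaryEntry:
    """Build an entry from textual columns; Off and Len are hexadecimal."""
    return DictionaryEntry(string, int(pr), int(off_hex, 16), int(len_hex, 16))


def entry_matches(string_subset: bytes, entry: DictionaryEntry) -> bool:
    """Check that the octets at [offset, offset + length) spell the string."""
    encoded = entry.string.encode("utf-8")   # lengths are counted in octets
    if not 1 <= entry.priority <= 5 or len(encoded) != entry.length:
        return False
    found = string_subset[entry.offset:entry.offset + entry.length]
    return found == encoded                  # the comparison is case sensitive


# Toy string subset in which "relation" shares space with "relationship".
toy_subset = b"relationshipnote"
rows = [
    parse_row("relationship", "3", "0000", "000C"),
    parse_row("relation", "4", "0000", "0008"),
    parse_row("note", "1", "000C", "0004"),
]
all_ok = all(entry_matches(toy_subset, row) for row in rows)
```

In this toy layout, two rows point at the same offset with different lengths, which is one way that several strings can share space in the dictionary.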
There are a few design considerations that require a bit more explanation:
o Due to the fact that most compression algorithms have a break-even point around three or four characters, we have selected those static strings of characters that consist of four or more characters.
o When a string appears as an XML element in an XML document, it is typically surrounded by the '<' and '>' signs, such as in '<foo>'. It would have been natural to include the '<' and '>' signs of the element in each input string. However, we made the decision to omit the '<' and '>' signs because then we can easily reuse the same string for start-tags (e.g., <foo>), start-tags that contain attributes (e.g., <foo attr="myattr">), empty-element tags (e.g., <foo/>), and end-tags (e.g., </foo>).
o Whenever there is an enumerated string, the string does not contain quotes, following the same pattern as any other input string.
o In a few cases, we have decided to split a string that appears a few times into a few substrings. This is the case of Uniform Resource Names (URNs) in the IETF address space, because this allows the dictionary to reuse the same substring in various URN strings.
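The effect of these choices can be sketched with a DEFLATE preset dictionary. The Python example below uses the standard zlib module as a stand-in for a DEFLATE-based SigComp compressor; it is not how SigComp itself carries data. The strings and priorities in TOY_STRINGS are chosen for illustration only and are not copied from Appendix A, and the helper names are hypothetical. Strings are stored without '<', '>', or quotes, and the dictionary is ordered so that priority 1 strings land at the end.

```python
import zlib

# Toy (string, priority) pairs chosen for illustration only.
TOY_STRINGS = [
    ("urn:ietf:params:xml:ns:", 1),
    ("presence", 1),
    ("entity", 1),
    ("tuple", 1),
    ("status", 1),
    ("basic", 1),
    ("contact", 2),
    ("timestamp", 2),
    ("pidf", 1),
    ("open", 2),
    ("closed", 2),
    ("relationship", 4),
]


def build_toy_dictionary(strings):
    """Concatenate strings so that priority 1 ends up last (at the bottom)."""
    ordered = sorted(strings, key=lambda item: item[1], reverse=True)
    return "".join(text for text, _ in ordered).encode("utf-8")


def deflate(data: bytes, zdict: bytes = b"") -> bytes:
    """Compress with DEFLATE, optionally primed with a preset dictionary."""
    if zdict:
        comp = zlib.compressobj(level=9, zdict=zdict)
    else:
        comp = zlib.compressobj(level=9)
    return comp.compress(data) + comp.flush()


def inflate(data: bytes, zdict: bytes = b"") -> bytes:
    """Decompress; the same preset dictionary must be supplied, if any."""
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(data) + decomp.flush()


DOCUMENT = (
    b'<presence xmlns="urn:ietf:params:xml:ns:pidf" '
    b'entity="pres:someone@example.com">'
    b'<tuple id="a1"><status><basic>open</basic></status>'
    b'<contact>sip:someone@example.com</contact></tuple></presence>'
)

toy_dict = build_toy_dictionary(TOY_STRINGS)
plain = deflate(DOCUMENT)              # no shared state on the first message
primed = deflate(DOCUMENT, toy_dict)   # first message can reference the dictionary
```

Because the element names are stored without angle brackets, the single string "tuple" can be matched in both the start-tag with an attribute and the end-tag of the sample document.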
4. Binary Representation of the Presence-Specific Static Dictionary
This section contains the binary form of the presence-specific static dictionary that is loaded into SigComp as a state.
The binary SigComp dictionary is composed of two parts, the concatenation of which serves as the state value of the state item: A string subset, which contains all strings in the contributing collections as a substring (roughly ordered such that strings with low priority numbers occur at the end), and a table subset, which contains pairs of length and offset values for all the strings in the contributing collections. In each of these pairs, the length is stored as a one-byte value, and the offset is stored as a two-byte value that has had 1024 added to the offset (this allows direct referencing from the stored value if the dictionary state has been loaded at address 1024).
The intention is that all compression algorithms will be able to use the (or part of the) string subset, and some compression methods, notably those that are related to the LZ78 family, will also use the table in order to form an initial set of tokens for that compression method. The text below therefore gives examples for referencing both the table subset and the string subset of the dictionary state item.
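The layout of the table subset described above can be sketched in Python as follows. This is a minimal illustration, assuming that the two-byte offset is stored most significant byte first, as the UDVM does for 2-byte values; the function names and the toy state are hypothetical, and the real dictionary octets are not reproduced here.

```python
TABLE_ENTRY_SIZE = 3        # one length octet plus a two-octet offset
LOAD_ADDRESS = 1024         # value that was added to every stored offset


def encode_entry(length: int, offset: int) -> bytes:
    """Encode one table entry (assumption: big-endian two-octet offset)."""
    return bytes([length]) + (offset + LOAD_ADDRESS).to_bytes(2, "big")


def decode_table(state_value: bytes, string_subset_len: int):
    """Yield (length, offset, string) for every entry in the table subset."""
    strings = state_value[:string_subset_len]
    table = state_value[string_subset_len:]
    if len(table) % TABLE_ENTRY_SIZE:
        raise ValueError("table subset is not a whole number of entries")
    for pos in range(0, len(table), TABLE_ENTRY_SIZE):
        length = table[pos]
        stored = int.from_bytes(table[pos + 1:pos + 3], "big")
        offset = stored - LOAD_ADDRESS      # back to an offset into the strings
        if offset < 0 or offset + length > len(strings):
            raise ValueError(f"entry at {pos} points outside the string subset")
        yield length, offset, strings[offset:offset + length]


# Toy state: a 13-octet string subset followed by two table entries.
toy_strings = b"presencetuple"
toy_state = toy_strings + encode_entry(8, 0) + encode_entry(5, 8)
decoded = list(decode_table(toy_state, len(toy_strings)))
```

For the dictionary defined in this document, the string subset occupies 0x0955 octets and the table subset 0x043E octets, so a decoder of this shape would expect 1086 / 3 = 362 table entries.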
As defined in Section 3.3.3 in the Signaling Compression specification [2], a SigComp state is characterized by a certain set of information. For the presence-specific static dictionary, the information in the following table, Table 2, fully characterizes the state item.
Note that the string subset of the dictionary can be accessed using:
STATE-ACCESS (%ps, 6, 0, 0x0955, %sa, 0),
and the table subset can be accessed using:
STATE-ACCESS (%ps, 6, 0x0955, 0x043E, %sa, 0),
where %ps points to Universal Decompressor Virtual Machine (UDVM) memory containing
0xd942297d0bb3
and %sa is the desired destination address in UDVM memory with UDVM byte copying rules applied.
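The operand arithmetic of the two instructions above can be sketched in Python. This is a simplified model, not a UDVM implementation: the state_access function is hypothetical, it omits the final state_instruction operand, and it rejects destination ranges that would need the UDVM byte copying (wrap-around) rules. The state value and the octets that pad the partial identifier are placeholders, because the full state item is not reproduced here.

```python
STRING_SUBSET_LEN = 0x0955   # fourth operand of the first STATE-ACCESS
TABLE_SUBSET_LEN = 0x043E    # fourth operand of the second STATE-ACCESS
PARTIAL_ID = bytes.fromhex("d942297d0bb3")   # the 6 octets that %ps points to


def state_access(states, partial_id, state_begin, state_length,
                 memory, state_address):
    """Copy part of a uniquely identified state item into memory."""
    matches = [value for ident, value in states.items()
               if ident.startswith(partial_id)]
    if len(matches) != 1:
        raise LookupError("partial identifier must match exactly one state")
    value = matches[0]
    if state_begin + state_length > len(value):
        raise ValueError("requested range exceeds the state value")
    if state_address + state_length > len(memory):
        raise ValueError("destination exceeds memory (no wrap-around here)")
    memory[state_address:state_address + state_length] = \
        value[state_begin:state_begin + state_length]


# Placeholder state item: identifier padding and contents are made up.
fake_id = PARTIAL_ID + bytes(14)
fake_value = bytes(i % 251 for i in range(STRING_SUBSET_LEN + TABLE_SUBSET_LEN))
states = {fake_id: fake_value}

memory = bytearray(65536)
# String subset: begins at offset 0 of the state value.
state_access(states, PARTIAL_ID, 0, STRING_SUBSET_LEN, memory, 1024)
# Table subset: begins where the string subset ends.
state_access(states, PARTIAL_ID, STRING_SUBSET_LEN, TABLE_SUBSET_LEN,
             memory, 1024 + STRING_SUBSET_LEN)
```

Loading the string subset at address 1024 matches the convention used for the table subset, whose stored offsets already have 1024 added. The two lengths sum to 0x0D93 octets, which is the size of the concatenated state value.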
If only a subset of the dictionary up to a specific priority is desired (e.g., to save UDVM space), the values for the third and fourth operands in these STATE-ACCESS instructions can be changed to:
5. Security Considerations
This document defines a presence-specific static dictionary for the SigComp framework [2]. Therefore, the security considerations of RFC 3320 [2] apply. This memo does not introduce any known additional security risk.
6. Acknowledgements
The author would like to thank Miraj Mostafa, Pekka Pessi, and Catalin Ionescu for their persistent and convincing arguments demonstrating the benefit of this dictionary. Thanks to Carsten Bormann and Adam Roach for providing assistance with the software that automatically generates the binary dictionary. Adam Roach, Cristian Constantin, Avshalom Houri, and Krisztian Kiss reviewed the document and provided helpful comments.
Appendix A. Input Strings to the Presence-Specific Static Dictionary
References
Normative References
[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[2] Price, R., Bormann, C., Christoffersson, J., Hannu, H., Liu, Z., and J. Rosenberg, "Signaling Compression (SigComp)", RFC 3320, January 2003.
Informative References
[3] Deutsch, P., "DEFLATE Compressed Data Format Specification version 1.3", RFC 1951, May 1996.
[4] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.
[5] Roach, A., "Session Initiation Protocol (SIP)-Specific Event Notification", RFC 3265, June 2002.
[6] Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD 63, RFC 3629, November 2003.
[7] Garcia-Martin, M., Bormann, C., Ott, J., Price, R., and A. Roach, "The Session Initiation Protocol (SIP) and Session Description Protocol (SDP) Static Dictionary for Signaling Compression (SigComp)", RFC 3485, February 2003.
[8] Sugano, H., Fujimoto, S., Klyne, G., Bateman, A., Carr, W., and J. Peterson, "Presence Information Data Format (PIDF)", RFC 3863, August 2004.
[9] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session Description Protocol", RFC 4566, July 2006.
[10] Rosenberg, J., "A Data Model for Presence", RFC 4479, July 2006.
[11] Schulzrinne, H., Gurbani, V., Kyzivat, P., and J. Rosenberg, "RPID: Rich Presence Extensions to the Presence Information Data Format (PIDF)", RFC 4480, July 2006.
[12] Schulzrinne, H., "CIPID: Contact Information for the Presence Information Data Format", RFC 4482, July 2006.
[13] Schulzrinne, H., "Timed Presence Extensions to the Presence Information Data Format (PIDF) to Indicate Status Information for Past and Future Time Intervals", RFC 4481, July 2006.
[14] Urpalainen, J., "An Extensible Markup Language (XML) Patch Operations Framework Utilizing XML Path Language (XPath) Selectors", Work in Progress, March 2006.
[15] Lonnfors, M., Leppanen, E., Khartabil, H., and J. Urpalainen, "Presence Information Data Format (PIDF) Extension for Partial Presence", Work in Progress, November 2006.
[16] Peterson, J., "A Presence-based GEOPRIV Location Object Format", RFC 4119, December 2005.
[17] Rosenberg, J., "An Extensible Markup Language (XML) Based Format for Watcher Information", RFC 3858, August 2004.
[18] Khartabil, H., Leppanen, E., Lonnfors, M., and J. Costa-Requena, "An Extensible Markup Language (XML)-Based Format for Event Notification Filtering", RFC 4661, September 2006.
[19] Roach, A., Campbell, B., and J. Rosenberg, "A Session Initiation Protocol (SIP) Event Notification Extension for Resource Lists", RFC 4662, August 2006.
[20] Lonnfors, M. and K. Kiss, "Session Initiation Protocol (SIP) User Agent Capability Extension to Presence Information Data Format (PIDF)", Work in Progress, July 2006.
[21] Open Mobile Alliance, OMA., "OMA Presence Simple V1.0.1, Presence Information Data Format PIDF Schema Description", November 2006.
[22] Paoli, J., Maler, E., Yergeau, F., Sperberg-McQueen, C., and T. Bray, "Extensible Markup Language (XML) 1.0 (Fourth Edition)", World Wide Web Consortium Recommendation REC-xml-20060816, August 2006, <http://www.w3.org/TR/2006/REC-xml-20060816>.
[23] Fallside, D. and P. Walmsley, "XML Schema Part 0: Primer Second Edition", World Wide Web Consortium Recommendation REC-xmlschema-0-20041028, October 2004, <http://www.w3.org/TR/2004/REC-xmlschema-0-20041028>.
Author's Address
Miguel A. Garcia-Martin
Nokia Siemens Networks
P.O.Box 6
Nokia Siemens Networks, FIN 02022
Finland
EMail: miguel.garcia@nsn.com
Full Copyright Statement
Copyright (C) The IETF Trust (2008).
This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.
This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.