Internet Engineering Task Force (IETF) C. Davids Request for Comments: 7501 Illinois Institute of Technology Category: Informational V. Gurbani ISSN: 2070-1721 Bell Laboratories, Alcatel-Lucent S. Poretsky Allot Communications April 2015
Terminology for Benchmarking Session Initiation Protocol (SIP) Devices: Basic Session Setup and Registration
This document provides a terminology for benchmarking the Session Initiation Protocol (SIP) performance of devices. Methodology related to benchmarking SIP devices is described in the companion methodology document (RFC 7502). Using these two documents, benchmarks can be obtained and compared for different types of devices such as SIP Proxy Servers, Registrars, and Session Border Controllers. The term "performance" in this context means the capacity of the Device Under Test (DUT) to process SIP messages. Media streams are used only to study how they impact the signaling behavior. The intent of the two documents is to provide a normalized set of tests that will enable an objective comparison of the capacity of SIP devices. Test setup parameters and a methodology are necessary because SIP allows a wide range of configurations and operational conditions that can influence performance benchmark measurements. A standard terminology and methodology will ensure that benchmarks have consistent definitions and were obtained following the same procedures.
Status of This Memo
This document is not an Internet Standards Track specification; it is published for informational purposes.
This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are a candidate for any level of Internet Standard; see Section 2 of RFC 5741.
Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Service Providers and IT organizations deliver Voice Over IP (VoIP) and multimedia network services based on the IETF Session Initiation Protocol (SIP) [RFC3261]. SIP is a signaling protocol originally intended to be used to dynamically establish, disconnect, and modify streams of media between end users. As it has evolved, it has been adopted for use in a growing number of services and applications. Many of these result in the creation of a media session, but some do not. Examples of this latter group include text messaging and subscription services. The set of benchmarking terms provided in this document is intended for use with any SIP-enabled device
Davids, et al. Informational [Page 3]
RFC 7501 SIP Benchmarking Terminology April 2015
performing SIP functions in the interior of the network, whether or not these result in the creation of media sessions. The performance of end-user devices is outside the scope of this document.
A number of networking devices have been developed to support SIP- based VoIP services. These include SIP servers, Session Border Controllers (SBCs), and Back-to-back User Agents (B2BUAs). These devices contain a mix of voice and IP functions whose performance may be reported using metrics defined by the equipment manufacturer or vendor. The Service Provider or IT organization seeking to compare the performance of such devices will not be able to do so using these vendor-specific metrics, whose conditions of test and algorithms for collection are often unspecified.
SIP functional elements and the devices that include them can be configured many different ways and can be organized into various topologies. These configuration and topological choices impact the value of any chosen signaling benchmark. Unless these conditions of test are defined, a true comparison of performance metrics across multiple vendor implementations will not be possible.
Some SIP-enabled devices terminate or relay media as well as signaling. The processing of media by the device impacts the signaling performance. As a result, the conditions of test must include information as to whether or not the Device Under Test processes media. If the device processes media during the test, a description of the media must be provided. This document and its companion methodology document [RFC7502] provide a set of black-box benchmarks for describing and comparing the performance of devices that incorporate the SIP User Agent Client and Server functions and that operate in the network's core.
The definition of SIP performance benchmarks necessarily includes definitions of Test Setup Parameters and a test methodology. These enable the Tester to perform benchmarking tests on different devices and to achieve comparable results. This document provides a common set of definitions for Test Components, Test Setup Parameters, and Benchmarks. All the benchmarks defined are black-box measurements of the SIP signaling plane. The Test Setup Parameters and Benchmarks defined in this document are intended for use with the companion methodology document.
The scope of this document is summarized as follows:
o This terminology document describes SIP signaling performance benchmarks for black-box measurements of SIP networking devices. Stress conditions and debugging scenarios are not addressed in this document.
o The DUT must be network equipment that is RFC 3261 capable. This may be a Registrar, Redirect Server, or Stateful Proxy. This document does not require the intermediary to assume the role of a stateless proxy. A DUT may also act as a B2BUA or take the role of an SBC.
o The Tester acts as multiple Emulated Agents (EAs) that initiate (or respond to) SIP messages as session endpoints and source (or receive) associated media for established connections.
o Regarding SIP signaling in presence of media:
* The media performance is not benchmarked.
* Some tests require media, but the use of media is limited to observing the performance of SIP signaling. Tests that require media will annotate the media characteristics as a condition of test.
* The type of DUT dictates whether the associated media streams traverse the DUT. Both scenarios are within the scope of this document.
* SIP is frequently used to create media streams; the signaling plane and media plane are treated as orthogonal to each other in this document. While many devices support the creation of media streams, benchmarks that measure the performance of these streams are outside the scope of this document and its companion methodology document [RFC7502]. Tests may be performed with or without the creation of media streams. The presence or absence of media streams MUST be noted as a condition of the test, as the performance of SIP devices may vary accordingly. Even if the media is used during benchmarking, only the SIP performance will be benchmarked, not the media performance or quality.
o Both INVITE and non-INVITE scenarios (registrations) are addressed in this document. However, benchmarking SIP presence or subscribe-notify extensions is not a part of this document.
Davids, et al. Informational [Page 5]
RFC 7501 SIP Benchmarking Terminology April 2015
o Different transport -- such as UDP, TCP, SCTP, or TLS -- may be used. The specific transport mechanism MUST be noted as a condition of the test, as the performance of SIP devices may vary accordingly.
o REGISTER and INVITE requests may be challenged or remain unchallenged for authentication purposes. Whether or not the REGISTER and INVITE requests are challenged is a condition of test that will be recorded along with other such parameters that may impact the SIP performance of the device or system under test.
o Re-INVITE requests are not considered within the scope of this document since the benchmarks for INVITEs are based on the dialog created by the INVITE and not on the transactions that take place within that dialog.
o Only session establishment is considered for the performance benchmarks. Session disconnect is not considered within the scope of this document. This is because our goal is to determine the maximum capacity of the device or system under test, that is, the number of simultaneous SIP sessions that the device or system can support. It is true that there are BYE requests being created during the test process. These transactions do contribute to the load on the device or system under test and thus are accounted for in the metric we derive. We do not seek a separate metric for the number of BYE transactions a device or system can support.
o Scenarios that are specific to the IP Multimedia Subsystem (IMS) are not considered, but test cases can be applied with 3GPP- specific SIP signaling and the Proxy-Call Session Control Function (P-CSCF) as a DUT.
o The benchmarks described in this document are intended for a laboratory environment and are not intended to be used on a production network. Some of the benchmarks send enough traffic that a denial-of-service attack is possible if used in production networks.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14, RFC2119 [RFC2119]. RFC 2119 defines the use of these key words to help make the intent of Standards Track documents as clear as possible. While this document uses these keywords, this document is not a Standards Track document.
Many commonly used SIP terms in this document are defined in RFC 3261 [RFC3261]. For convenience, the most important of these are reproduced below. Use of these terms in this document is consistent with their corresponding definition in the base SIP specification [RFC3261] as amended by [RFC4320], [RFC5393], and [RFC6026].
o Call Stateful: A proxy is call stateful if it retains state for a dialog from the initiating INVITE to the terminating BYE request. A call stateful proxy is always transaction stateful, but the converse is not necessarily true.
o Stateful Proxy: A logical entity, as defined by [RFC3261], that maintains the client and server transaction state machines during the processing of a request. (Also known as a transaction stateful proxy.) The behavior of a stateful proxy is further defined in Section 16 of RFC 3261 [RFC3261] . A transaction stateful proxy is not the same as a call stateful proxy.
o Back-to-Back User Agent: A back-to-back user agent (B2BUA) is a logical entity that receives a request and processes it as a user agent server (UAS). In order to determine how the request should be answered, it acts as a user agent client (UAC) and generates requests. Unlike a proxy server, it maintains dialog state and must participate in all requests sent on the dialogs it has established. Since it is a concatenation of a UAC and a UAS, no explicit definitions are needed for its behavior.
Definition: The plane in which SIP messages [RFC3261] are exchanged between SIP agents [RFC3261].
Discussion: SIP messages are used to establish sessions in several ways: directly between two User Agents [RFC3261], through a Proxy Server [RFC3261], or through a series of Proxy Servers. The Session Description Protocol (SDP) is included in the Signaling Plane.
Definition: The data plane in which one or more media streams and their associated media control protocols (e.g., RTCP [RFC3550]) are exchanged between User Agents after a media connection has been created by the exchange of signaling messages in the Signaling Plane.
Discussion: Media may also be known as the "bearer channel". The Media Plane MUST include the media control protocol, if one is used, and the media stream(s). Examples of media are audio and video. The media streams are described in the SDP of the Signaling Plane.
Definition: Overload is defined as the state where a SIP server does not have sufficient resources to process all incoming SIP messages [RFC6357].
Discussion: The distinction between an overload condition and other failure scenarios is outside the scope of black-box testing and of this document. Under overload conditions, all or a percentage of Session Attempts will fail due to lack of resources. In black-box testing, the cause of the failure is not explored. The fact that a failure occurred for whatever reason will trigger the tester to reduce the offered load, as described in the companion methodology document [RFC7502]. SIP server resources may include CPU processing capacity, network bandwidth, input/output queues, or disk resources. Any combination of resources may be fully utilized when a SIP server (the DUT) is in the overload condition. For proxy-only (or intermediary) devices, it is expected that the proxy will be driven into overload based on the delivery rate of signaling requests.
Definition: A SIP INVITE or REGISTER request sent by the EA that has not received a final response.
Discussion: The attempted session may be either an invitation to an audio/ video communication or a registration attempt. When counting the number of session attempts, we include all requests that are rejected for lack of authentication information. The EA needs to record the total number of session attempts including those attempts that are routinely rejected by a proxy that requires the UA to authenticate itself. The EA is provisioned to deliver a specific number of session attempts per second. But the EA must also count the actual number of session attempts per given time interval.
Definition: A session attempt that does not result in an Established Session.
Discussion: The session attempt failure may be indicated by the following observations at the EA:
1. Receipt of a SIP 3xx-, 4xx-, 5xx-, or 6xx-class response to a Session Attempt. 2. The lack of any received SIP response to a Session Attempt within the Establishment Threshold Time (cf. Section 3.3.2).
Definition: The protocol used for transport of the Signaling Plane messages.
Discussion: Performance benchmarks may vary for the same SIP networking device depending upon whether TCP, UDP, TLS, SCTP, websockets [RFC7118], or any future transport-layer protocol is used. For this reason, it is necessary to measure the SIP Performance Benchmarks using these various transport protocols. Performance Benchmarks MUST report the SIP Transport Protocol used to obtain the benchmark results.
Measurement Units: While these are not units of measure, they are attributes that are one of many factors that will contribute to the value of the measurements to be taken. TCP, UDP, SCTP, TLS over TCP, TLS over UDP, TLS over SCTP, and websockets are among the possible values to be recorded as part of the test.
Definition: Configuration on the EA for a fixed number of frames or samples to be sent in each RTP packet of the media stream when the test involves Associated Media.
Discussion: This document describes a method to measure SIP performance. If the DUT is processing media as well as SIP messages the media processing will potentially slow down the SIP processing and lower the SIP performance metric. The tests with associated media are designed for audio codecs, and the assumption was made that larger media packets would require more processor time. This document does not define parameters applicable to video codecs.
For a single benchmark test, media sessions use a defined number of samples or frames per RTP packet. If two SBCs, for example, used the same codec but one puts more frames into the RTP packet, this might cause variation in the performance benchmark results.
Measurement Units: An integer number of frames or samples, depending on whether a hybrid- or sample-based codec is used, respectively.
Definition: The maximum value of the Session Attempt Rate that the DUT can handle for an extended, predefined period with zero failures.
Discussion: This benchmark is obtained with zero failure. The Session Attempt Rate provisioned on the EA is raised and lowered as described in the algorithm in the accompanying methodology document [RFC7502], until a traffic load over the period of time necessary to attempt N sessions completes without failure, where N is a parameter specified in the algorithm and recorded in the Test Setup Report.
Definition: The maximum value of the Registration Attempt Rate that the DUT can handle for an extended, predefined period with zero failures.
Discussion: This benchmark is obtained with zero failures. The registration rate provisioned on the Emulated Agent is raised and lowered as described in the algorithm in the companion methodology document [RFC7502], until a traffic load consisting of registration attempts at the given attempt rate over the period of time necessary to attempt N registrations completes without failure, where N is a parameter specified in the algorithm and recorded in the Test Setup Report.
This benchmark is described separately from the Session Establishment Rate (Section 3.4.1), although it could be considered a special case of that benchmark, since a REGISTER request is a request for a session that is not initiated by an INVITE request. It is defined separately because it is a very important benchmark for most SIP installations. An example demonstrating its use is an avalanche restart, where hundreds of thousands of endpoints register simultaneously following a power outage. In such a case, an authoritative measurement of the capacity of the device to register endpoints is useful to the network designer. Additionally, in certain controlled networks, there appears to be a difference between the registration rate of new endpoints and the registering rate of existing endpoints (register refreshes). This benchmark can capture these differences as well.
Documents of this type do not directly affect the security of the Internet or corporate networks as long as benchmarking is not performed on devices or systems connected to production networks. Security threats and how to counter these in SIP and the media layer are discussed in RFC 3261 [RFC3261], RFC 3550 [RFC3550], and RFC 3711 [RFC3711]. This document attempts to formalize a set of common terminology for benchmarking SIP networks. Packets with unintended and/or unauthorized DSCP or IP precedence values may present security issues. Determining the security consequences of such packets is out of scope for this document.
The authors would like to thank Keith Drage, Cullen Jennings, Daryl Malas, Al Morton, and Henning Schulzrinne for invaluable contributions to this document. Dale Worley provided an extensive review that lead to improvements in the documents. We are grateful to Barry Constantine, William Cerveny, and Robert Sparks for providing valuable comments during the documents' last calls and expert reviews. Al Morton and Sarah Banks have been exemplary working group chairs; we thank them for tracking this work to completion.
Davids, et al. Informational [Page 19]
RFC 7501 SIP Benchmarking Terminology April 2015
Carol Davids Illinois Institute of Technology 201 East Loop Road Wheaton, IL 60187 United States
Phone: +1 630 682 6024 EMail: email@example.com
Vijay K. Gurbani Bell Laboratories, Alcatel-Lucent 1960 Lucent Lane Rm 9C-533 Naperville, IL 60566 United States
Phone: +1 630 224 0216 EMail: firstname.lastname@example.org
Scott Poretsky Allot Communications 300 TradeCenter, Suite 4680 Woburn, MA 08101 United States