Internet Engineering Task Force (IETF) S. Donovan Request for Comments: 8581 Oracle Updates: 7683 August 2019 Category: Standards Track ISSN: 2070-1721
Diameter Agent Overload and the Peer Overload Report
Abstract
This specification documents an extension to the Diameter Overload Indication Conveyance (DOIC), a base solution for Diameter overload defined in RFC 7683. The extension defines the Peer Overload report type. The initial use case for the peer report is the handling of occurrences of overload of a Diameter Agent.
Status of This Memo
This is an Internet Standards Track document.
This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 7841.
Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc8581.
Copyright Notice
Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Donovan Standards Track [Page 1]
RFC 8581 Diameter Agent Overload and Peer Report August 2019
This specification documents an extension to the Diameter Overload Indication Conveyance (DOIC), a base solution for Diameter overload [RFC7683]. The extension defines the Peer Overload report type. The initial use case for the peer report is the handling of occurrences of overload of a Diameter Agent.
This document defines the behavior of Diameter nodes when Diameter Agents enter an overload condition and send an Overload report requesting a reduction of traffic. It also defines a new Overload report type, the Peer Overload report type, which is used for handling agent overload conditions. The Peer Overload report type is defined in a generic fashion so that it can also be used for other Diameter overload scenarios.
The base Diameter overload specification [RFC7683] addresses the handling of overload when a Diameter endpoint (a Diameter Client or Diameter Server as defined in [RFC6733]) becomes overloaded.
In the base specification, the goal is to handle abatement of the overload occurrence as close to the source of the Diameter traffic as feasible. When possible, this is done at the originator of the traffic, generally referred to as a Diameter Client. A Diameter Agent might also handle the overload mitigation. For instance, a Diameter Agent might handle Diameter overload mitigation when it knows that a Diameter Client does not support the DOIC extension.
This document extends the base Diameter endpoint overload specification to address the case when Diameter Agents become overloaded. Just as is the case with other Diameter nodes, i.e., Diameter Clients and Diameter Servers, surges in Diameter traffic can cause a Diameter Agent to be asked to handle more Diameter traffic than it was configured to handle. For a more detailed discussion of what can cause the overload of Diameter nodes, refer to the Diameter overload requirements [RFC7068].
This document defines a new Overload report type to communicate occurrences of agent overload. This report type works for the Diameter overload loss abatement algorithm defined in [RFC7683] and is expected to work for other overload abatement algorithms defined in extensions to the DOIC solution.
Donovan Standards Track [Page 3]
RFC 8581 Diameter Agent Overload and Peer Report August 2019
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
This section outlines representative use cases for the peer report used to communicate agent overload.
There are two primary classes of use cases currently identified: those involving the overload of agents, and those involving the overload of Diameter endpoints. In both cases, the goal is to use an overload algorithm that controls traffic sent towards peers.
The peer report needs to support the use cases described below.
In the figures in this section, elements labeled "c" are Diameter Clients, elements labeled "a" are Diameter Agents, and elements labeled "s" are Diameter Servers.
This use case is illustrated in Figure 1. In this case, the client sends all traffic through the single agent. If there is a failure in the agent, then the client is unable to send Diameter traffic toward the server.
+-+ +-+ +-+ |c|----|a|----|s| +-+ +-+ +-+
Figure 1
A more likely case for the use of agents is illustrated in Figure 2. In this case, there are multiple servers behind the single agent. The client sends all traffic through the agent, and the agent determines how to distribute the traffic to the servers based on local routing and load distribution policy.
RFC 8581 Diameter Agent Overload and Peer Report August 2019
In both of these cases, the occurrence of overload in the single agent must by handled by the client similarly to as if the client were handling the overload of a directly connected server. When the agent becomes overloaded, it will insert an Overload report in answer messages flowing to the client. This Overload report will contain a requested reduction in the amount of traffic sent to the agent. The client will apply overload abatement behavior as defined in the base Diameter overload specification [RFC7683] or in the extension document that defines the indicated overload abatement algorithm. This will result in the throttling of the abated traffic that would have been sent to the agent, as there is no alternative route. The client sends an appropriate error response to the originator of the request.
Figure 3 and Figure 4 illustrate a second, and more likely, type of deployment scenario involving agents. In both of these cases, the client has Diameter connections to two agents.
Figure 3 illustrates a client that has a primary connection to one of the agents (agent a1) and a secondary connection to the other agent (agent a2). In this scenario, under normal circumstances, the client will use the primary connection for all traffic. The secondary connection is used when there is a failure scenario of some sort.
RFC 8581 Diameter Agent Overload and Peer Report August 2019
The second case, in Figure 4, illustrates the case where the connections to the agents are both actively used. In this case, the client will have local distribution policy to determine the traffic sent through each client.
In the case where one of the agents in the above scenarios become overloaded, the client should reduce the amount of traffic sent to the overloaded agent by the amount requested. This traffic should instead be routed through the non-overloaded agent. For example, assume that the overloaded agent requests a reduction of 10 percent. The client should send 10 percent of the traffic that would have been routed to the overloaded agent through the non-overloaded agent.
When the client has both an active and a standby connection to the two agents, then an alternative strategy for responding to an Overload report from an agent is to change the standby connection to active. This will result in all traffic being routed through the new active connection.
In the case where both agents are reporting overload, the client may need to start decreasing the total traffic sent to the agents. This would be done in a similar fashion as that discussed in Section 4.1.1. The amount of traffic depends on the combined reduction requested by the two agents.
There are also deployment scenarios where there can be multiple Diameter Agents between Diameter Clients and Diameter Servers. An example of this type of deployment is when there are Diameter Agents between administrative domains.
Donovan Standards Track [Page 7]
RFC 8581 Diameter Agent Overload and Peer Report August 2019
Figure 5 illustrates one such network deployment case. Note that while this figure shows a maximum of two agents being involved in a Diameter transaction, it is possible for more than two agents to be in the path of a transaction.
The handling of overload for one or both agents, a11 or a12 in this case, is equivalent to that discussed in Section 4.1.2.
The overload of agents a21 and a22 must be handled by the previous- hop agents. As such, agents a11 and a12 must handle the overload mitigation logic when receiving an Agent Overload report from agents a21 and a22.
The handling of Peer Overload reports is similar to that discussed in Section 4.1.2. If the overload can be addressed using diversion, then this approach should be taken.
If both of the agents have requested a reduction in traffic, then the previous-hop agent must start throttling the appropriate number of transactions. When throttling requests, an agent uses the same error responses as defined in the base DOIC specification [RFC7683].
It is envisioned that abatement algorithms will be defined that will support the option for Diameter endpoints to send peer reports. For instance, it is envisioned that one usage scenario for the rate algorithm [RFC8582] will involve abatement being done on a hop-by-hop basis.
This rate-deployment scenario would involve Diameter endpoints generating peer reports and selecting the rate algorithm for abatement of overload conditions.
Donovan Standards Track [Page 8]
RFC 8581 Diameter Agent Overload and Peer Report August 2019
5. Interaction Between Host/Realm and Peer Overload Reports
It is possible for both an agent and an endpoint in the path of a transaction to be overloaded at the same time. When this occurs, Diameter entities need to handle multiple Overload reports. In this scenario, the reacting node should first handle the throttling of the overloaded Host or Realm. Any messages that survive throttling due to Host or Realm reports should then go through abatement for the Peer Overload report. In this scenario, when doing abatement on the peer report, the reacting node SHOULD take into consideration the number of messages already throttled by the handling of the host/ realm report abatement.
Note: The goal is to avoid traffic oscillations that might result from throttling of messages for both the host/realm Overload reports and the PEER Overload reports. This is especially a concern if both reports indicate the loss abatement algorithm.
When sending a Diameter request, a DOIC node that supports the OC_PEER_REPORT feature (as defined in Section 7.1.1) MUST include in the OC-Supported-Features AVP an OC-Feature-Vector AVP with the OC_PEER_REPORT bit set.
When sending a request, a DOIC node that supports the OC_PEER_REPORT feature MUST include a SourceID AVP in the OC-Supported-Features AVP with its own DiameterIdentity.
When a Diameter Agent relays a request that includes a SourceID AVP in the OC-Supported-Features AVP, if the Diameter Agent supports the OC_PEER_REPORT feature, then it MUST remove the received SourceID AVP and replace it with a SourceID AVP containing its own DiameterIdentity.
When receiving a request, a DOIC node that supports the OC_PEER_REPORT feature MUST update transaction state with an indication of whether or not the peer from which the request was received supports the OC_PEER_REPORT feature.
Donovan Standards Track [Page 9]
RFC 8581 Diameter Agent Overload and Peer Report August 2019
Note: The transaction state is used when the DOIC node is acting as a peer-report reporting node and needs to send OC-OLR AVP reports of type "PEER-REPORT" in answer messages. The Peer Overload reports are only included in answer messages being sent to peers that support the OC_PEER_REPORT feature.
The peer supports the OC_PEER_REPORT feature if the received request contains an OC-Supported-Features AVP with the OC-Feature-Vector with the OC_PEER_REPORT feature bit set and with a SourceID AVP with a value that matches the DiameterIdentity of the peer from which the request was received.
When an agent relays an answer message, a reporting node that supports the OC_PEER_REPORT feature MUST strip any SourceID AVP from the OC-Supported-Features AVP.
When sending an answer message, a reporting node that supports the OC_PEER_REPORT feature MUST determine if the peer to which the answer is to be sent supports the OC_PEER_REPORT feature.
If the peer supports the OC_PEER_REPORT feature, then the reporting node MUST indicate support for the feature in the OC-Supported- Features AVP.
If the peer supports the OC_PEER_REPORT feature, then the reporting node MUST insert the SourceID AVP in the OC-Supported-Features AVP in the answer message.
If the peer supports the OC_PEER_REPORT feature, then the reporting node MUST insert the OC-Peer-Algo AVP in the OC-Supported-Features AVP. The OC-Peer-Algo AVP MUST indicate the overload abatement algorithm that the reporting node wants the reacting nodes to use should the reporting node send a Peer Overload report as a result of becoming overloaded.
This section describes the Overload Control State (OCS) that might be maintained by both the peer-report reporting node and the peer-report reacting node.
This is an extension of the OCS handling defined in [RFC7683].
Donovan Standards Track [Page 10]
RFC 8581 Diameter Agent Overload and Peer Report August 2019
A DOIC node that supports the OC_PEER_REPORT feature SHOULD maintain Reporting-Node OCS, as defined in [RFC7683] and extended here.
If different abatement-specific contents are sent to each peer, then the reporting node MUST maintain a separate reporting-node peer- report OCS entry per peer, to which a Peer Overload report is sent.
Note: The rate-overload abatement algorithm allows for different rates to be sent to each peer.
In addition to OCS maintained as defined in [RFC7683], a reacting node that supports the OC_PEER_REPORT feature maintains the following OCS per supported Diameter application:
A peer-report OCS entry for each peer to which it sends requests
A peer-report OCS entry is identified by both the Application-ID and the peer's DiameterIdentity.
The peer-report OCS entry includes the following information (the actual information stored is an implementation decision):
Sequence number (as received in the OC-OLR AVP)
Time of expiry (derived from the OC-Validity-Duration AVP received in the OC-OLR AVP and time of reception of the message carrying the OC-OLR AVP)
Selected abatement algorithm (as received in the OC-Supported- Features AVP)
Input data that is specific to the abatement algorithm (as received in the OC-OLR AVP, e.g., OC-Reduction-Percentage for the loss abatement algorithm)
6.2.2. Reporting-Node Maintenance of Peer-Report OCS
All rules for managing the reporting-node OCS entries defined in [RFC7683] apply to the peer report.
Donovan Standards Track [Page 11]
RFC 8581 Diameter Agent Overload and Peer Report August 2019
6.2.3. Reacting-Node Maintenance of Peer-Report OCS
When a reacting node receives an OC-OLR AVP with a report type of "PEER-REPORT", it MUST determine if the report was generated by the Diameter peer from which the report was received.
If a reacting node receives an OC-OLR AVP of type "PEER-REPORT" and the SourceID matches the DiameterIdentity of the Diameter peer from which the response message was received, then the report was generated by a Diameter peer.
If a reacting node receives an OC-OLR AVP of type "PEER-REPORT" and the SourceID does not match the DiameterIdentity of the Diameter peer from which the response message was received, then the reacting node MUST ignore the Overload report.
Note: Under normal circumstances, a Diameter node will not add a peer report when sending to a peer that does not support this extension. This requirement is to handle the case where peer reports are erroneously or maliciously inserted into response messages.
If the peer report was received from a Diameter peer, then the reacting node MUST determine if it is for an existing or new overload condition.
The peer report is for an existing overload condition if the reacting node has an OCS that matches the received peer report. For a peer report, this means it matches the Application-ID and the peer's DiameterIdentity in an existing OCS entry.
If the peer report is for an existing overload condition, then it MUST determine if the peer report is a retransmission or an update to the existing OLR.
If the sequence number for the received peer report is greater than the sequence number stored in the matching OCS entry, then the reacting node MUST update the matching OCS entry.
If the sequence number for the received peer report is less than or equal to the sequence number in the matching OCS entry, then the reacting node MUST silently ignore the received peer report. The matching OCS MUST NOT be updated in this case.
If the received peer report is for a new overload condition, then the reacting node MUST generate a new OCS entry for the overload condition.
Donovan Standards Track [Page 12]
RFC 8581 Diameter Agent Overload and Peer Report August 2019
For a peer report, this means it creates an OCS entry with a DiameterIdentity from the SourceID AVP in the received OC-OLR AVP.
If the received peer report contains a validity duration of zero ("0"), then the reacting node MUST update the OCS entry as being expired.
The reacting node does not delete an OCS when receiving an answer message that does not contain an OC-OLR AVP (i.e., the absence of OLR means "no change").
The reacting node sets the abatement algorithm based on the OC-Peer- Algo AVP in the received OC-Supported-Features AVP.
When there is an existing reporting-node peer-report OCS entry, the reporting node MUST include an OC-OLR AVP with a report type of "PEER-REPORT" using the contents of the reporting-node peer-report OCS entry in all answer messages sent by the reporting node to peers that support the OC_PEER_REPORT feature.
Note: The reporting node determines if a peer supports the OC_PEER_REPORT feature based on the indication recorded in the reporting node's transaction state.
The reporting node MUST include its DiameterIdentity in the SourceID AVP in the OC-OLR AVP. This is used by DOIC nodes that support the OC_PEER_REPORT feature to determine if the report was received from a Diameter peer.
The reporting agent must follow all other overload reporting-node behaviors outlined in the DOIC specification.
A reacting node supporting this extension MUST support the receipt of multiple Overload reports in a single message. The message might include a Host Overload report, a Realm Overload report, and/or a Peer Overload report.
When a reacting node sends a request, it MUST determine if that request matches an active OCS.
In all cases, if the reacting node is an agent, then it MUST strip the Peer-Report OC-OLR AVP from the message.
Donovan Standards Track [Page 13]
RFC 8581 Diameter Agent Overload and Peer Report August 2019
If the request matches an active OCS, then the reacting node MUST apply abatement treatment to the request. The abatement treatment applied depends on the abatement algorithm indicated in the OCS.
For Peer Overload Reports, the preferred abatement treatment is diversion. As such, the reacting node SHOULD attempt to divert requests identified as needing abatement to other peers.
If there is not sufficient capacity to divert abated traffic, then the reacting node MUST throttle the necessary requests to fit within the available capacity of the peers able to handle the requests.
If the abatement treatment results in throttling of the request and if the reacting node is an agent, then the agent MUST send an appropriate error response as defined in [RFC7683].
In the case that the OCS entry validity duration expires or has a validity duration of zero ("0"), meaning that if the reporting node has explicitly signaled the end of the overload condition, then abatement associated with the OCS entry MUST be ended in a controlled fashion.
This extension adds a new feature to the OC-Feature-Vector AVP. This feature indication shows support for handling of Peer Overload reports. Peer Overload reports are used by agents to indicate the need for overload abatement handling by the agent's peer.
A supporting node must also include the SourceID AVP in the OC-Supported-Features capability AVP.
This AVP contains the DiameterIdentity of the node that supports the OC_PEER_REPORT feature. This AVP is used to determine if support for the Peer Overload report is in an adjacent node. The value of this AVP should be the same Diameter identity used as part of the Diameter Capabilities Exchange procedure defined in [RFC7683].
This extension also adds the OC-Peer-Algo AVP to the OC-Supported- Features AVP. This AVP is used by a reporting node to indicate the abatement algorithm it will use for Peer Overload reports.
Donovan Standards Track [Page 14]
RFC 8581 Diameter Agent Overload and Peer Report August 2019
The OC-Peer-Algo AVP (AVP code 648) is of type Unsigned64 and contains a 64-bit flags field of announced capabilities for a DOIC node. The value of zero ("0") is reserved.
Feature bits defined for the OC-Feature-Vector AVP and associated with overload abatement algorithms are reused for this AVP.
This extension makes no changes to the OC_Sequence_Number or OC_Validity_Duration AVPs in the OC-OLR AVP. These AVPs can also be used in Peer Overload reports.
The OC_PEER_REPORT feature extends the base Diameter overload specification by defining a new Overload report type of "PEER- REPORT". See Section 7.6 of [RFC7683] for a description of the OC-Report-Type AVP.
The peer report MUST also include the Diameter identity of the agent that generated the report. This is necessary to handle the case where there is a non-supporting agent between the reporting node and the reacting node. Without the indication of the agent that generated the peer report, the reacting node could erroneously assume that the report applied to the non-supporting node. This could, in turn, result in unnecessary traffic being either diverted or throttled.
The SourceID AVP is used in the OC-OLR AVP to carry this DiameterIdentity.
Donovan Standards Track [Page 15]
RFC 8581 Diameter Agent Overload and Peer Report August 2019
The following new report type is defined for the OC-Report-Type AVP.
PEER_REPORT 2: The overload treatment should apply to all requests bound for the peer identified in the Overload report. If the peer identified in the peer report is not a peer to the reacting endpoint, then the peer report should be stripped and not acted upon.
The SourceID AVP (AVP code 649) is of type DiameterIdentity and is inserted by a Diameter node to indicate the source of the AVP in which it is a part.
In the case of peer reports, the SourceID AVP indicates the node that supports this feature (in the OC-Supported-Features AVP) or the node that generates an overload report with a report type of "PEER-REPORT" (in the OC-OLR AVP).
It contains the DiameterIdentity of the inserting node. This is used by other Diameter nodes to determine the node that inserted the enclosing AVP that contains the SourceID AVP.
Note that the values used for the OC-Peer-Algo AVP are a subset of the "OC-Feature-Vector AVP Values (code 622)" registry. Only the values in that registry that apply to overload abatement algorithms apply to the OC-Peer-Algo AVP.
A new OC-Feature-Vector AVP value is defined in Section 7.1.1.
A new OC-Report-Type AVP value is defined in Section 7.2.1.
Agent overload is an extension to the base Diameter Overload mechanism. As such, all of the security considerations outlined in [RFC7683] apply to the agent overload scenarios.
It is possible that the malicious insertion of an peer report could have a bigger impact on a Diameter network as agents can be concentration points in a Diameter network. Where an endpoint report would impact the traffic sent to a single Diameter Server, for example, a peer report could throttle all traffic to the Diameter network.
This impact is amplified in a Diameter agent that sits at the edge of a Diameter network that serves as the entry point from all other Diameter networks.
The impacts of this attack, as well as the mitigation strategies, are the same as those outlined in [RFC7683].
Donovan Standards Track [Page 17]
RFC 8581 Diameter Agent Overload and Peer Report August 2019
The author would like to thank Adam Roach and Eric McMurry for the work done in defining a comprehensive Diameter overload solution in draft-roach-dime-overload-ctrl-03.txt.
The author would also like to thank Ben Campbell for his insights and review of early versions of this document.
Donovan Standards Track [Page 18]
RFC 8581 Diameter Agent Overload and Peer Report August 2019
Author's Address
Steve Donovan Oracle 7460 Warren Parkway, Suite 300 Frisco, Texas 75034 United States of America