draft-ietf-tsvwg-aqm-dualq-coupled-25.original | draft-ietf-tsvwg-aqm-dualq-coupled-26v2.txt | |||
---|---|---|---|---|
Transport Area working group (tsvwg) K. De Schepper | Transport Area working group (tsvwg) K. De Schepper | |||
Internet-Draft Nokia Bell Labs | Internet-Draft Nokia Bell Labs | |||
Intended status: Experimental B. Briscoe, Ed. | Intended status: Experimental B. Briscoe, Ed. | |||
Expires: 2 March 2023 Independent | Expires: April 24, 2023 Independent | |||
G. White | G. White | |||
CableLabs | CableLabs | |||
29 August 2022 | October 21, 2022 | |||
DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput | DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput | |||
(L4S) | (L4S) | |||
draft-ietf-tsvwg-aqm-dualq-coupled-25 | draft-ietf-tsvwg-aqm-dualq-coupled-26 | |||
Abstract | Abstract | |||
This specification defines a framework for coupling the Active Queue | This specification defines a framework for coupling the Active Queue | |||
Management (AQM) algorithms in two queues intended for flows with | Management (AQM) algorithms in two queues intended for flows with | |||
different responses to congestion. This provides a way for the | different responses to congestion. This provides a way for the | |||
Internet to transition from the scaling problems of standard TCP | Internet to transition from the scaling problems of standard TCP | |||
Reno-friendly ('Classic') congestion controls to the family of | Reno-friendly ('Classic') congestion controls to the family of | |||
'Scalable' congestion controls. These are designed for consistently | 'Scalable' congestion controls. These are designed for consistently | |||
very Low queuing Latency, very Low congestion Loss and Scaling of | very Low queuing Latency, very Low congestion Loss and Scaling of | |||
skipping to change at page 1, line 49 ¶ | skipping to change at page 1, line 49 ¶ | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on 2 March 2023. | This Internet-Draft will expire on April 24, 2023. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2022 IETF Trust and the persons identified as the | Copyright (c) 2022 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
described in Section 4.e of the Trust Legal Provisions and are | include Simplified BSD License text as described in Section 4.e of | |||
provided without warranty as described in the Revised BSD License. | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | ||||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
1.1. Outline of the Problem . . . . . . . . . . . . . . . . . 3 | 1.1. Outline of the Problem . . . . . . . . . . . . . . . . . 3 | |||
1.2. Context, Scope & Applicability . . . . . . . . . . . . . 6 | 1.2. Context, Scope & Applicability . . . . . . . . . . . . . 6 | |||
1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 7 | 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
1.4. Features . . . . . . . . . . . . . . . . . . . . . . . . 9 | 1.4. Features . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
2. DualQ Coupled AQM . . . . . . . . . . . . . . . . . . . . . . 11 | 2. DualQ Coupled AQM . . . . . . . . . . . . . . . . . . . . . . 11 | |||
2.1. Coupled AQM . . . . . . . . . . . . . . . . . . . . . . . 11 | 2.1. Coupled AQM . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
2.2. Dual Queue . . . . . . . . . . . . . . . . . . . . . . . 12 | 2.2. Dual Queue . . . . . . . . . . . . . . . . . . . . . . . 12 | |||
2.3. Traffic Classification . . . . . . . . . . . . . . . . . 12 | 2.3. Traffic Classification . . . . . . . . . . . . . . . . . 12 | |||
2.4. Overall DualQ Coupled AQM Structure . . . . . . . . . . . 13 | 2.4. Overall DualQ Coupled AQM Structure . . . . . . . . . . . 13 | |||
2.5. Normative Requirements for a DualQ Coupled AQM . . . . . 17 | 2.5. Normative Requirements for a DualQ Coupled AQM . . . . . 17 | |||
2.5.1. Functional Requirements . . . . . . . . . . . . . . . 17 | 2.5.1. Functional Requirements . . . . . . . . . . . . . . . 17 | |||
2.5.1.1. Requirements in Unexpected Cases . . . . . . . . 18 | 2.5.1.1. Requirements in Unexpected Cases . . . . . . . . 18 | |||
2.5.2. Management Requirements . . . . . . . . . . . . . . . 19 | 2.5.2. Management Requirements . . . . . . . . . . . . . . . 19 | |||
2.5.2.1. Configuration . . . . . . . . . . . . . . . . . . 19 | 2.5.2.1. Configuration . . . . . . . . . . . . . . . . . . 19 | |||
2.5.2.2. Monitoring . . . . . . . . . . . . . . . . . . . 21 | 2.5.2.2. Monitoring . . . . . . . . . . . . . . . . . . . 21 | |||
2.5.2.3. Anomaly Detection . . . . . . . . . . . . . . . . 22 | 2.5.2.3. Anomaly Detection . . . . . . . . . . . . . . . . 21 | |||
2.5.2.4. Deployment, Coexistence and Scaling . . . . . . . 22 | 2.5.2.4. Deployment, Coexistence and Scaling . . . . . . . 22 | |||
3. IANA Considerations (to be removed by RFC Editor) . . . . . . 22 | 3. IANA Considerations (to be removed by RFC Editor) . . . . . . 22 | |||
4. Security Considerations . . . . . . . . . . . . . . . . . . . 22 | 4. Security Considerations . . . . . . . . . . . . . . . . . . . 22 | |||
4.1. Low Delay without Requiring Per-Flow Processing . . . . . 22 | 4.1. Low Delay without Requiring Per-Flow Processing . . . . . 22 | |||
4.2. Handling Unresponsive Flows and Overload . . . . . . . . 23 | 4.2. Handling Unresponsive Flows and Overload . . . . . . . . 23 | |||
4.2.1. Unresponsive Traffic without Overload . . . . . . . . 24 | 4.2.1. Unresponsive Traffic without Overload . . . . . . . . 24 | |||
4.2.2. Avoiding Short-Term Classic Starvation: Sacrifice L4S | 4.2.2. Avoiding Short-Term Classic Starvation: Sacrifice L4S | |||
Throughput or Delay? . . . . . . . . . . . . . . . . 25 | Throughput or Delay? . . . . . . . . . . . . . . . . 24 | |||
4.2.3. L4S ECN Saturation: Introduce Drop or Delay? . . . . 26 | 4.2.3. L4S ECN Saturation: Introduce Drop or Delay? . . . . 26 | |||
4.2.3.1. Protecting against Overload by Unresponsive | 4.2.3.1. Protecting against Overload by Unresponsive ECN- | |||
ECN-Capable Traffic . . . . . . . . . . . . . . . . 28 | Capable Traffic . . . . . . . . . . . . . . . . . 27 | |||
5. References . . . . . . . . . . . . . . . . . . . . . . . . . 28 | 5. References . . . . . . . . . . . . . . . . . . . . . . . . . 28 | |||
5.1. Normative References . . . . . . . . . . . . . . . . . . 28 | 5.1. Normative References . . . . . . . . . . . . . . . . . . 28 | |||
5.2. Informative References . . . . . . . . . . . . . . . . . 29 | 5.2. Informative References . . . . . . . . . . . . . . . . . 28 | |||
Appendix A. Example DualQ Coupled PI2 Algorithm . . . . . . . . 35 | ||||
Appendix A. Example DualQ Coupled PI2 Algorithm . . . . . . . . 34 | ||||
A.1. Pass #1: Core Concepts . . . . . . . . . . . . . . . . . 35 | A.1. Pass #1: Core Concepts . . . . . . . . . . . . . . . . . 35 | |||
A.2. Pass #2: Edge-Case Details . . . . . . . . . . . . . . . 46 | A.2. Pass #2: Edge-Case Details . . . . . . . . . . . . . . . 45 | |||
Appendix B. Example DualQ Coupled Curvy RED Algorithm . . . . . 51 | Appendix B. Example DualQ Coupled Curvy RED Algorithm . . . . . 50 | |||
B.1. Curvy RED in Pseudocode . . . . . . . . . . . . . . . . . 51 | B.1. Curvy RED in Pseudocode . . . . . . . . . . . . . . . . . 50 | |||
B.2. Efficient Implementation of Curvy RED . . . . . . . . . . 57 | B.2. Efficient Implementation of Curvy RED . . . . . . . . . . 56 | |||
Appendix C. Choice of Coupling Factor, k . . . . . . . . . . . . 59 | Appendix C. Choice of Coupling Factor, k . . . . . . . . . . . . 58 | |||
C.1. RTT-Dependence . . . . . . . . . . . . . . . . . . . . . 59 | C.1. RTT-Dependence . . . . . . . . . . . . . . . . . . . . . 58 | |||
C.2. Guidance on Controlling Throughput Equivalence . . . . . 60 | C.2. Guidance on Controlling Throughput Equivalence . . . . . 59 | |||
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 64 | Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 63 | |||
Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . 64 | Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . 63 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 65 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 64 | |||
1. Introduction | 1. Introduction | |||
This document specifies a framework for DualQ Coupled AQMs, which can | This document specifies a framework for DualQ Coupled AQMs, which can | |||
serve as the network part of the L4S | serve as the network part of the L4S | |||
architecture [I-D.ietf-tsvwg-l4s-arch]. A Coupled DualQ AQM consists | architecture [I-D.ietf-tsvwg-l4s-arch]. A Coupled DualQ AQM consists | |||
of two queues; L4S and Classic. The L4S queue is intended for | of two queues; L4S and Classic. The L4S queue is intended for | |||
Scalable congestion controls that can maintain very low queuing | Scalable congestion controls that can maintain very low queuing | |||
latency (sub-millisecond on average) and high throughput at the same | latency (sub-millisecond on average) and high throughput at the same | |||
time. The Coupled DualQ acts like a semi-permeable membrane: the L4S | time. The Coupled DualQ acts like a semi-permeable membrane: the L4S | |||
skipping to change at page 4, line 15 ¶ | skipping to change at page 4, line 16 ¶ | |||
now it has not been possible to allow any number of low latency, high | now it has not been possible to allow any number of low latency, high | |||
throughput applications to seek to fully utilize available capacity, | throughput applications to seek to fully utilize available capacity, | |||
because the capacity-seeking process itself causes too much queuing | because the capacity-seeking process itself causes too much queuing | |||
delay. | delay. | |||
To reduce this queuing delay caused by the capacity seeking process, | To reduce this queuing delay caused by the capacity seeking process, | |||
changes either to the network alone or to end-systems alone are in | changes either to the network alone or to end-systems alone are in | |||
progress. L4S involves a recognition that both approaches are | progress. L4S involves a recognition that both approaches are | |||
yielding diminishing returns: | yielding diminishing returns: | |||
* Recent state-of-the-art active queue management (AQM) in the | o Recent state-of-the-art active queue management (AQM) in the | |||
network, e.g. FQ-CoDel [RFC8290], PIE [RFC8033], Adaptive | network, e.g. FQ-CoDel [RFC8290], PIE [RFC8033], Adaptive | |||
RED [ARED01] ) has reduced queuing delay for all traffic, not just | RED [ARED01] ) has reduced queuing delay for all traffic, not just | |||
a select few applications. However, no matter how good the AQM, | a select few applications. However, no matter how good the AQM, | |||
the capacity-seeking (sawtoothing) rate of TCP-like congestion | the capacity-seeking (sawtoothing) rate of TCP-like congestion | |||
controls represents a lower limit that will either cause queuing | controls represents a lower limit that will either cause queuing | |||
delay to vary or cause the link to be under-utilized. These AQMs | delay to vary or cause the link to be under-utilized. These AQMs | |||
are tuned to allow a typical capacity-seeking Reno-friendly flow | are tuned to allow a typical capacity-seeking Reno-friendly flow | |||
to induce an average queue that roughly doubles the base RTT, | to induce an average queue that roughly doubles the base RTT, | |||
adding 5-15 ms of queuing on average (cf. 500 microseconds with | adding 5-15 ms of queuing on average (cf. 500 microseconds with | |||
L4S for the same mix of long-running and web traffic). However, | L4S for the same mix of long-running and web traffic). However, | |||
for many applications low delay is not useful unless it is | for many applications low delay is not useful unless it is | |||
consistently low. With these AQMs, 99th percentile queuing delay | consistently low. With these AQMs, 99th percentile queuing delay | |||
is 20-30 ms (cf. 2 ms with the same traffic over L4S). | is 20-30 ms (cf. 2 ms with the same traffic over L4S). | |||
* Similarly, recent research into using e2e congestion control | o Similarly, recent research into using e2e congestion control | |||
without needing an AQM in the network (e.g. BBR | without needing an AQM in the network (e.g. BBR | |||
[I-D.cardwell-iccrg-bbr-congestion-control]) seems to have hit a | [I-D.cardwell-iccrg-bbr-congestion-control]) seems to have hit a | |||
similar lower limit to queuing delay of about 20ms on average, but | similar lower limit to queuing delay of about 20ms on average, but | |||
there are also regular 25ms delay spikes due to bandwidth probes | there are also regular 25ms delay spikes due to bandwidth probes | |||
and 60ms spikes due to flow-starts. | and 60ms spikes due to flow-starts. | |||
L4S learns from the experience of Data Center TCP [RFC8257], which | L4S learns from the experience of Data Center TCP [RFC8257], which | |||
shows the power of complementary changes both in the network and on | shows the power of complementary changes both in the network and on | |||
end-systems. DCTCP teaches us that two small but radical changes to | end-systems. DCTCP teaches us that two small but radical changes to | |||
congestion control are needed to cut the two major outstanding causes | congestion control are needed to cut the two major outstanding causes | |||
skipping to change at page 7, line 14 ¶ | skipping to change at page 7, line 14 ¶ | |||
For the public Internet, nearly all the benefit will typically be | For the public Internet, nearly all the benefit will typically be | |||
achieved by deploying the Coupled AQM into either end of the access | achieved by deploying the Coupled AQM into either end of the access | |||
link between a 'site' and the Internet, which is invariably the | link between a 'site' and the Internet, which is invariably the | |||
bottleneck (see section 6.4 of[I-D.ietf-tsvwg-l4s-arch] about | bottleneck (see section 6.4 of[I-D.ietf-tsvwg-l4s-arch] about | |||
deployment, which also defines the term 'site' to mean a home, an | deployment, which also defines the term 'site' to mean a home, an | |||
office, a campus or mobile user equipment). | office, a campus or mobile user equipment). | |||
Latency is not the only concern of L4S: | Latency is not the only concern of L4S: | |||
* The "Low Loss" part of the name denotes that L4S generally | o The "Low Loss" part of the name denotes that L4S generally | |||
achieves zero congestion loss (which would otherwise cause | achieves zero congestion loss (which would otherwise cause | |||
retransmission delays), due to its use of ECN. | retransmission delays), due to its use of ECN. | |||
* The "Scalable throughput" part of the name denotes that the per- | o The "Scalable throughput" part of the name denotes that the per- | |||
flow throughput of Scalable congestion controls should scale | flow throughput of Scalable congestion controls should scale | |||
indefinitely, avoiding the imminent scaling problems with 'TCP- | indefinitely, avoiding the imminent scaling problems with 'TCP- | |||
Friendly' congestion control algorithms [RFC3649]. | Friendly' congestion control algorithms [RFC3649]. | |||
The former is clearly in scope of this AQM document. However, the | The former is clearly in scope of this AQM document. However, the | |||
latter is an outcome of the end-system behaviour, and therefore | latter is an outcome of the end-system behaviour, and therefore | |||
outside the scope of this AQM document, even though the AQM is an | outside the scope of this AQM document, even though the AQM is an | |||
enabler. | enabler. | |||
The overall L4S architecture [I-D.ietf-tsvwg-l4s-arch] gives more | The overall L4S architecture [I-D.ietf-tsvwg-l4s-arch] gives more | |||
detail, including on wider deployment aspects such as backwards | detail, including on wider deployment aspects such as backwards | |||
compatibility of Scalable congestion controls in bottlenecks where a | compatibility of Scalable congestion controls in bottlenecks where a | |||
DualQ Coupled AQM has not been deployed. The supporting papers | DualQ Coupled AQM has not been deployed. The supporting papers | |||
[DualPI2Linux], [PI2], [DCttH19] and [PI2param] give the full | [L4Seval22], [DualPI2Linux], [PI2] and [PI2param] give the full | |||
rationale for the AQM's design, both discursively and in more precise | rationale for the AQM's design, both discursively and in more precise | |||
mathematical form, as well as the results of performance evaluations. | mathematical form, as well as the results of performance evaluations. | |||
The main results have been validated independently when using the | The main results have been validated independently when using the | |||
Prague congestion control [Boru20] (experiments are run using Prague | Prague congestion control [Boru20] (experiments are run using Prague | |||
and DCTCP, but only the former are relevant for validation, because | and DCTCP, but only the former are relevant for validation, because | |||
Prague fixes a number of problems with the Linux DCTCP code that make | Prague fixes a number of problems with the Linux DCTCP code that make | |||
it unsuitable for the public Internet). | it unsuitable for the public Internet). | |||
1.3. Terminology | 1.3. Terminology | |||
skipping to change at page 8, line 46 ¶ | skipping to change at page 8, line 48 ¶ | |||
averages 2 congestion signals per round-trip whatever the flow | averages 2 congestion signals per round-trip whatever the flow | |||
rate, as do other recently developed scalable congestion controls, | rate, as do other recently developed scalable congestion controls, | |||
e.g. Relentless TCP [I-D.mathis-iccrg-relentless-tcp], TCP Prague | e.g. Relentless TCP [I-D.mathis-iccrg-relentless-tcp], TCP Prague | |||
[I-D.briscoe-iccrg-prague-congestion-control], [PragueLinux], | [I-D.briscoe-iccrg-prague-congestion-control], [PragueLinux], | |||
BBRv2 [BBRv2], [I-D.cardwell-iccrg-bbr-congestion-control] and the | BBRv2 [BBRv2], [I-D.cardwell-iccrg-bbr-congestion-control] and the | |||
L4S variant of SCREAM for real-time media [SCReAM], [RFC8298]). | L4S variant of SCREAM for real-time media [SCReAM], [RFC8298]). | |||
For the public Internet a Scalable transport has to comply with | For the public Internet a Scalable transport has to comply with | |||
the requirements in Section 4 of [I-D.ietf-tsvwg-ecn-l4s-id] | the requirements in Section 4 of [I-D.ietf-tsvwg-ecn-l4s-id] | |||
(aka. the 'Prague L4S requirements'). | (aka. the 'Prague L4S requirements'). | |||
C: Abbreviation for Classic, e.g. when used as a subscript. | C: Abbreviation for Classic, e.g. when used as a subscript. | |||
L: Abbreviation for L4S, e.g. when used as a subscript. | L: Abbreviation for L4S, e.g. when used as a subscript. | |||
The terms Classic or L4S can also qualify other nouns, such as | The terms Classic or L4S can also qualify other nouns, such as | |||
'codepoint', 'identifier', 'classification', 'packet', 'flow'. | 'codepoint', 'identifier', 'classification', 'packet', 'flow'. | |||
For example: an L4S packet means a packet with an L4S identifier | For example: an L4S packet means a packet with an L4S identifier | |||
sent from an L4S congestion control. | sent from an L4S congestion control. | |||
Both Classic and L4S services can cope with a proportion of | Both Classic and L4S services can cope with a proportion of | |||
unresponsive or less-responsive traffic as well, but in the L4S | unresponsive or less-responsive traffic as well, but in the L4S | |||
case its rate has to be smooth enough or low enough not to build a | case its rate has to be smooth enough or low enough not to build a | |||
queue (e.g. DNS, VoIP, game sync datagrams, etc.). The DualQ | queue (e.g. DNS, VoIP, game sync datagrams, etc.). The DualQ | |||
skipping to change at page 10, line 6 ¶ | skipping to change at page 10, line 6 ¶ | |||
traffic. | traffic. | |||
Thousands of tests have been conducted in a typical fixed residential | Thousands of tests have been conducted in a typical fixed residential | |||
broadband setting. Experiments used a range of base round trip | broadband setting. Experiments used a range of base round trip | |||
delays up to 100ms and link rates up to 200 Mb/s between the data | delays up to 100ms and link rates up to 200 Mb/s between the data | |||
centre and home network, with varying amounts of background traffic | centre and home network, with varying amounts of background traffic | |||
in both queues. For every L4S packet, the AQM kept the average | in both queues. For every L4S packet, the AQM kept the average | |||
queuing delay below 1ms (or 2 packets where serialization delay | queuing delay below 1ms (or 2 packets where serialization delay | |||
exceeded 1ms on slower links), with 99th percentile no worse than | exceeded 1ms on slower links), with 99th percentile no worse than | |||
2ms. No losses at all were introduced by the L4S AQM. Details of | 2ms. No losses at all were introduced by the L4S AQM. Details of | |||
the extensive experiments are available [DualPI2Linux], [PI2], | the extensive experiments are available [L4Seval22], [DualPI2Linux]. | |||
[DCttH19]. Subjective testing using very demanding high bandwidth | Subjective testing using very demanding high bandwidth low latency | |||
low latency applications over a single shared access link is also | applications over a single shared access link is also described | |||
described in [L4Sdemo16] and summarized in the section about | in [L4Sdemo16] and summarized in the section about applications in | |||
applications in the L4S architecture [I-D.ietf-tsvwg-l4s-arch] . | the L4S architecture [I-D.ietf-tsvwg-l4s-arch] . | |||
In all these experiments, the host was connected to the home network | In all these experiments, the host was connected to the home network | |||
by fixed Ethernet, in order to quantify the queuing delay that can be | by fixed Ethernet, in order to quantify the queuing delay that can be | |||
achieved by a user who cares about delay. It should be emphasized | achieved by a user who cares about delay. It should be emphasized | |||
that L4S support at the bottleneck link cannot 'undelay' bursts | that L4S support at the bottleneck link cannot 'undelay' bursts | |||
introduced by another link on the path, for instance by legacy Wi-Fi | introduced by another link on the path, for instance by legacy Wi-Fi | |||
equipment. However, if L4S support is added to the queue feeding the | equipment. However, if L4S support is added to the queue feeding the | |||
_outgoing_ WAN link of a home gateway, it would be counterproductive | _outgoing_ WAN link of a home gateway, it would be counterproductive | |||
not to also reduce the burstiness of the _incoming_ Wi-Fi. Also, | not to also reduce the burstiness of the _incoming_ Wi-Fi. Also, | |||
trials of Wi-Fi equipment with an L4S DualQ Coupled AQM on the | trials of Wi-Fi equipment with an L4S DualQ Coupled AQM on the | |||
skipping to change at page 10, line 37 ¶ | skipping to change at page 10, line 37 ¶ | |||
achieve low delay. The L4S queue can be filled with a heavy load of | achieve low delay. The L4S queue can be filled with a heavy load of | |||
capacity-seeking flows (TCP Prague etc.) and still achieve low delay. | capacity-seeking flows (TCP Prague etc.) and still achieve low delay. | |||
The L4S queue does not rely on the presence of other traffic in the | The L4S queue does not rely on the presence of other traffic in the | |||
Classic queue that can be 'overtaken'. It gives low latency to L4S | Classic queue that can be 'overtaken'. It gives low latency to L4S | |||
traffic whether or not there is Classic traffic. The tail latency of | traffic whether or not there is Classic traffic. The tail latency of | |||
traffic served by the Classic AQM is sometimes a little better | traffic served by the Classic AQM is sometimes a little better | |||
sometimes a little worse, when a proportion of the traffic is L4S. | sometimes a little worse, when a proportion of the traffic is L4S. | |||
The two queues are only necessary because: | The two queues are only necessary because: | |||
* the large variations (sawteeth) of Classic flows need roughly a | o the large variations (sawteeth) of Classic flows need roughly a | |||
base RTT of queuing delay to ensure full utilization | base RTT of queuing delay to ensure full utilization | |||
* Scalable flows do not need a queue to keep utilization high, but | o Scalable flows do not need a queue to keep utilization high, but | |||
they cannot keep latency predictably low if they are mixed with | they cannot keep latency predictably low if they are mixed with | |||
Classic traffic, | Classic traffic, | |||
The L4S queue has latency priority within sub-round trip timescales, | The L4S queue has latency priority within sub-round trip timescales, | |||
but over longer periods the coupling from the Classic to the L4S AQM | but over longer periods the coupling from the Classic to the L4S AQM | |||
(explained below) ensures that it does not have bandwidth priority | (explained below) ensures that it does not have bandwidth priority | |||
over the Classic queue. | over the Classic queue. | |||
2. DualQ Coupled AQM | 2. DualQ Coupled AQM | |||
There are two main aspects to the approach: | There are two main aspects to the approach: | |||
* The Coupled AQM that addresses throughput equivalence between | o The Coupled AQM that addresses throughput equivalence between | |||
Classic (e.g. Reno, Cubic) flows and L4S flows (that satisfy the | Classic (e.g. Reno, Cubic) flows and L4S flows (that satisfy the | |||
Prague L4S requirements). | Prague L4S requirements). | |||
* The Dual Queue structure that provides latency separation for L4S | o The Dual Queue structure that provides latency separation for L4S | |||
flows to isolate them from the typically large Classic queue. | flows to isolate them from the typically large Classic queue. | |||
2.1. Coupled AQM | 2.1. Coupled AQM | |||
In the 1990s, the `TCP formula' was derived for the relationship | In the 1990s, the `TCP formula' was derived for the relationship | |||
between the steady-state congestion window, cwnd, and the drop | between the steady-state congestion window, cwnd, and the drop | |||
probability, p of standard Reno congestion control [RFC5681]. To a | probability, p of standard Reno congestion control [RFC5681]. To a | |||
first order approximation, the steady-state cwnd of Reno is inversely | first order approximation, the steady-state cwnd of Reno is inversely | |||
proportional to the square root of p. | proportional to the square root of p. | |||
skipping to change at page 15, line 28 ¶ | skipping to change at page 15, line 28 ¶ | |||
`----------'\\ | AQM |---->: ,'|`-.___.-' | `----------'\\ | AQM |---->: ,'|`-.___.-' | |||
\\ | |p' | <' | | \\ | |p' | <' | | |||
\\ `-------' (p'^2) //`' | \\ `-------' (p'^2) //`' | |||
\\ ^ | // | \\ ^ | // | |||
\\,. | v p_C // | \\,. | v p_C // | |||
< | _________ .------.// | < | _________ .------.// | |||
`\| | | | Drop |/ | `\| | | | Drop |/ | |||
Classic (C) |queue |===>|/mark | | Classic (C) |queue |===>|/mark | | |||
__|______| `------' | __|______| `------' | |||
Figure 1: DualQ Coupled AQM Schematic | ||||
Legend: ===> traffic flow; ---> control dependency. | Legend: ===> traffic flow; ---> control dependency. | |||
Figure 1: DualQ Coupled AQM Schematic | ||||
After the AQMs have applied their dropping or marking, the scheduler | After the AQMs have applied their dropping or marking, the scheduler | |||
forwards their packets to the link. Even though the scheduler gives | forwards their packets to the link. Even though the scheduler gives | |||
priority to the L queue, it is not as strong as the coupling from the | priority to the L queue, it is not as strong as the coupling from the | |||
C queue. This is because, as the C queue grows, the base AQM applies | C queue. This is because, as the C queue grows, the base AQM applies | |||
more congestion signals to L traffic (as well as C). As L flows | more congestion signals to L traffic (as well as C). As L flows | |||
reduce their rate in response, they use less than the scheduling | reduce their rate in response, they use less than the scheduling | |||
share for L traffic. So, because the scheduler is work preserving, | share for L traffic. So, because the scheduler is work preserving, | |||
it schedules any C traffic in the gaps. | it schedules any C traffic in the gaps. | |||
Giving priority to the L queue has the benefit of very low L queue | Giving priority to the L queue has the benefit of very low L queue | |||
skipping to change at page 18, line 41 ¶ | skipping to change at page 18, line 41 ¶ | |||
2.5.1.1. Requirements in Unexpected Cases | 2.5.1.1. Requirements in Unexpected Cases | |||
The flexibility to allow operator-specific classifiers (Section 2.3) | The flexibility to allow operator-specific classifiers (Section 2.3) | |||
leads to the need to specify what the AQM in each queue ought to do | leads to the need to specify what the AQM in each queue ought to do | |||
with packets that do not carry the ECN field expected for that queue. | with packets that do not carry the ECN field expected for that queue. | |||
It is expected that the AQM in each queue will inspect the ECN field | It is expected that the AQM in each queue will inspect the ECN field | |||
to determine what sort of congestion notification to signal, then it | to determine what sort of congestion notification to signal, then it | |||
will decide whether to apply congestion notification to this | will decide whether to apply congestion notification to this | |||
particular packet, as follows: | particular packet, as follows: | |||
* If a packet that does not carry an ECT(1) or CE codepoint is | o If a packet that does not carry an ECT(1) or CE codepoint is | |||
classified into the L queue: | classified into the L queue: | |||
- if the packet is ECT(0), the L AQM SHOULD apply CE-marking | * if the packet is ECT(0), the L AQM SHOULD apply CE-marking | |||
using a probability appropriate to Classic congestion control | using a probability appropriate to Classic congestion control | |||
and appropriate to the target delay in the L queue | and appropriate to the target delay in the L queue | |||
- if the packet is Not-ECT, the appropriate action depends on | * if the packet is Not-ECT, the appropriate action depends on | |||
whether some other function is protecting the L queue from | whether some other function is protecting the L queue from | |||
misbehaving flows (e.g. per-flow queue protection | misbehaving flows (e.g. per-flow queue protection | |||
[I-D.briscoe-docsis-q-protection] or latency policing): | [I-D.briscoe-docsis-q-protection] or latency policing): | |||
o If separate queue protection is provided, the L AQM SHOULD | + If separate queue protection is provided, the L AQM SHOULD | |||
ignore the packet and forward it unchanged, meaning it | ignore the packet and forward it unchanged, meaning it | |||
should not calculate whether to apply congestion | should not calculate whether to apply congestion | |||
notification and it should neither drop nor CE-mark the | notification and it should neither drop nor CE-mark the | |||
packet (for instance, the operator might classify EF traffic | packet (for instance, the operator might classify EF traffic | |||
that is unresponsive to drop into the L queue, alongside | that is unresponsive to drop into the L queue, alongside | |||
responsive L4S-ECN traffic) | responsive L4S-ECN traffic) | |||
o if separate queue protection is not provided, the L AQM | + if separate queue protection is not provided, the L AQM | |||
SHOULD apply drop using a drop probability appropriate to | SHOULD apply drop using a drop probability appropriate to | |||
Classic congestion control and appropriate to the target | Classic congestion control and appropriate to the target | |||
delay in the L queue | delay in the L queue | |||
* If a packet that carries an ECT(1) codepoint is classified into | o If a packet that carries an ECT(1) codepoint is classified into | |||
the C queue: | the C queue: | |||
- the C AQM SHOULD apply CE-marking using the coupled AQM | * the C AQM SHOULD apply CE-marking using the coupled AQM | |||
probability p_CL (= k*p'). | probability p_CL (= k*p'). | |||
The above requirements are worded as "SHOULDs", because operator- | The above requirements are worded as "SHOULDs", because operator- | |||
specific classifiers are for flexibility, by definition. Therefore, | specific classifiers are for flexibility, by definition. Therefore, | |||
alternative actions might be appropriate in the operator's specific | alternative actions might be appropriate in the operator's specific | |||
circumstances. An example would be where the operator knows that | circumstances. An example would be where the operator knows that | |||
certain legacy traffic marked with one codepoint actually has a | certain legacy traffic marked with one codepoint actually has a | |||
congestion response associated with another codepoint. | congestion response associated with another codepoint. | |||
If the DualQ Coupled AQM has detected overload, it MUST introduce | If the DualQ Coupled AQM has detected overload, it MUST introduce | |||
skipping to change at page 20, line 5 ¶ | skipping to change at page 19, line 47 ¶ | |||
2.5.2. Management Requirements | 2.5.2. Management Requirements | |||
2.5.2.1. Configuration | 2.5.2.1. Configuration | |||
By default, a DualQ Coupled AQM SHOULD NOT need any configuration for | By default, a DualQ Coupled AQM SHOULD NOT need any configuration for | |||
use at a bottleneck on the public Internet [RFC7567]. The following | use at a bottleneck on the public Internet [RFC7567]. The following | |||
parameters MAY be operator-configurable, e.g. to tune for non- | parameters MAY be operator-configurable, e.g. to tune for non- | |||
Internet settings: | Internet settings: | |||
* Optional packet classifier(s) to use in addition to the ECN field | o Optional packet classifier(s) to use in addition to the ECN field | |||
(see Section 2.3); | (see Section 2.3); | |||
* Expected typical RTT, which can be used to determine the queuing | o Expected typical RTT, which can be used to determine the queuing | |||
delay of the Classic AQM at its operating point, in order to | delay of the Classic AQM at its operating point, in order to | |||
prevent typical lone flows from under-utilizing capacity. For | prevent typical lone flows from under-utilizing capacity. For | |||
example: | example: | |||
- for the PI2 algorithm (Appendix A) the queuing delay target is | * for the PI2 algorithm (Appendix A) the queuing delay target is | |||
dependent on the typical RTT; | dependent on the typical RTT; | |||
- for the Curvy RED algorithm (Appendix B) the queuing delay at | * for the Curvy RED algorithm (Appendix B) the queuing delay at | |||
the desired operating point of the curvy ramp is configured to | the desired operating point of the curvy ramp is configured to | |||
encompass a typical RTT; | encompass a typical RTT; | |||
- if another Classic AQM was used, it would be likely to need an | * if another Classic AQM was used, it would be likely to need an | |||
operating point for the queue based on the typical RTT, and if | operating point for the queue based on the typical RTT, and if | |||
so it SHOULD be expressed in units of time. | so it SHOULD be expressed in units of time. | |||
An operating point that is manually calculated might be directly | An operating point that is manually calculated might be directly | |||
configurable instead, e.g. for links with large numbers of flows | configurable instead, e.g. for links with large numbers of flows | |||
where under-utilization by a single flow would be unlikely. | where under-utilization by a single flow would be unlikely. | |||
* Expected maximum RTT, which can be used to set the stability | o Expected maximum RTT, which can be used to set the stability | |||
parameter(s) of the Classic AQM. For example: | parameter(s) of the Classic AQM. For example: | |||
- for the PI2 algorithm (Appendix A), the gain parameters of the | * for the PI2 algorithm (Appendix A), the gain parameters of the | |||
PI algorithm depend on the maximum RTT. | PI algorithm depend on the maximum RTT. | |||
- for the Curvy RED algorithm (Appendix B) the smoothing | * for the Curvy RED algorithm (Appendix B) the smoothing | |||
parameter is chosen to filter out transients in the queue | parameter is chosen to filter out transients in the queue | |||
within a maximum RTT. | within a maximum RTT. | |||
Stability parameter(s) that are manually calculated assuming a | Stability parameter(s) that are manually calculated assuming a | |||
maximum RTT might be directly configurable instead. | maximum RTT might be directly configurable instead. | |||
* Coupling factor, k (see Appendix C.2); | o Coupling factor, k (see Appendix C.2); | |||
* A limit to the conditional priority of L4S. This is scheduler- | o A limit to the conditional priority of L4S. This is scheduler- | |||
dependent, but it SHOULD be expressed as a relation between the | dependent, but it SHOULD be expressed as a relation between the | |||
max delay of a C packet and an L packet. For example: | max delay of a C packet and an L packet. For example: | |||
- for a WRR scheduler a weight ratio between L and C of w:1 means | * for a WRR scheduler a weight ratio between L and C of w:1 means | |||
that the maximum delay to a C packet is w times that of an L | that the maximum delay to a C packet is w times that of an L | |||
packet. | packet. | |||
- for a time-shifted FIFO (TS-FIFO) scheduler (see Section 4.2.2) | * for a time-shifted FIFO (TS-FIFO) scheduler (see Section 4.2.2) | |||
a time-shift of tshift means that the maximum delay to a C | a time-shift of tshift means that the maximum delay to a C | |||
packet is tshift greater than that of an L packet. tshift could | packet is tshift greater than that of an L packet. tshift could | |||
be expressed as a multiple of the typical RTT rather than as an | be expressed as a multiple of the typical RTT rather than as an | |||
absolute delay. | absolute delay. | |||
* The maximum Classic ECN marking probability, p_Cmax, before | o The maximum Classic ECN marking probability, p_Cmax, before | |||
introducing drop. | introducing drop. | |||
2.5.2.2. Monitoring | 2.5.2.2. Monitoring | |||
An experimental DualQ Coupled AQM SHOULD allow the operator to | An experimental DualQ Coupled AQM SHOULD allow the operator to | |||
monitor each of the following operational statistics on demand, per | monitor each of the following operational statistics on demand, per | |||
queue and per configurable sample interval, for performance | queue and per configurable sample interval, for performance | |||
monitoring and perhaps also for accounting in some cases: | monitoring and perhaps also for accounting in some cases: | |||
* Bits forwarded, from which utilization can be calculated; | o Bits forwarded, from which utilization can be calculated; | |||
* Total packets in the three categories: arrived, presented to the | o Total packets in the three categories: arrived, presented to the | |||
AQM, and forwarded. The difference between the first two will | AQM, and forwarded. The difference between the first two will | |||
measure any non-AQM tail discard. The difference between the last | measure any non-AQM tail discard. The difference between the last | |||
two will measure proactive AQM discard; | two will measure proactive AQM discard; | |||
* ECN packets marked, non-ECN packets dropped, ECN packets dropped, | o ECN packets marked, non-ECN packets dropped, ECN packets dropped, | |||
which can be combined with the three total packet counts above to | which can be combined with the three total packet counts above to | |||
calculate marking and dropping probabilities; | calculate marking and dropping probabilities; | |||
* Queue delay (not including serialization delay of the head packet | o Queue delay (not including serialization delay of the head packet | |||
or medium acquisition delay) - see further notes below. | or medium acquisition delay) - see further notes below. | |||
Unlike the other statistics, queue delay cannot be captured in a | Unlike the other statistics, queue delay cannot be captured in a | |||
simple accumulating counter. Therefore, the type of queue delay | simple accumulating counter. Therefore, the type of queue delay | |||
statistics produced (mean, percentiles, etc.) will depend on | statistics produced (mean, percentiles, etc.) will depend on | |||
implementation constraints. To facilitate comparative evaluation | implementation constraints. To facilitate comparative evaluation | |||
of different implementations and approaches, an implementation | of different implementations and approaches, an implementation | |||
SHOULD allow mean and 99th percentile queue delay to be derived | SHOULD allow mean and 99th percentile queue delay to be derived | |||
(per queue per sample interval). A relatively simple way to do | (per queue per sample interval). A relatively simple way to do | |||
this would be to store a coarse-grained histogram of queue delay. | this would be to store a coarse-grained histogram of queue delay. | |||
skipping to change at page 22, line 10 ¶ | skipping to change at page 21, line 49 ¶ | |||
a sample interval, each bin would accumulate a count of the number | a sample interval, each bin would accumulate a count of the number | |||
of packets that had fallen within each range. The maximum queue | of packets that had fallen within each range. The maximum queue | |||
delay per queue per interval MAY also be recorded, to aid | delay per queue per interval MAY also be recorded, to aid | |||
diagnosis of faults and anomalous events. | diagnosis of faults and anomalous events. | |||
2.5.2.3. Anomaly Detection | 2.5.2.3. Anomaly Detection | |||
An experimental DualQ Coupled AQM SHOULD asynchronously report the | An experimental DualQ Coupled AQM SHOULD asynchronously report the | |||
following data about anomalous conditions: | following data about anomalous conditions: | |||
* Start-time and duration of overload state. | o Start-time and duration of overload state. | |||
A hysteresis mechanism SHOULD be used to prevent flapping in and | A hysteresis mechanism SHOULD be used to prevent flapping in and | |||
out of overload causing an event storm. For instance, exit from | out of overload causing an event storm. For instance, exit from | |||
overload state could trigger one report, but also latch a timer. | overload state could trigger one report, but also latch a timer. | |||
Then, during that time, if the AQM enters and exits overload state | Then, during that time, if the AQM enters and exits overload state | |||
any number of times, the duration in overload state is | any number of times, the duration in overload state is | |||
accumulated, but no new report is generated until the first time | accumulated, but no new report is generated until the first time | |||
the AQM is out of overload once the timer has expired. | the AQM is out of overload once the timer has expired. | |||
2.5.2.4. Deployment, Coexistence and Scaling | 2.5.2.4. Deployment, Coexistence and Scaling | |||
skipping to change at page 24, line 5 ¶ | skipping to change at page 23, line 38 ¶ | |||
A trade-off needs to be made between complexity and the risk of | A trade-off needs to be made between complexity and the risk of | |||
either traffic class harming the other. In overloaded conditions the | either traffic class harming the other. In overloaded conditions the | |||
higher priority L4S service will have to sacrifice some aspect of its | higher priority L4S service will have to sacrifice some aspect of its | |||
performance. Depending on the degree of overload, alternative | performance. Depending on the degree of overload, alternative | |||
solutions may relax a different factor: e.g. throughput, delay, drop. | solutions may relax a different factor: e.g. throughput, delay, drop. | |||
These choices need to be made either by the developer or by operator | These choices need to be made either by the developer or by operator | |||
policy, rather than by the IETF. Subsequent subsections discuss | policy, rather than by the IETF. Subsequent subsections discuss | |||
aspects relating to handling of different degrees of overload: | aspects relating to handling of different degrees of overload: | |||
* Unresponsive flows (L and/or C) but not overloaded, i.e. the sum | o Unresponsive flows (L and/or C) but not overloaded, i.e. the sum | |||
of unresponsive load before adding any responsive traffic is below | of unresponsive load before adding any responsive traffic is below | |||
capacity; | capacity; | |||
This case is handled by the regular Coupled DualQ (Section 2.1) | This case is handled by the regular Coupled DualQ (Section 2.1) | |||
but not discussed there. So below, Section 4.2.1 explains the | but not discussed there. So below, Section 4.2.1 explains the | |||
design goal, and how it is achieved in practice; | design goal, and how it is achieved in practice; | |||
* Unresponsive flows (L and/or C) causing persistent overload, | o Unresponsive flows (L and/or C) causing persistent overload, | |||
i.e. the sum of unresponsive load even before adding any | i.e. the sum of unresponsive load even before adding any | |||
responsive traffic persistently exceeds capacity; | responsive traffic persistently exceeds capacity; | |||
This case is not covered by the regular Coupled DualQ mechanism | This case is not covered by the regular Coupled DualQ mechanism | |||
(Section 2.1) but the last para in Section 2.5.1.1 sets out a | (Section 2.1) but the last para in Section 2.5.1.1 sets out a | |||
requirement to handle the case where ECN-capable traffic could | requirement to handle the case where ECN-capable traffic could | |||
starve non-ECN-capable traffic. Section 4.2.3 below discusses | starve non-ECN-capable traffic. Section 4.2.3 below discusses | |||
the general options and gives specific examples. | the general options and gives specific examples. | |||
* Short-term overload that lies between the 'not overloaded' and | o Short-term overload that lies between the 'not overloaded' and | |||
'persistently overloaded' cases. | 'persistently overloaded' cases. | |||
For the period before overload is deemed persistent, | For the period before overload is deemed persistent, | |||
Section 4.2.2 discusses options for more immediate mechanisms | Section 4.2.2 discusses options for more immediate mechanisms | |||
at the scheduler timescale. These prevent short-term | at the scheduler timescale. These prevent short-term | |||
starvation of the C queue by making the priority of the L queue | starvation of the C queue by making the priority of the L queue | |||
conditional, as required in Section 2.5.1. | conditional, as required in Section 2.5.1. | |||
4.2.1. Unresponsive Traffic without Overload | 4.2.1. Unresponsive Traffic without Overload | |||
When one or more L flows and/or C flows are unresponsive, but their | When one or more L flows and/or C flows are unresponsive, but their | |||
total load is within the link capacity so that they do not saturate | total load is within the link capacity so that they do not saturate | |||
the coupled marking (below 100%), the goal of a DualQ AQM is to | the coupled marking (below 100%), the goal of a DualQ AQM is to | |||
behave no worse than a single-queue AQM. | behave no worse than a single-queue AQM. | |||
Tests have shown that this is indeed the case with no additional | Tests have shown that this is indeed the case with no additional | |||
mechanism beyond the regular Coupled DualQ of Section 2.1 (see the | mechanism beyond the regular Coupled DualQ of Section 2.1 (see the | |||
results of 'overload experiments' in [DCttH19]). Perhaps counter- | results of 'overload experiments' in [L4Seval22]). Perhaps counter- | |||
intuitively, whether the unresponsive flow classifies itself into the | intuitively, whether the unresponsive flow classifies itself into the | |||
L or the C queue, the DualQ system behaves as if it has subtracted | L or the C queue, the DualQ system behaves as if it has subtracted | |||
from the overall link capacity. Then, the coupling shares out the | from the overall link capacity. Then, the coupling shares out the | |||
remaining capacity between any competing responsive flows (in either | remaining capacity between any competing responsive flows (in either | |||
queue). See also Section 4.2.2, which discusses scheduler-specific | queue). See also Section 4.2.2, which discusses scheduler-specific | |||
details. | details. | |||
4.2.2. Avoiding Short-Term Classic Starvation: Sacrifice L4S Throughput | 4.2.2. Avoiding Short-Term Classic Starvation: Sacrifice L4S Throughput | |||
or Delay? | or Delay? | |||
skipping to change at page 25, line 19 ¶ | skipping to change at page 24, line 47 ¶ | |||
Section 2.5.1) to avoid short-term starvation of Classic. Otherwise, | Section 2.5.1) to avoid short-term starvation of Classic. Otherwise, | |||
as explained in Section 2.4, even a lone responsive L4S flow could | as explained in Section 2.4, even a lone responsive L4S flow could | |||
temporarily block a small finite set of C packets (e.g. an initial | temporarily block a small finite set of C packets (e.g. an initial | |||
window or DNS request). The blockage would only be brief, but it | window or DNS request). The blockage would only be brief, but it | |||
could be longer for certain AQM implementations that can only | could be longer for certain AQM implementations that can only | |||
increase the congestion signal coupled from the C queue when C | increase the congestion signal coupled from the C queue when C | |||
packets are actually being dequeued. There is then the question of | packets are actually being dequeued. There is then the question of | |||
whether to sacrifice L4S throughput or L4S delay (or some other | whether to sacrifice L4S throughput or L4S delay (or some other | |||
policy) to make the priority conditional: | policy) to make the priority conditional: | |||
Sacrifice L4S throughput: By using weighted round-robin as the | Sacrifice L4S throughput: By using weighted round-robin as the | |||
conditional priority scheduler, the L4S service can sacrifice some | conditional priority scheduler, the L4S service can sacrifice some | |||
throughput during overload. This can either be thought of as | throughput during overload. This can either be thought of as | |||
guaranteeing a minimum throughput service for Classic traffic, or | guaranteeing a minimum throughput service for Classic traffic, or | |||
as guaranteeing a maximum delay for a packet at the head of the | as guaranteeing a maximum delay for a packet at the head of the | |||
Classic queue. | Classic queue. | |||
Cautionary note: a WRR scheduler can only guarantee Classic | Cautionary note: a WRR scheduler can only guarantee Classic | |||
throughput if Classic sources are sending enough to use it -- | throughput if Classic sources are sending enough to use it -- | |||
congestion signals can undermine scheduling because they determine | congestion signals can undermine scheduling because they determine | |||
how much responsive traffic of each class arrives for scheduling | how much responsive traffic of each class arrives for scheduling | |||
skipping to change at page 28, line 30 ¶ | skipping to change at page 28, line 10 ¶ | |||
As an example, experiments with the DualPI2 AQM (Appendix A) have | As an example, experiments with the DualPI2 AQM (Appendix A) have | |||
shown that introducing 'drop on saturation' at 100% coupled L4S | shown that introducing 'drop on saturation' at 100% coupled L4S | |||
marking addresses this problem with unresponsive ECN as well as | marking addresses this problem with unresponsive ECN as well as | |||
addressing the saturation problem. At saturation, DualPI2 switches | addressing the saturation problem. At saturation, DualPI2 switches | |||
into overload mode, where the base AQM is driven by the max delay of | into overload mode, where the base AQM is driven by the max delay of | |||
both queues and it introduces probabilistic drop to both queues | both queues and it introduces probabilistic drop to both queues | |||
equally. It leaves only a small range of congestion levels just | equally. It leaves only a small range of congestion levels just | |||
below saturation where unresponsive traffic gains any advantage from | below saturation where unresponsive traffic gains any advantage from | |||
using the ECN capability (relative to being unresponsive without | using the ECN capability (relative to being unresponsive without | |||
ECN), and the advantage is hardly detectable (see [DualQ-Test] and | ECN), and the advantage is hardly detectable (see [DualQ-Test] and | |||
section IV-E of [DCttH19]. Also overload with an unresponsive ECT(1) | section IV-G of [L4Seval22]. Also overload with an unresponsive | |||
flow gets no more bandwidth advantage than with ECT(0). | ECT(1) flow gets no more bandwidth advantage than with ECT(0). | |||
5. References | 5. References | |||
5.1. Normative References | 5.1. Normative References | |||
[I-D.ietf-tsvwg-ecn-l4s-id] | [I-D.ietf-tsvwg-ecn-l4s-id] | |||
Schepper, K. D. and B. Briscoe, "Explicit Congestion | Schepper, K. and B. Briscoe, "Explicit Congestion | |||
Notification (ECN) Protocol for Very Low Queuing Delay | Notification (ECN) Protocol for Ultra-Low Queuing Delay | |||
(L4S)", Work in Progress, Internet-Draft, draft-ietf- | (L4S)", draft-ietf-tsvwg-ecn-l4s-id-14 (work in progress), | |||
tsvwg-ecn-l4s-id-28, 8 August 2022, | March 2021. | |||
<https://datatracker.ietf.org/api/v1/doc/document/draft- | ||||
ietf-tsvwg-ecn-l4s-id/>. | ||||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
<https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | |||
of Explicit Congestion Notification (ECN) to IP", | of Explicit Congestion Notification (ECN) to IP", | |||
RFC 3168, DOI 10.17487/RFC3168, September 2001, | RFC 3168, DOI 10.17487/RFC3168, September 2001, | |||
<https://www.rfc-editor.org/info/rfc3168>. | <https://www.rfc-editor.org/info/rfc3168>. | |||
skipping to change at page 29, line 32 ¶ | skipping to change at page 29, line 7 ¶ | |||
[AQMmetrics] | [AQMmetrics] | |||
Kwon, M. and S. Fahmy, "A Comparison of Load-based and | Kwon, M. and S. Fahmy, "A Comparison of Load-based and | |||
Queue- based Active Queue Management Algorithms", Proc. | Queue- based Active Queue Management Algorithms", Proc. | |||
Int'l Soc. for Optical Engineering (SPIE) 4866:35--46 DOI: | Int'l Soc. for Optical Engineering (SPIE) 4866:35--46 DOI: | |||
10.1117/12.473021, 2002, | 10.1117/12.473021, 2002, | |||
<https://www.cs.purdue.edu/homes/fahmy/papers/ldc.pdf>. | <https://www.cs.purdue.edu/homes/fahmy/papers/ldc.pdf>. | |||
[ARED01] Floyd, S., Gummadi, R., and S. Shenker, "Adaptive RED: An | [ARED01] Floyd, S., Gummadi, R., and S. Shenker, "Adaptive RED: An | |||
Algorithm for Increasing the Robustness of RED's Active | Algorithm for Increasing the Robustness of RED's Active | |||
Queue Management", ACIRI Technical Report , August 2001, | Queue Management", ACIRI Technical Report 301, August | |||
<https://www.icir.org/floyd/red.html>. | 2001, <http://www.icsi.berkeley.edu/icsi/node/2032>. | |||
[BBRv2] Cardwell, N., "BRTCP BBR v2 Alpha/Preview Release", GitHub | [BBRv2] Cardwell, N., "BRTCP BBR v2 Alpha/Preview Release", GitHub | |||
repository; Linux congestion control module, | repository; Linux congestion control module, | |||
<https://github.com/google/bbr/blob/v2alpha/README.md>. | <https://github.com/google/bbr/blob/v2alpha/README.md>. | |||
[Boru20] Boru Oljira, D., Grinnemo, K-J., Brunstrom, A., and J. | [Boru20] Boru Oljira, D., Grinnemo, K-J., Brunstrom, A., and J. | |||
Taheri, "Validating the Sharing Behavior and Latency | Taheri, "Validating the Sharing Behavior and Latency | |||
Characteristics of the L4S Architecture", ACM CCR | Characteristics of the L4S Architecture", ACM CCR | |||
50(2):37--44, May 2020, | 50(2):37--44, May 2020, | |||
<https://dl.acm.org/doi/abs/10.1145/3402413.3402419>. | <https://dl.acm.org/doi/abs/10.1145/3402413.3402419>. | |||
skipping to change at page 30, line 15 ¶ | skipping to change at page 29, line 37 ¶ | |||
[CoDel] Nichols, K. and V. Jacobson, "Controlling Queue Delay", | [CoDel] Nichols, K. and V. Jacobson, "Controlling Queue Delay", | |||
ACM Queue 10(5), May 2012, | ACM Queue 10(5), May 2012, | |||
<https://queue.acm.org/issuedetail.cfm?issue=2208917>. | <https://queue.acm.org/issuedetail.cfm?issue=2208917>. | |||
[CRED_Insights] | [CRED_Insights] | |||
Briscoe, B., "Insights from Curvy RED (Random Early | Briscoe, B., "Insights from Curvy RED (Random Early | |||
Detection)", BT Technical Report TR-TUB8-2015-003 | Detection)", BT Technical Report TR-TUB8-2015-003 | |||
arXiv:1904.07339 [cs.NI], July 2015, | arXiv:1904.07339 [cs.NI], July 2015, | |||
<https://arxiv.org/abs/1904.07339>. | <https://arxiv.org/abs/1904.07339>. | |||
[DCttH19] De Schepper, K., Bondarenko, O., Tilmans, O., and B. | ||||
Briscoe, "`Data Centre to the Home': Ultra-Low Latency for | ||||
All", Updated RITE project Technical Report , July 2019, | ||||
<https://bobbriscoe.net/pubs.html#DCttH_TR>. | ||||
[DOCSIS3.1] | [DOCSIS3.1] | |||
CableLabs, "MAC and Upper Layer Protocols Interface | CableLabs, "MAC and Upper Layer Protocols Interface | |||
(MULPI) Specification, CM-SP-MULPIv3.1", Data-Over-Cable | (MULPI) Specification, CM-SP-MULPIv3.1", Data-Over-Cable | |||
Service Interface Specifications DOCSIS® 3.1 Version i17 | Service Interface Specifications DOCSIS(R) 3.1 Version i17 | |||
or later, 21 January 2019, <https://specification- | or later, January 2019, <https://specification- | |||
search.cablelabs.com/CM-SP-MULPIv3.1>. | search.cablelabs.com/CM-SP-MULPIv3.1>. | |||
[DualPI2Linux] | [DualPI2Linux] | |||
Albisser, O., De Schepper, K., Briscoe, B., Tilmans, O., | Albisser, O., De Schepper, K., Briscoe, B., Tilmans, O., | |||
and H. Steen, "DUALPI2 - Low Latency, Low Loss and | and H. Steen, "DUALPI2 - Low Latency, Low Loss and | |||
Scalable (L4S) AQM", Proc. Linux Netdev 0x13 , March 2019, | Scalable (L4S) AQM", Proc. Linux Netdev 0x13 , March 2019, | |||
<https://www.netdevconf.org/0x13/session.html?talk- | <https://www.netdevconf.org/0x13/session.html?talk- | |||
DUALPI2-AQM>. | DUALPI2-AQM>. | |||
[DualQ-Test] | [DualQ-Test] | |||
Steen, H., "Destruction Testing: Ultra-Low Delay using | Steen, H., "Destruction Testing: Ultra-Low Delay using | |||
Dual Queue Coupled Active Queue Management", Master's | Dual Queue Coupled Active Queue Management", Master's | |||
Thesis, Dept of Informatics, Uni Oslo , May 2017, | Thesis, Dept of Informatics, Uni Oslo , May 2017. | |||
<https://www.duo.uio.no/bitstream/handle/10852/57424/ | ||||
thesis-henrste.pdf?sequence=1>. | ||||
[Dukkipati06] | [Dukkipati06] | |||
Dukkipati, N. and N. McKeown, "Why Flow-Completion Time is | Dukkipati, N. and N. McKeown, "Why Flow-Completion Time is | |||
the Right Metric for Congestion Control", ACM CCR | the Right Metric for Congestion Control", ACM CCR | |||
36(1):59--62, January 2006, | 36(1):59--62, January 2006, | |||
<https://dl.acm.org/doi/10.1145/1111322.1111336>. | <https://dl.acm.org/doi/10.1145/1111322.1111336>. | |||
[Heist21] Heist, P. and J. Morton, "L4S Tests", GitHub README, | [Heist21] Heist, P. and J. Morton, "L4S Tests", GitHub README, | |||
August 2021, <https://github.com/heistp/l4s- | August 2021, <https://github.com/heistp/l4s- | |||
tests/#underutilization-with-bursty-traffic>. | tests/#underutilization-with-bursty-traffic>. | |||
[I-D.briscoe-docsis-q-protection] | [I-D.briscoe-docsis-q-protection] | |||
Briscoe, B. and G. White, "The DOCSIS(r) Queue Protection | Briscoe, B. and G. White, "Queue Protection to Preserve | |||
Algorithm to Preserve Low Latency", Work in Progress, | Low Latency", draft-briscoe-docsis-q-protection-00 (work | |||
Internet-Draft, draft-briscoe-docsis-q-protection-06, 13 | in progress), July 2019. | |||
May 2022, | ||||
<https://datatracker.ietf.org/api/v1/doc/document/draft- | ||||
briscoe-docsis-q-protection/>. | ||||
[I-D.briscoe-iccrg-prague-congestion-control] | [I-D.briscoe-iccrg-prague-congestion-control] | |||
Schepper, K. D., Tilmans, O., and B. Briscoe, "Prague | Schepper, K. D., Tilmans, O., and B. Briscoe, "Prague | |||
Congestion Control", Work in Progress, Internet-Draft, | Congestion Control", draft-briscoe-iccrg-prague- | |||
draft-briscoe-iccrg-prague-congestion-control-01, 11 July | congestion-control-01 (work in progress), July 2022, | |||
2022, <https://datatracker.ietf.org/api/v1/doc/document/ | <https://www.ietf.org/archive/id/draft-briscoe-iccrg- | |||
draft-briscoe-iccrg-prague-congestion-control/>. | prague-congestion-control-01.txt>. | |||
[I-D.briscoe-tsvwg-l4s-diffserv] | [I-D.briscoe-tsvwg-l4s-diffserv] | |||
Briscoe, B., "Interactions between Low Latency, Low Loss, | Briscoe, B., "Interactions between Low Latency, Low Loss, | |||
Scalable Throughput (L4S) and Differentiated Services", | Scalable Throughput (L4S) and Differentiated Services", | |||
Work in Progress, Internet-Draft, draft-briscoe-tsvwg-l4s- | draft-briscoe-tsvwg-l4s-diffserv-02 (work in progress), | |||
diffserv-02, 2 July 2018, | November 2018. | |||
<https://datatracker.ietf.org/api/v1/doc/document/draft- | ||||
briscoe-tsvwg-l4s-diffserv/>. | ||||
[I-D.cardwell-iccrg-bbr-congestion-control] | [I-D.cardwell-iccrg-bbr-congestion-control] | |||
Cardwell, N., Cheng, Y., Yeganeh, S. H., Swett, I., and V. | Cardwell, N., Cheng, Y., Yeganeh, S., and V. Jacobson, | |||
Jacobson, "BBR Congestion Control", Work in Progress, | "BBR Congestion Control", draft-cardwell-iccrg-bbr- | |||
Internet-Draft, draft-cardwell-iccrg-bbr-congestion- | congestion-control-00 (work in progress), July 2017. | |||
control-02, 7 March 2022, | ||||
<https://datatracker.ietf.org/api/v1/doc/document/draft- | ||||
cardwell-iccrg-bbr-congestion-control/>. | ||||
[I-D.ietf-tsvwg-l4s-arch] | [I-D.ietf-tsvwg-l4s-arch] | |||
Briscoe, B., Schepper, K. D., Bagnulo, M., and G. White, | Briscoe, B., Schepper, K., Bagnulo, M., and G. White, "Low | |||
"Low Latency, Low Loss, Scalable Throughput (L4S) Internet | Latency, Low Loss, Scalable Throughput (L4S) Internet | |||
Service: Architecture", Work in Progress, Internet-Draft, | Service: Architecture", draft-ietf-tsvwg-l4s-arch-08 (work | |||
draft-ietf-tsvwg-l4s-arch-19, 27 July 2022, | in progress), November 2020. | |||
<https://datatracker.ietf.org/api/v1/doc/document/draft- | ||||
ietf-tsvwg-l4s-arch/>. | ||||
[I-D.mathis-iccrg-relentless-tcp] | [I-D.mathis-iccrg-relentless-tcp] | |||
Mathis, M., "Relentless Congestion Control", Work in | Mathis, M., "Relentless Congestion Control", draft-mathis- | |||
Progress, Internet-Draft, draft-mathis-iccrg-relentless- | iccrg-relentless-tcp-00 (work in progress), March 2009. | |||
tcp-00, 4 March 2009, <https://www.ietf.org/archive/id/ | ||||
draft-mathis-iccrg-relentless-tcp-00.txt>. | [L4S_5G] Willars, P., Wittenmark, E., Ronkainen, H., Oestberg, C., | |||
Johansson, I., Strand, J., Ledl, P., and D. Schnieders, | ||||
"Enabling time-critical applications over 5G with rate | ||||
adaptation", Ericsson - Deutsche Telekom White Paper BNEW- | ||||
21:025455 Uen, May 2021, <https://www.ericsson.com/en/ | ||||
reports-and-papers/white-papers/enabling-time-critical- | ||||
applications-over-5g-with-rate-adaptation>. | ||||
[L4Sdemo16] | [L4Sdemo16] | |||
Bondarenko, O., De Schepper, K., Tsang, I., and B. | Bondarenko, O., De Schepper, K., Tsang, I., and B. | |||
Briscoe, "Ultra-Low Delay for All: Live Experience, Live | Briscoe, "Ultra-Low Delay for All: Live Experience, Live | |||
Analysis", Proc. MMSYS'16 pp33:1--33:4, May 2016, | Analysis", Proc. MMSYS'16 pp33:1--33:4, May 2016, | |||
<https//dl.acm.org/citation.cfm?doid=2910017.2910633 | <https//dl.acm.org/citation.cfm?doid=2910017.2910633 | |||
(videos of demos: | (videos of demos: | |||
https://riteproject.eu/dctth/#1511dispatchwg )>. | https://riteproject.eu/dctth/#1511dispatchwg )>. | |||
[L4S_5G] Willars, P., Wittenmark, E., Ronkainen, H., Östberg, C., | [L4Seval22] | |||
Johansson, I., Strand, J., Lédl, P., and D. Schnieders, | De Schepper, K., Albisser, O., Tilmans, O., and B. | |||
"Enabling time-critical applications over 5G with rate | Briscoe, "Dual Queue Coupled AQM: Deployable Very Low | |||
adaptation", Ericsson - Deutsche Telekom White Paper BNEW- | Queuing Delay for All", Preprint submitted to IEEE/ACM | |||
21:025455 Uen, May 2021, <https://www.ericsson.com/en/ | Transactions on Networking arXiv:2209.01078 [cs.NI], | |||
reports-and-papers/white-papers/enabling-time-critical- | September 2022, <https://arxiv.org/abs/2209.01078>. | |||
applications-over-5g-with-rate-adaptation>. | ||||
[Labovitz10] | [Labovitz10] | |||
Labovitz, C., Iekel-Johnson, S., McPherson, D., Oberheide, | Labovitz, C., Iekel-Johnson, S., McPherson, D., Oberheide, | |||
J., and F. Jahanian, "Internet Inter-Domain Traffic", Proc | J., and F. Jahanian, "Internet Inter-Domain Traffic", Proc | |||
ACM SIGCOMM; ACM CCR 40(4):75--86, August 2010, | ACM SIGCOMM; ACM CCR 40(4):75--86, August 2010, | |||
<https://doi.org/10.1145/1851275.1851194>. | <https://doi.org/10.1145/1851275.1851194>. | |||
[LLD] White, G., Sundaresan, K., and B. Briscoe, "Low Latency | [LLD] White, G., Sundaresan, K., and B. Briscoe, "Low Latency | |||
DOCSIS: Technology Overview", CableLabs White Paper , | DOCSIS: Technology Overview", CableLabs White Paper , | |||
February 2019, <https://cablela.bs/low-latency-docsis- | February 2019, <https://cablela.bs/low-latency-docsis- | |||
technology-overview-february-2019>. | technology-overview-february-2019>. | |||
[MEDF] Menth, M., Schmid, M., Heiss, H., and T. Reim, "MEDF - a | [MEDF] Menth, M., Schmid, M., Heiss, H., and T. Reim, "MEDF - a | |||
simple scheduling algorithm for two real-time transport | simple scheduling algorithm for two real-time transport | |||
service classes with application in the UTRAN", Proc. IEEE | service classes with application in the UTRAN", Proc. IEEE | |||
Conference on Computer Communications (INFOCOM'03) Vol.2 | Conference on Computer Communications (INFOCOM'03) Vol.2 | |||
pp.1116-1122, March 2003, | pp.1116-1122, March 2003. | |||
<https://infocom2003.ieee-infocom.org/papers/27_04.PDF>. | ||||
[PI2] De Schepper, K., Bondarenko, O., Briscoe, B., and I. | [PI2] De Schepper, K., Bondarenko, O., Briscoe, B., and I. | |||
Tsang, "PI2: A Linearized AQM for both Classic and | Tsang, "PI2: A Linearized AQM for both Classic and | |||
Scalable TCP", ACM CoNEXT'16 , December 2016, | Scalable TCP", ACM CoNEXT'16 , December 2016, | |||
<https://riteproject.files.wordpress.com/2015/10/ | <https://dl.acm.org/doi/10.1145/2999572.2999578>. | |||
pi2_conext.pdf>. | ||||
[PI2param] Briscoe, B., "PI2 Parameters", Technical Report TR-BB- | [PI2param] | |||
Briscoe, B., "PI2 Parameters", Technical Report TR-BB- | ||||
2021-001 arXiv:2107.01003 [cs.NI], July 2021, | 2021-001 arXiv:2107.01003 [cs.NI], July 2021, | |||
<https://arxiv.org/abs/2107.01003>. | <https://arxiv.org/abs/2107.01003>. | |||
[PragueLinux] | [PragueLinux] | |||
Briscoe, B., De Schepper, K., Albisser, O., Misund, J., | Briscoe, B., De Schepper, K., Albisser, O., Misund, J., | |||
Tilmans, O., Kühlewind, M., and A.S. Ahmed, "Implementing | Tilmans, O., Kuehlewind, M., and A. Ahmed, "Implementing | |||
the `TCP Prague' Requirements for Low Latency Low Loss | the `TCP Prague' Requirements for Low Latency Low Loss | |||
Scalable Throughput (L4S)", Proc. Linux Netdev 0x13 , | Scalable Throughput (L4S)", Proc. Linux Netdev 0x13 , | |||
March 2019, <https://www.netdevconf.org/0x13/ | March 2019, <https://www.netdevconf.org/0x13/ | |||
session.html?talk-tcp-prague-l4s>. | session.html?talk-tcp-prague-l4s>. | |||
[RFC0970] Nagle, J., "On Packet Switches With Infinite Storage", | [RFC0970] Nagle, J., "On Packet Switches With Infinite Storage", | |||
RFC 970, DOI 10.17487/RFC0970, December 1985, | RFC 970, DOI 10.17487/RFC0970, December 1985, | |||
<https://www.rfc-editor.org/info/rfc970>. | <https://www.rfc-editor.org/info/rfc970>. | |||
[RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, | [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, | |||
skipping to change at page 33, line 21 ¶ | skipping to change at page 32, line 34 ¶ | |||
Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, | Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, | |||
S., Wroclawski, J., and L. Zhang, "Recommendations on | S., Wroclawski, J., and L. Zhang, "Recommendations on | |||
Queue Management and Congestion Avoidance in the | Queue Management and Congestion Avoidance in the | |||
Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998, | Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998, | |||
<https://www.rfc-editor.org/info/rfc2309>. | <https://www.rfc-editor.org/info/rfc2309>. | |||
[RFC2914] Floyd, S., "Congestion Control Principles", BCP 41, | [RFC2914] Floyd, S., "Congestion Control Principles", BCP 41, | |||
RFC 2914, DOI 10.17487/RFC2914, September 2000, | RFC 2914, DOI 10.17487/RFC2914, September 2000, | |||
<https://www.rfc-editor.org/info/rfc2914>. | <https://www.rfc-editor.org/info/rfc2914>. | |||
[RFC3246] Davie, B., Charny, A., Bennet, J.C.R., Benson, K., Le | [RFC3246] Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec, | |||
Boudec, J.Y., Courtney, W., Davari, S., Firoiu, V., and D. | J., Courtney, W., Davari, S., Firoiu, V., and D. | |||
Stiliadis, "An Expedited Forwarding PHB (Per-Hop | Stiliadis, "An Expedited Forwarding PHB (Per-Hop | |||
Behavior)", RFC 3246, DOI 10.17487/RFC3246, March 2002, | Behavior)", RFC 3246, DOI 10.17487/RFC3246, March 2002, | |||
<https://www.rfc-editor.org/info/rfc3246>. | <https://www.rfc-editor.org/info/rfc3246>. | |||
[RFC3649] Floyd, S., "HighSpeed TCP for Large Congestion Windows", | [RFC3649] Floyd, S., "HighSpeed TCP for Large Congestion Windows", | |||
RFC 3649, DOI 10.17487/RFC3649, December 2003, | RFC 3649, DOI 10.17487/RFC3649, December 2003, | |||
<https://www.rfc-editor.org/info/rfc3649>. | <https://www.rfc-editor.org/info/rfc3649>. | |||
[RFC5033] Floyd, S. and M. Allman, "Specifying New Congestion | [RFC5033] Floyd, S. and M. Allman, "Specifying New Congestion | |||
Control Algorithms", BCP 133, RFC 5033, | Control Algorithms", BCP 133, RFC 5033, | |||
skipping to change at page 35, line 9 ¶ | skipping to change at page 34, line 19 ¶ | |||
[RFC8404] Moriarty, K., Ed. and A. Morton, Ed., "Effects of | [RFC8404] Moriarty, K., Ed. and A. Morton, Ed., "Effects of | |||
Pervasive Encryption on Operators", RFC 8404, | Pervasive Encryption on Operators", RFC 8404, | |||
DOI 10.17487/RFC8404, July 2018, | DOI 10.17487/RFC8404, July 2018, | |||
<https://www.rfc-editor.org/info/rfc8404>. | <https://www.rfc-editor.org/info/rfc8404>. | |||
[SCReAM] Johansson, I., "SCReAM", GitHub repository; , | [SCReAM] Johansson, I., "SCReAM", GitHub repository; , | |||
<https://github.com/EricssonResearch/scream/blob/master/ | <https://github.com/EricssonResearch/scream/blob/master/ | |||
README.md>. | README.md>. | |||
[SigQ-Dyn] Briscoe, B., "Rapid Signalling of Queue Dynamics", | [SigQ-Dyn] | |||
Briscoe, B., "Rapid Signalling of Queue Dynamics", | ||||
Technical Report TR-BB-2017-001 arXiv:1904.07044 [cs.NI], | Technical Report TR-BB-2017-001 arXiv:1904.07044 [cs.NI], | |||
September 2017, <https://arxiv.org/abs/1904.07044>. | September 2017, <https://arxiv.org/abs/1904.07044>. | |||
Appendix A. Example DualQ Coupled PI2 Algorithm | Appendix A. Example DualQ Coupled PI2 Algorithm | |||
As a first concrete example, the pseudocode below gives the DualPI2 | As a first concrete example, the pseudocode below gives the DualPI2 | |||
algorithm. DualPI2 follows the structure of the DualQ Coupled AQM | algorithm. DualPI2 follows the structure of the DualQ Coupled AQM | |||
framework in Figure 1. A simple ramp function (configured in units | framework in Figure 1. A simple ramp function (configured in units | |||
of queuing time) with unsmoothed ECN marking is used for the Native | of queuing time) with unsmoothed ECN marking is used for the Native | |||
L4S AQM. The ramp can also be configured as a step function. The | L4S AQM. The ramp can also be configured as a step function. The | |||
skipping to change at page 36, line 5 ¶ | skipping to change at page 35, line 13 ¶ | |||
[DualPI2Linux]. The specification of the DualQ Coupled AQM for | [DualPI2Linux]. The specification of the DualQ Coupled AQM for | |||
DOCSIS cable modems and CMTSs is available in [DOCSIS3.1] and | DOCSIS cable modems and CMTSs is available in [DOCSIS3.1] and | |||
explained in [LLD]. | explained in [LLD]. | |||
A.1. Pass #1: Core Concepts | A.1. Pass #1: Core Concepts | |||
The pseudocode manipulates three main structures of variables: the | The pseudocode manipulates three main structures of variables: the | |||
packet (pkt), the L4S queue (lq) and the Classic queue (cq). The | packet (pkt), the L4S queue (lq) and the Classic queue (cq). The | |||
pseudocode consists of the following six functions: | pseudocode consists of the following six functions: | |||
* The initialization function dualpi2_params_init(...) (Figure 2) | o The initialization function dualpi2_params_init(...) (Figure 2) | |||
that sets parameter defaults (the API for setting non-default | that sets parameter defaults (the API for setting non-default | |||
values is omitted for brevity) | values is omitted for brevity) | |||
* The enqueue function dualpi2_enqueue(lq, cq, pkt) (Figure 3) | o The enqueue function dualpi2_enqueue(lq, cq, pkt) (Figure 3) | |||
* The dequeue function dualpi2_dequeue(lq, cq, pkt) (Figure 4) | o The dequeue function dualpi2_dequeue(lq, cq, pkt) (Figure 4) | |||
* The recurrence function recur(q, likelihood) for de-randomized ECN | o The recurrence function recur(q, likelihood) for de-randomized ECN | |||
marking (shown at the end of Figure 4). | marking (shown at the end of Figure 4). | |||
* The L4S AQM function laqm(qdelay) (Figure 5) used to calculate the | o The L4S AQM function laqm(qdelay) (Figure 5) used to calculate the | |||
ECN-marking probability for the L4S queue | ECN-marking probability for the L4S queue | |||
* The base AQM function that implements the PI algorithm | o The base AQM function that implements the PI algorithm | |||
dualpi2_update(lq, cq) (Figure 6) used to regularly update the | dualpi2_update(lq, cq) (Figure 6) used to regularly update the | |||
base probability (p'), which is squared for the Classic AQM as | base probability (p'), which is squared for the Classic AQM as | |||
well as being coupled across to the L4S queue. | well as being coupled across to the L4S queue. | |||
It also uses the following functions that are not shown in full here: | It also uses the following functions that are not shown in full here: | |||
* scheduler(), which selects between the head packets of the two | o scheduler(), which selects between the head packets of the two | |||
queues; the choice of scheduler technology is discussed later; | queues; the choice of scheduler technology is discussed later; | |||
* cq.byt() or lq.byt() returns the current length (aka. backlog) of | o cq.byt() or lq.byt() returns the current length (aka. backlog) of | |||
the relevant queue in bytes; | the relevant queue in bytes; | |||
* cq.len() or lq.len() returns the current length of the relevant | o cq.len() or lq.len() returns the current length of the relevant | |||
queue in packets; | queue in packets; | |||
* cq.time() or lq.time() returns the current queuing delay of the | o cq.time() or lq.time() returns the current queuing delay of the | |||
relevant queue in units of time (see Note a); | relevant queue in units of time (see Note a); | |||
* mark(pkt) and drop(pkt) for ECN-marking and dropping a packet; | o mark(pkt) and drop(pkt) for ECN-marking and dropping a packet; | |||
In experiments so far (building on experiments with PIE) on broadband | In experiments so far (building on experiments with PIE) on broadband | |||
access links ranging from 4 Mb/s to 200 Mb/s with base RTTs from 5 ms | access links ranging from 4 Mb/s to 200 Mb/s with base RTTs from 5 ms | |||
to 100 ms, DualPI2 achieves good results with the default parameters | to 100 ms, DualPI2 achieves good results with the default parameters | |||
in Figure 2. The parameters are categorised by whether they relate | in Figure 2. The parameters are categorised by whether they relate | |||
to the Base PI2 AQM, the L4S AQM or the framework coupling them | to the Base PI2 AQM, the L4S AQM or the framework coupling them | |||
together. Constants and variables derived from these parameters are | together. Constants and variables derived from these parameters are | |||
also included at the end of each category. Each parameter is | also included at the end of each category. Each parameter is | |||
explained as it is encountered in the walk-through of the pseudocode | explained as it is encountered in the walk-through of the pseudocode | |||
below, and the rationale for the chosen defaults are given so that | below, and the rationale for the chosen defaults are given so that | |||
skipping to change at page 38, line 16 ¶ | skipping to change at page 37, line 16 ¶ | |||
2: if ( lq.byt() + cq.byt() + MTU > limit) | 2: if ( lq.byt() + cq.byt() + MTU > limit) | |||
3: drop(pkt) % drop packet if buffer is full | 3: drop(pkt) % drop packet if buffer is full | |||
4: timestamp(pkt) % only needed if using the sojourn technique | 4: timestamp(pkt) % only needed if using the sojourn technique | |||
5: % Packet classifier | 5: % Packet classifier | |||
6: if ( ecn(pkt) modulo 2 == 1 ) % ECN bits = ECT(1) or CE | 6: if ( ecn(pkt) modulo 2 == 1 ) % ECN bits = ECT(1) or CE | |||
7: lq.enqueue(pkt) | 7: lq.enqueue(pkt) | |||
8: else % ECN bits = not-ECT or ECT(0) | 8: else % ECN bits = not-ECT or ECT(0) | |||
9: cq.enqueue(pkt) | 9: cq.enqueue(pkt) | |||
10: } | 10: } | |||
Figure 3: Example Enqueue Pseudocode for DualQ Coupled PI2 AQM | Figure 3: Example Enqueue Pseudocode for DualQ Coupled PI2 AQM | |||
1: dualpi2_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues | 1: dualpi2_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues | |||
2: while ( lq.byt() + cq.byt() > 0 ) { | 2: while ( lq.byt() + cq.byt() > 0 ) { | |||
3: if ( scheduler() == lq ) { | 3: if ( scheduler() == lq ) { | |||
4: lq.dequeue(pkt) % Scheduler chooses lq | 4: lq.dequeue(pkt) % Scheduler chooses lq | |||
5: p'_L = laqm(lq.time()) % Native LAQM | 5: p'_L = laqm(lq.time()) % Native LAQM | |||
6: p_L = max(p'_L, p_CL) % Combining function | 6: p_L = max(p'_L, p_CL) % Combining function | |||
7: if ( recur(lq, p_L) ) % Linear marking | 7: if ( recur(lq, p_L) ) % Linear marking | |||
8: mark(pkt) | 8: mark(pkt) | |||
9: } else { | 9: } else { | |||
skipping to change at page 38, line 50 ¶ | skipping to change at page 37, line 50 ¶ | |||
23: recur(q, likelihood) { % Returns TRUE with a certain likelihood | 23: recur(q, likelihood) { % Returns TRUE with a certain likelihood | |||
24: q.count += likelihood | 24: q.count += likelihood | |||
25: if (q.count > 1) { | 25: if (q.count > 1) { | |||
26: q.count -= 1 | 26: q.count -= 1 | |||
27: return TRUE | 27: return TRUE | |||
28: } | 28: } | |||
29: return FALSE | 29: return FALSE | |||
30: } | 30: } | |||
Figure 4: Example Dequeue Pseudocode for DualQ Coupled PI2 AQM | Figure 4: Example Dequeue Pseudocode for DualQ Coupled PI2 AQM | |||
When packets arrive, first a common queue limit is checked as shown | When packets arrive, first a common queue limit is checked as shown | |||
in line 2 of the enqueuing pseudocode in Figure 3. This assumes a | in line 2 of the enqueuing pseudocode in Figure 3. This assumes a | |||
shared buffer for the two queues (Note b discusses the merits of | shared buffer for the two queues (Note b discusses the merits of | |||
separate buffers). In order to avoid any bias against larger | separate buffers). In order to avoid any bias against larger | |||
packets, 1 MTU of space is always allowed, and the limit is | packets, 1 MTU of space is always allowed, and the limit is | |||
deliberately tested before enqueue. | deliberately tested before enqueue. | |||
If limit is not exceeded, the packet is timestamped in line 4 (only | If limit is not exceeded, the packet is timestamped in line 4 (only | |||
if the sojourn time technique is being used to measure queue delay; | if the sojourn time technique is being used to measure queue delay; | |||
skipping to change at page 39, line 43 ¶ | skipping to change at page 38, line 43 ¶ | |||
loop sloppier (for a typical RTT it would double the Classic queue's | loop sloppier (for a typical RTT it would double the Classic queue's | |||
feedback delay). | feedback delay). | |||
All the dequeue code is contained within a large while loop so that | All the dequeue code is contained within a large while loop so that | |||
if it decides to drop a packet, it will continue until it selects a | if it decides to drop a packet, it will continue until it selects a | |||
packet to schedule. Line 3 of the dequeue pseudocode is where the | packet to schedule. Line 3 of the dequeue pseudocode is where the | |||
scheduler chooses between the L4S queue (lq) and the Classic queue | scheduler chooses between the L4S queue (lq) and the Classic queue | |||
(cq). Detailed implementation of the scheduler is not shown (see | (cq). Detailed implementation of the scheduler is not shown (see | |||
discussion later). | discussion later). | |||
* If an L4S packet is scheduled, in lines 7 and 8 the packet is ECN- | o If an L4S packet is scheduled, in lines 7 and 8 the packet is ECN- | |||
marked with likelihood p_L. The recur() function at the end of | marked with likelihood p_L. The recur() function at the end of | |||
Figure 4 is used, which is preferred over random marking because | Figure 4 is used, which is preferred over random marking because | |||
it avoids delay due to randomization when interpreting congestion | it avoids delay due to randomization when interpreting congestion | |||
signals, but it still desynchronizes the saw-teeth of the flows. | signals, but it still desynchronizes the saw-teeth of the flows. | |||
Line 6 calculates p_L as the maximum of the coupled L4S | Line 6 calculates p_L as the maximum of the coupled L4S | |||
probability p_CL and the probability from the native L4S AQM p'_L. | probability p_CL and the probability from the native L4S AQM p'_L. | |||
This implements the max() function shown in Figure 1 to couple the | This implements the max() function shown in Figure 1 to couple the | |||
outputs of the two AQMs together. Of the two probabilities input | outputs of the two AQMs together. Of the two probabilities input | |||
to p_L in line 6: | to p_L in line 6: | |||
- p'_L is calculated per packet in line 5 by the laqm() function | * p'_L is calculated per packet in line 5 by the laqm() function | |||
(see Figure 5), | (see Figure 5), | |||
- Whereas p_CL is maintained by the dualpi2_update() function | * Whereas p_CL is maintained by the dualpi2_update() function | |||
which runs every Tupdate (Tupdate is set in line 12 of | which runs every Tupdate (Tupdate is set in line 12 of | |||
Figure 2). | Figure 2). | |||
* If a Classic packet is scheduled, lines 10 to 17 drop or mark the | o If a Classic packet is scheduled, lines 10 to 17 drop or mark the | |||
packet with probability p_C. | packet with probability p_C. | |||
The Native L4S AQM algorithm (Figure 5) is a ramp function, similar | The Native L4S AQM algorithm (Figure 5) is a ramp function, similar | |||
to the RED algorithm, but simplified as follows: | to the RED algorithm, but simplified as follows: | |||
* The extent of the ramp is defined in units of queuing delay, not | o The extent of the ramp is defined in units of queuing delay, not | |||
bytes, so that configuration remains invariant as the queue | bytes, so that configuration remains invariant as the queue | |||
departure rate varies. | departure rate varies. | |||
* It uses instantaneous queueing delay, which avoids the complexity | o It uses instantaneous queueing delay, which avoids the complexity | |||
of smoothing, but also avoids embedding a worst-case RTT of | of smoothing, but also avoids embedding a worst-case RTT of | |||
smoothing delay in the network (see Section 2.1). | smoothing delay in the network (see Section 2.1). | |||
* The ramp rises linearly directly from 0 to 1, not to an | o The ramp rises linearly directly from 0 to 1, not to an | |||
intermediate value of p'_L as RED would, because there is no need | intermediate value of p'_L as RED would, because there is no need | |||
to keep ECN marking probability low. | to keep ECN marking probability low. | |||
* Marking does not have to be randomized. Determinism is used | o Marking does not have to be randomized. Determinism is used | |||
instead of randomness; to reduce the delay necessary to smooth out | instead of randomness; to reduce the delay necessary to smooth out | |||
the noise of randomness from the signal. | the noise of randomness from the signal. | |||
The ramp function requires two configuration parameters, the minimum | The ramp function requires two configuration parameters, the minimum | |||
threshold (minTh) and the width of the ramp (range), both in units of | threshold (minTh) and the width of the ramp (range), both in units of | |||
queuing time, as shown in lines 17 & 18 of the initialization | queuing time, as shown in lines 17 & 18 of the initialization | |||
function in Figure 2. The ramp function can be configured as a step | function in Figure 2. The ramp function can be configured as a step | |||
(see Note c). | (see Note c). | |||
Although the DCTCP paper [Alizadeh-stability] recommends an ECN | Although the DCTCP paper [Alizadeh-stability] recommends an ECN | |||
skipping to change at page 41, line 24 ¶ | skipping to change at page 40, line 24 ¶ | |||
Figure 5: Example Pseudocode for the Native L4S AQM | Figure 5: Example Pseudocode for the Native L4S AQM | |||
1: dualpi2_update(lq, cq) { % Update p' every Tupdate | 1: dualpi2_update(lq, cq) { % Update p' every Tupdate | |||
2: curq = cq.time() % use queuing time of first-in Classic packet | 2: curq = cq.time() % use queuing time of first-in Classic packet | |||
3: p' = p' + alpha * (curq - target) + beta * (curq - prevq) | 3: p' = p' + alpha * (curq - target) + beta * (curq - prevq) | |||
4: p_CL = k * p' % Coupled L4S prob = base prob * coupling factor | 4: p_CL = k * p' % Coupled L4S prob = base prob * coupling factor | |||
5: p_C = p'^2 % Classic prob = (base prob)^2 | 5: p_C = p'^2 % Classic prob = (base prob)^2 | |||
6: prevq = curq | 6: prevq = curq | |||
7: } | 7: } | |||
Figure 6: Example PI-Update Pseudocode for DualQ Coupled PI2 AQM | ||||
(Clamping p' within the range [0,1] omitted for clarity - see text) | (Clamping p' within the range [0,1] omitted for clarity - see text) | |||
Figure 6: Example PI-Update Pseudocode for DualQ Coupled PI2 AQM | ||||
The coupled marking probability, p_CL depends on the base probability | The coupled marking probability, p_CL depends on the base probability | |||
(p'), which is kept up to date by the core PI algorithm in Figure 6 | (p'), which is kept up to date by the core PI algorithm in Figure 6 | |||
executed every Tupdate. | executed every Tupdate. | |||
Note that p' solely depends on the queuing time in the Classic queue. | Note that p' solely depends on the queuing time in the Classic queue. | |||
In line 2, the current queuing delay (curq) is evaluated from how | In line 2, the current queuing delay (curq) is evaluated from how | |||
long the head packet was in the Classic queue (cq). The function | long the head packet was in the Classic queue (cq). The function | |||
cq.time() (not shown) subtracts the time stamped at enqueue from the | cq.time() (not shown) subtracts the time stamped at enqueue from the | |||
current time (see Note a) and implicitly takes the current queuing | current time (see Note a) and implicitly takes the current queuing | |||
delay as 0 if the queue is empty. | delay as 0 if the queue is empty. | |||
skipping to change at page 42, line 13 ¶ | skipping to change at page 41, line 12 ¶ | |||
[PI2param], the target queuing delay on line 9 of Figure 2 is related | [PI2param], the target queuing delay on line 9 of Figure 2 is related | |||
to the typical base RTT worldwide, RTT_typ, by two factors: target = | to the typical base RTT worldwide, RTT_typ, by two factors: target = | |||
RTT_typ * g * f. Below we summarize the rationale behind these | RTT_typ * g * f. Below we summarize the rationale behind these | |||
factors and introduce a further adjustment. The two factors ensure | factors and introduce a further adjustment. The two factors ensure | |||
that, in a large proportion of cases (say 90%), the sawtooth | that, in a large proportion of cases (say 90%), the sawtooth | |||
variations in RTT of a single flow will fit within the buffer without | variations in RTT of a single flow will fit within the buffer without | |||
underutilizing the link. Frankly, these factors are educated | underutilizing the link. Frankly, these factors are educated | |||
guesses, but with the emphasis closer to 'educated' than to 'guess' | guesses, but with the emphasis closer to 'educated' than to 'guess' | |||
(see [PI2param] for full background): | (see [PI2param] for full background): | |||
* RTT_typ is taken as 25 ms. This is based on an average CDN | o RTT_typ is taken as 25 ms. This is based on an average CDN | |||
latency measured in each country weighted by the number of | latency measured in each country weighted by the number of | |||
Internet users in that country to produce an overall weighted | Internet users in that country to produce an overall weighted | |||
average for the Internet [PI2param]. Countries were ranked by | average for the Internet [PI2param]. Countries were ranked by | |||
number of Internet users, and once 90% of Internet users were | number of Internet users, and once 90% of Internet users were | |||
covered, smaller countries were excluded to avoid | covered, smaller countries were excluded to avoid | |||
unrepresentatively small sample sizes. Also, importantly, the | unrepresentatively small sample sizes. Also, importantly, the | |||
data for the average CDN latency in China (with the largest number | data for the average CDN latency in China (with the largest number | |||
of Internet users) has been removed, because the CDN latency was a | of Internet users) has been removed, because the CDN latency was a | |||
significant outlier and, on reflection, the experimental technique | significant outlier and, on reflection, the experimental technique | |||
seemed inappropriate to the CDN market in China. | seemed inappropriate to the CDN market in China. | |||
* g is taken as 0.38. The factor g is a geometry factor that | o g is taken as 0.38. The factor g is a geometry factor that | |||
characterizes the shape of the sawteeth of prevalent Classic | characterizes the shape of the sawteeth of prevalent Classic | |||
congestion controllers. The geometry factor is the fraction of | congestion controllers. The geometry factor is the fraction of | |||
the amplitude of the sawtooth variability in queue delay that lies | the amplitude of the sawtooth variability in queue delay that lies | |||
below the AQM's target. For instance, at low bit rate, the | below the AQM's target. For instance, at low bit rate, the | |||
geometry factor of standard Reno is 0.5, but at higher rates it | geometry factor of standard Reno is 0.5, but at higher rates it | |||
tends to just under 1. According to the census of congestion | tends to just under 1. According to the census of congestion | |||
controllers conducted by Mishra et al. in Jul-Oct | controllers conducted by Mishra et al. in Jul-Oct | |||
2019 [CCcensus19], most Classic TCP traffic uses Cubic. And, | 2019 [CCcensus19], most Classic TCP traffic uses Cubic. And, | |||
according to the analysis in [PI2param], if running over a PI2 | according to the analysis in [PI2param], if running over a PI2 | |||
AQM, a large proportion of this Cubic traffic would be in its | AQM, a large proportion of this Cubic traffic would be in its | |||
Reno-Friendly mode, which has a geometry factor of ~0.39 (all | Reno-Friendly mode, which has a geometry factor of ~0.39 (all | |||
known implementations). The rest of the Cubic traffic would be in | known implementations). The rest of the Cubic traffic would be in | |||
true Cubic mode, which has a geometry factor of ~0.36. Without | true Cubic mode, which has a geometry factor of ~0.36. Without | |||
modelling the sawtooth profiles from all the other less prevalent | modelling the sawtooth profiles from all the other less prevalent | |||
congestion controllers, we estimate a 7:3 weighted average of | congestion controllers, we estimate a 7:3 weighted average of | |||
these two, resulting in an average geometry factor of 0.38. | these two, resulting in an average geometry factor of 0.38. | |||
* f is taken as 2. The factor f is a safety factor that increases | o f is taken as 2. The factor f is a safety factor that increases | |||
the target queue to allow for the distribution of RTT_typ around | the target queue to allow for the distribution of RTT_typ around | |||
its mean. Otherwise, the target queue would only avoid | its mean. Otherwise, the target queue would only avoid | |||
underutilization for those users below the mean. It also provides | underutilization for those users below the mean. It also provides | |||
a safety margin for the proportion of paths in use that span | a safety margin for the proportion of paths in use that span | |||
beyond the distance between a user and their local CDN. | beyond the distance between a user and their local CDN. | |||
Currently, no data is available on the variance of queue delay | Currently, no data is available on the variance of queue delay | |||
around the mean in each region, so there is plenty of room for | around the mean in each region, so there is plenty of room for | |||
this guess to become more educated. | this guess to become more educated. | |||
* [PI2param] recommends target = RTT_typ * g * f = 25ms * 0.38 * 2 = | o [PI2param] recommends target = RTT_typ * g * f = 25ms * 0.38 * 2 = | |||
19 ms. However, a further adjustment is warranted, because target | 19 ms. However, a further adjustment is warranted, because target | |||
is moving year-on-year. The paper is based on data collected in | is moving year-on-year. The paper is based on data collected in | |||
2019, and it mentions evidence from speedtest.net that suggests | 2019, and it mentions evidence from speedtest.net that suggests | |||
RTT_typ reduced by 17% (fixed) or 12% (mobile) between 2020 and | RTT_typ reduced by 17% (fixed) or 12% (mobile) between 2020 and | |||
2021. Therefore, we recommend a default of target = 15 ms at the | 2021. Therefore, we recommend a default of target = 15 ms at the | |||
time of writing (2021). | time of writing (2021). | |||
Operators can always use the data and discussion in [PI2param] to | Operators can always use the data and discussion in [PI2param] to | |||
configure a more appropriate target for their environment. For | configure a more appropriate target for their environment. For | |||
instance, an operator might wish to question the assumptions called | instance, an operator might wish to question the assumptions called | |||
skipping to change at page 45, line 5 ¶ | skipping to change at page 43, line 48 ¶ | |||
Tupdate dependent on p'. Instead, in PI2, alpha and beta are | Tupdate dependent on p'. Instead, in PI2, alpha and beta are | |||
independent of p' because the squaring applied to Classic traffic | independent of p' because the squaring applied to Classic traffic | |||
tunes them inherently. This is explained in [PI2], which also | tunes them inherently. This is explained in [PI2], which also | |||
explains why this more principled approach removes the need for most | explains why this more principled approach removes the need for most | |||
of the heuristics that had to be added to PIE. | of the heuristics that had to be added to PIE. | |||
Nonetheless, an implementer might wish to add selected details to | Nonetheless, an implementer might wish to add selected details to | |||
either AQM. For instance the Linux reference DualPI2 implementation | either AQM. For instance the Linux reference DualPI2 implementation | |||
includes the following (not shown in the pseudocode above): | includes the following (not shown in the pseudocode above): | |||
* Classic and coupled marking or dropping (i.e. based on p_C and | o Classic and coupled marking or dropping (i.e. based on p_C and | |||
p_CL from the PI controller) is not applied to a packet if the | p_CL from the PI controller) is not applied to a packet if the | |||
aggregate queue length in bytes is < 2 MTU (prior to enqueuing the | aggregate queue length in bytes is < 2 MTU (prior to enqueuing the | |||
packet or dequeuing it, depending on whether the AQM is configured | packet or dequeuing it, depending on whether the AQM is configured | |||
to be applied at enqueue or dequeue); | to be applied at enqueue or dequeue); | |||
* In the WRR scheduler, the 'credit' indicating which queue should | o In the WRR scheduler, the 'credit' indicating which queue should | |||
transmit is only changed if there are packets in both queues | transmit is only changed if there are packets in both queues | |||
(i.e. if there is actual resource contention). This means that a | (i.e. if there is actual resource contention). This means that a | |||
properly paced L flow might never be delayed by the WRR. The WRR | properly paced L flow might never be delayed by the WRR. The WRR | |||
credit is reset in favour of the L queue when the link is idle. | credit is reset in favour of the L queue when the link is idle. | |||
An implementer might also wish to add other heuristics, e.g. burst | An implementer might also wish to add other heuristics, e.g. burst | |||
protection [RFC8033] or enhanced burst protection [RFC8034]. | protection [RFC8033] or enhanced burst protection [RFC8034]. | |||
Notes: | Notes: | |||
skipping to change at page 47, line 23 ¶ | skipping to change at page 46, line 20 ¶ | |||
The pseudocode given here applies where the link rate is unknown, | The pseudocode given here applies where the link rate is unknown, | |||
which is more common for software implementations that might be | which is more common for software implementations that might be | |||
deployed in scenarios where the link is shared with other queues. In | deployed in scenarios where the link is shared with other queues. In | |||
lines 5a to 5d in Figure 7 the native L4S marking probability, p'_L, | lines 5a to 5d in Figure 7 the native L4S marking probability, p'_L, | |||
is zeroed if the queue is only 1 packet (in the default | is zeroed if the queue is only 1 packet (in the default | |||
configuration). | configuration). | |||
Linux implementation note: | Linux implementation note: | |||
* In Linux, the check that the queue exceeds Th_len before marking | o In Linux, the check that the queue exceeds Th_len before marking | |||
with the native L4S AQM is actually at enqueue, not dequeue, | with the native L4S AQM is actually at enqueue, not dequeue, | |||
otherwise it would exempt the last packet of a burst from being | otherwise it would exempt the last packet of a burst from being | |||
marked. The result of the check is conveyed from enqueue to the | marked. The result of the check is conveyed from enqueue to the | |||
dequeue function via a boolean in the packet metadata. | dequeue function via a boolean in the packet metadata. | |||
Persistent overload is deemed to have occurred when Classic drop/ | Persistent overload is deemed to have occurred when Classic drop/ | |||
marking probability reaches p_Cmax. Above this point, the Classic | marking probability reaches p_Cmax. Above this point, the Classic | |||
drop probability is applied to both L and C queues, irrespective of | drop probability is applied to both L and C queues, irrespective of | |||
whether any packet is ECN-capable. ECT packets that are not dropped | whether any packet is ECN-capable. ECT packets that are not dropped | |||
can still be ECN-marked. | can still be ECN-marked. | |||
skipping to change at page 49, line 41 ¶ | skipping to change at page 48, line 41 ¶ | |||
14: continue % continue to the top of the while loop | 14: continue % continue to the top of the while loop | |||
15: } | 15: } | |||
16: mark(pkt) % squared mark | 16: mark(pkt) % squared mark | |||
17: } | 17: } | |||
18: } | 18: } | |||
19: return(pkt) % return the packet and stop | 19: return(pkt) % return the packet and stop | |||
20: } | 20: } | |||
21: return(NULL) % no packet to dequeue | 21: return(NULL) % no packet to dequeue | |||
22: } | 22: } | |||
Figure 7: Example Dequeue Pseudocode for DualQ Coupled PI2 AQM | Figure 7: Example Dequeue Pseudocode for DualQ Coupled PI2 AQM | |||
(Including Code for Edge-Cases) | (Including Code for Edge-Cases) | |||
1: dualpi2_update(lq, cq) { % Update p' every Tupdate | 1: dualpi2_update(lq, cq) { % Update p' every Tupdate | |||
2a: curq = max(cq.time(), lq.time()) % use greatest queuing time | 2a: curq = max(cq.time(), lq.time()) % use greatest queuing time | |||
3: p' = p' + alpha * (curq - target) + beta * (curq - prevq) | 3: p' = p' + alpha * (curq - target) + beta * (curq - prevq) | |||
4: p_CL = p' * k % Coupled L4S prob = base prob * coupling factor | 4: p_CL = p' * k % Coupled L4S prob = base prob * coupling factor | |||
5: p_C = p'^2 % Classic prob = (base prob)^2 | 5: p_C = p'^2 % Classic prob = (base prob)^2 | |||
6: prevq = curq | 6: prevq = curq | |||
7: } | 7: } | |||
Figure 8: Example PI-Update Pseudocode for DualQ Coupled PI2 AQM | ||||
Figure 8: Example PI-Update Pseudocode for DualQ Coupled PI2 AQM | ||||
(Including Overload Code) | (Including Overload Code) | |||
The choice of scheduler technology is critical to overload protection | The choice of scheduler technology is critical to overload protection | |||
(see Section 4.2.2). | (see Section 4.2.2). | |||
* A well-understood weighted scheduler such as weighted round-robin | o A well-understood weighted scheduler such as weighted round-robin | |||
(WRR) is recommended. As long as the scheduler weight for Classic | (WRR) is recommended. As long as the scheduler weight for Classic | |||
is small (e.g. 1/16), its exact value is unimportant because it | is small (e.g. 1/16), its exact value is unimportant because it | |||
does not normally determine capacity shares. The weight is only | does not normally determine capacity shares. The weight is only | |||
important to prevent unresponsive L4S traffic starving Classic | important to prevent unresponsive L4S traffic starving Classic | |||
traffic in the short term (see Section 4.2.2). This is because | traffic in the short term (see Section 4.2.2). This is because | |||
capacity sharing between the queues is normally determined by the | capacity sharing between the queues is normally determined by the | |||
coupled congestion signal, which overrides the scheduler, by | coupled congestion signal, which overrides the scheduler, by | |||
making L4S sources leave roughly equal per-flow capacity available | making L4S sources leave roughly equal per-flow capacity available | |||
for Classic flows. | for Classic flows. | |||
* Alternatively, a time-shifted FIFO (TS-FIFO) could be used. It | o Alternatively, a time-shifted FIFO (TS-FIFO) could be used. It | |||
works by selecting the head packet that has waited the longest, | works by selecting the head packet that has waited the longest, | |||
biased against the Classic traffic by a time-shift of tshift. To | biased against the Classic traffic by a time-shift of tshift. To | |||
implement time-shifted FIFO, the scheduler() function in line 3 of | implement time-shifted FIFO, the scheduler() function in line 3 of | |||
the dequeue code would simply be implemented as the scheduler() | the dequeue code would simply be implemented as the scheduler() | |||
function at the bottom of Figure 10 in Appendix B. For the public | function at the bottom of Figure 10 in Appendix B. For the public | |||
Internet a good value for tshift is 50ms. For private networks | Internet a good value for tshift is 50ms. For private networks | |||
with smaller diameter, about 4*target would be reasonable. TS- | with smaller diameter, about 4*target would be reasonable. TS- | |||
FIFO is a very simple scheduler, but complexity might need to be | FIFO is a very simple scheduler, but complexity might need to be | |||
added to address some deficiencies (which is why it is not | added to address some deficiencies (which is why it is not | |||
recommended over WRR): | recommended over WRR): | |||
- TS-FIFO does not fully isolate latency in the L4S queue from | * TS-FIFO does not fully isolate latency in the L4S queue from | |||
uncontrolled bursts in the Classic queue; | uncontrolled bursts in the Classic queue; | |||
- Using sojourn time for TS-FIFO is only appropriate if time- | * Using sojourn time for TS-FIFO is only appropriate if time- | |||
stamping of packets is feasible; | stamping of packets is feasible; | |||
- Even if time-stamping is supported, the sojourn time of the | * Even if time-stamping is supported, the sojourn time of the | |||
head packet is always stale, so a more instantaneous measure of | head packet is always stale, so a more instantaneous measure of | |||
queue delay could be used (see Note a in Appendix A.1). | queue delay could be used (see Note a in Appendix A.1). | |||
* A strict priority scheduler would be inappropriate as discussed in | o A strict priority scheduler would be inappropriate as discussed in | |||
Section 4.2.2. | Section 4.2.2. | |||
Appendix B. Example DualQ Coupled Curvy RED Algorithm | Appendix B. Example DualQ Coupled Curvy RED Algorithm | |||
As another example of a DualQ Coupled AQM algorithm, the pseudocode | As another example of a DualQ Coupled AQM algorithm, the pseudocode | |||
below gives the Curvy RED based algorithm. Although the AQM was | below gives the Curvy RED based algorithm. Although the AQM was | |||
designed to be efficient in integer arithmetic, to aid understanding | designed to be efficient in integer arithmetic, to aid understanding | |||
it is first given using floating point arithmetic (Figure 10). Then, | it is first given using floating point arithmetic (Figure 10). Then, | |||
one possible optimization for integer arithmetic is given, also in | one possible optimization for integer arithmetic is given, also in | |||
pseudocode (Figure 11). To aid comparison, the line numbers are kept | pseudocode (Figure 11). To aid comparison, the line numbers are kept | |||
in step between the two by using letter suffixes where the longer | in step between the two by using letter suffixes where the longer | |||
code needs extra lines. | code needs extra lines. | |||
B.1. Curvy RED in Pseudocode | B.1. Curvy RED in Pseudocode | |||
The pseudocode manipulates three main structures of variables: the | The pseudocode manipulates three main structures of variables: the | |||
packet (pkt), the L4S queue (lq) and the Classic queue (cq) and | packet (pkt), the L4S queue (lq) and the Classic queue (cq) and | |||
consists of the following five functions: | consists of the following five functions: | |||
* The initialization function cred_params_init(...) (Figure 2) that | o The initialization function cred_params_init(...) (Figure 2) that | |||
sets parameter defaults (the API for setting non-default values is | sets parameter defaults (the API for setting non-default values is | |||
omitted for brevity); | omitted for brevity); | |||
* The dequeue function cred_dequeue(lq, cq, pkt) (Figure 4); | o The dequeue function cred_dequeue(lq, cq, pkt) (Figure 4); | |||
* The scheduling function scheduler(), which selects between the | o The scheduling function scheduler(), which selects between the | |||
head packets of the two queues. | head packets of the two queues. | |||
It also uses the following functions that are either shown elsewhere, | It also uses the following functions that are either shown elsewhere, | |||
or not shown in full here: | or not shown in full here: | |||
* The enqueue function, which is identical to that used for DualPI2, | o The enqueue function, which is identical to that used for DualPI2, | |||
dualpi2_enqueue(lq, cq, pkt) in Figure 3; | dualpi2_enqueue(lq, cq, pkt) in Figure 3; | |||
* mark(pkt) and drop(pkt) for ECN-marking and dropping a packet; | o mark(pkt) and drop(pkt) for ECN-marking and dropping a packet; | |||
* cq.byt() or lq.byt() returns the current length (aka. backlog) of | o cq.byt() or lq.byt() returns the current length (aka. backlog) of | |||
the relevant queue in bytes; | the relevant queue in bytes; | |||
* cq.time() or lq.time() returns the current queuing delay of the | o cq.time() or lq.time() returns the current queuing delay of the | |||
relevant queue in units of time (see Note a in Appendix A.1). | relevant queue in units of time (see Note a in Appendix A.1). | |||
Because Curvy RED was evaluated before DualPI2, certain improvements | Because Curvy RED was evaluated before DualPI2, certain improvements | |||
introduced for DualPI2 were not evaluated for Curvy RED. In the | introduced for DualPI2 were not evaluated for Curvy RED. In the | |||
pseudocode below, the straightforward improvements have been added on | pseudocode below, the straightforward improvements have been added on | |||
the assumption they will provide similar benefits, but that has not | the assumption they will provide similar benefits, but that has not | |||
been proven experimentally. They are: i) a conditional priority | been proven experimentally. They are: i) a conditional priority | |||
scheduler instead of strict priority ii) a time-based threshold for | scheduler instead of strict priority ii) a time-based threshold for | |||
the native L4S AQM; iii) ECN support for the Classic AQM. A recent | the native L4S AQM; iii) ECN support for the Classic AQM. A recent | |||
evaluation has proved that a minimum ECN-marking threshold (minTh) | evaluation has proved that a minimum ECN-marking threshold (minTh) | |||
skipping to change at page 58, line 45 ¶ | skipping to change at page 57, line 37 ¶ | |||
13: continue % continue to the top of the while loop | 13: continue % continue to the top of the while loop | |||
14: } | 14: } | |||
15: mark(pkt) | 15: mark(pkt) | |||
16: } | 16: } | |||
17: } | 17: } | |||
18: return(pkt) % return the packet and stop here | 18: return(pkt) % return the packet and stop here | |||
19: } | 19: } | |||
20: return(NULL) % no packet to dequeue | 20: return(NULL) % no packet to dequeue | |||
21: } | 21: } | |||
Figure 11: Optimised Example Dequeue Pseudocode for DualQ Coupled | Figure 11: Optimised Example Dequeue Pseudocode for DualQ Coupled AQM | |||
AQM using Integer Arithmetic | using Integer Arithmetic | |||
The two ranges, range_L and range_C are expressed as powers of 2 so | The two ranges, range_L and range_C are expressed as powers of 2 so | |||
that division can be implemented as a right bit-shift (>>) in lines 5 | that division can be implemented as a right bit-shift (>>) in lines 5 | |||
and 10 of the integer variant of the pseudocode (Figure 11). | and 10 of the integer variant of the pseudocode (Figure 11). | |||
For the integer variant of the pseudocode, an integer version of the | For the integer variant of the pseudocode, an integer version of the | |||
rand() function used at line 25 of the maxrand(function) in Figure 10 | rand() function used at line 25 of the maxrand(function) in Figure 10 | |||
would be arranged to return an integer in the range 0 <= maxrand() < | would be arranged to return an integer in the range 0 <= maxrand() < | |||
2^32 (not shown). This would scale up all the floating point | 2^32 (not shown). This would scale up all the floating point | |||
probabilities in the range [0,1] by 2^32. | probabilities in the range [0,1] by 2^32. | |||
skipping to change at page 60, line 38 ¶ | skipping to change at page 59, line 34 ¶ | |||
operator needs to decide at what base RTT it wants L4S and Classic | operator needs to decide at what base RTT it wants L4S and Classic | |||
flows to have roughly equal throughput, once the effect of the | flows to have roughly equal throughput, once the effect of the | |||
additional Classic queue on Classic throughput has been taken into | additional Classic queue on Classic throughput has been taken into | |||
account. With this approach, a network operator can determine a good | account. With this approach, a network operator can determine a good | |||
coupling factor without knowing the precise L4S algorithm for | coupling factor without knowing the precise L4S algorithm for | |||
reducing RTT-dependence - or even in the absence of any algorithm. | reducing RTT-dependence - or even in the absence of any algorithm. | |||
The following additional terminology will be used, with appropriate | The following additional terminology will be used, with appropriate | |||
subscripts: | subscripts: | |||
r: Packet rate [pkt/s] | r: Packet rate [pkt/s] | |||
R: RTT [s/round] | R: RTT [s/round] | |||
p: ECN marking probability [] | p: ECN marking probability [] | |||
On the Classic side, we consider Reno as the most sensitive and | On the Classic side, we consider Reno as the most sensitive and | |||
therefore worst-case Classic congestion control. We will also | therefore worst-case Classic congestion control. We will also | |||
consider Cubic in its Reno-friendly mode ('CReno'), as the most | consider Cubic in its Reno-friendly mode ('CReno'), as the most | |||
prevalent congestion control, according to the references and | prevalent congestion control, according to the references and | |||
analysis in [PI2param]. In either case, the Classic packet rate in | analysis in [PI2param]. In either case, the Classic packet rate in | |||
steady state is given by the well-known square root formula for Reno | steady state is given by the well-known square root formula for Reno | |||
congestion control: | congestion control: | |||
r_C = 1.22 / (R_C * p_C^0.5) (5) | r_C = 1.22 / (R_C * p_C^0.5) (5) | |||
skipping to change at page 65, line 7 ¶ | skipping to change at page 64, line 7 ¶ | |||
Henderson <tomh@tomh.org> updated that earlier model and created a | Henderson <tomh@tomh.org> updated that earlier model and created a | |||
model for the DualQ variant specified as part of the Low Latency | model for the DualQ variant specified as part of the Low Latency | |||
DOCSIS specification, as well as conducting extensive evaluations. | DOCSIS specification, as well as conducting extensive evaluations. | |||
Ing Jyh (Inton) Tsang of Nokia, Belgium built the End-to-End Data | Ing Jyh (Inton) Tsang of Nokia, Belgium built the End-to-End Data | |||
Centre to the Home broadband testbed on which DualQ Coupled AQM | Centre to the Home broadband testbed on which DualQ Coupled AQM | |||
implementations were tested. | implementations were tested. | |||
Authors' Addresses | Authors' Addresses | |||
Koen De Schepper | Koen De Schepper | |||
Nokia Bell Labs | Nokia Bell Labs | |||
Antwerp | Antwerp | |||
Belgium | Belgium | |||
Email: koen.de_schepper@nokia.com | ||||
URI: https://www.bell-labs.com/about/researcher-profiles/ | Email: koen.de_schepper@nokia.com | |||
koende_schepper/ | URI: https://www.bell-labs.com/about/researcher-profiles/koende_schepper/ | |||
Bob Briscoe (editor) | Bob Briscoe (editor) | |||
Independent | Independent | |||
United Kingdom | UK | |||
Email: ietf@bobbriscoe.net | Email: ietf@bobbriscoe.net | |||
URI: https://bobbriscoe.net/ | URI: https://bobbriscoe.net/ | |||
Greg White | Greg White | |||
CableLabs | CableLabs | |||
Louisville, CO, | Louisville, CO | |||
United States of America | US | |||
Email: G.White@CableLabs.com | Email: G.White@CableLabs.com | |||
End of changes. 125 change blocks. | ||||
201 lines changed or deleted | 192 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. |