[ih] bufferbloat and modern congestion control (was 4004)

Jack Haverty jack at 3kitty.org
Wed Oct 2 17:08:41 PDT 2024


Re: Source Quench...

It's been 40+ years, but I remember meetings where Source Quench was 
first discussed.  My reaction was that it was too simplistic and 
wouldn't be effective.   At the time, I was the programmer responsible 
for the Unix TCP I had written for the PDP-11/40.  When I asked what a 
TCP should do when it received an SQ, no one could provide much of an 
answer.  If the initial datagram you sent out to open a TCP connection 
resulted in an incoming SQ, exactly how would you "slow down" that 
connection flow??

Other implementors had different ideas about how to handle an incoming 
SQ.  One (Dave Mills IIRC) opined that receiving an SQ meant that a 
gateway somewhere in the path had discarded the datagram you had sent.   
So the obvious response by the TCP should be to simply retransmit the 
datagram without waiting for any "retransmission timer" to fire.   You 
knew it had been discarded, so you should retransmit it immediately.

In my TCP, I think I just incremented a counter when I received an SQ.  
Could always change it later....
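
Just to make the ambiguity concrete, here's a sketch in modern C, with 
invented names (my actual PDP-11 code is long gone), of the two 
interpretations of an incoming SQ:

#include <stdint.h>

/* minimal per-connection state for the sketch */
struct tcb {
    uint32_t sq_received;            /* Source Quenches seen */
};

/* stub standing in for the real retransmission machinery */
static void retransmit_oldest_unacked(struct tcb *tcb) { (void)tcb; }

/* called when ICMP hands us a Source Quench for this connection */
static void on_source_quench(struct tcb *tcb, int mills_style)
{
    if (mills_style) {
        /* Mills' reading: an SQ means a gateway dropped your
         * datagram, so retransmit at once rather than wait for
         * the retransmission timer to fire. */
        retransmit_oldest_unacked(tcb);
    } else {
        /* my approach: just count it, and decide later what
         * "slow down" should actually mean */
        tcb->sq_received++;
    }
}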

At the time, there had been a decade's worth of experience in running 
the Arpanet, and "congestion control" was a well-known, if not 
well-understood, issue.  There are a bunch of old reports available in 
DTIC that captured a lot of the analysis and experimentation that was 
done on the Arpanet to change its inner workings as issues were identified 
during operations - see, for example, DTIC reports accessible as 
ADA086338 and ADA086340.  There are many others describing the Arpanet 
experience.  In particular ADA121350 contains discussions of topics such 
as "Congestion Control" and "Issues in Internet Gateway Design".

There were internal mechanisms within the Arpanet that enabled it to 
provide a "virtual circuit" service to host computers attached to IMPs.  
Although individual packets were routed and handled separately, they 
were "reassembled" at the destination IMP before delivering them to the 
attached computer.   The Arpanet was widely characterized as a "packet 
network", but it had elaborate internal mechanisms to deliver a virtual 
circuit service to the computers it served.

Essentially, packets in the Arpanet didn't start travelling toward their 
destination until the destination confirmed that there was buffer space 
reserved for them.  Internal messages were exchanged to manage buffer 
allocations - e.g., the "ALLO" message (ALLOcate) was used to reserve 
space at a destination IMP.  Packets would then traverse each circuit 
between pairs of IMPs, with error-checking and retransmission as needed 
to keep them intact in their travels.  "RFNM" messages were used to 
indicate, to the sending host computer, that it was OK to send more data.

The ultimate flow control was available to the IMP as a hardware 
capability, which could simply stop the clock that controlled the flow 
of data between Hosts and IMPs.  That would effectively block all 
communications from the blocked host to anywhere else on the Arpanet.  
By "counting RFNMs", a host could avoid such drastic flow control by not 
sending any data that would violate the RFNM counter.  Any TCP or 
gateway implementation attached to the Arpanet was subject to such 
control, and had to implement RFNM counting to avoid it.   I have 
wondered how many implementations actually did.
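
For illustration, here's roughly what "RFNM counting" amounts to, as a 
sketch with invented names and an assumed window size (the real limit 
on outstanding messages was set by the IMP, not the host):

#include <stdbool.h>
#include <stdint.h>

#define RFNM_WINDOW 8          /* illustrative limit, per destination */

struct dest_state {
    uint8_t outstanding;       /* messages sent, RFNM not yet received */
};

/* Check before handing a message to the IMP.  Sending past the
 * limit risked the IMP stopping the host's clock, blocking ALL
 * traffic from this host, not just traffic to this destination. */
static bool may_send(const struct dest_state *d)
{
    return d->outstanding < RFNM_WINDOW;
}

static void on_message_sent(struct dest_state *d)
{
    d->outstanding++;
}

/* An incoming RFNM says: OK to send more to that destination. */
static void on_rfnm(struct dest_state *d)
{
    if (d->outstanding > 0)
        d->outstanding--;
}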

All of these mechanisms were well-documented in the technical reports, 
often in excruciating (and likely boring) detail.    The ancient IMP 
code itself is even available online today.   As always, the ultimate 
documentation is the code itself.   But it's written in assembly 
language, and used every programming trick imaginable to make it fast, 
efficient, and functional in the minicomputer technology of the 1960s.  
It's not easy to figure out how it worked.

The IMPs had the hardware necessary to measure time, so routing was 
based on finding lowest delay routes.  In the earliest gateways, 
"getting a timestamp" from the processor wasn't hard.  It was 
impossible.  The gateway hardware simply didn't have any way to measure 
time.

IMPs had clocks, and were interconnected by circuits, so the IMPs could 
"loop back" any circuit and measure the time to send data and get it 
back.  They could calculate the delay along a route.

Gateways were interconnected by networks, which were much less stable 
and much more variable than a terrestrial or satellite circuit.   So Gateway 
routing was based on "hops" rather than time - as an interim mechanism 
until a time-based approach was available.   That would then enable 
handling datagrams which needed "low latency" TOS by sending them on a 
low-delay route.
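
For illustration, the two schemes differ only in the link cost fed to 
the shortest-path computation.  A toy sketch (invented code, not the 
actual IMP or gateway sources):

#include <limits.h>

#define NNODES 8   /* toy topology size */

/* delay[i][j]: measured delay of the circuit from i to j, in
 * msec, 0 if no circuit exists */
static int link_cost(const int delay[NNODES][NNODES], int i, int j,
                     int have_clocks)
{
    if (delay[i][j] == 0)
        return INT_MAX;                 /* no circuit */
    /* IMPs could loop a circuit back and measure it; the early
     * gateways had no clock, so every link cost one "hop" */
    return have_clocks ? delay[i][j] : 1;
}

/* Bellman-Ford relaxation over whichever metric is available */
static void shortest_paths(const int delay[NNODES][NNODES], int src,
                           int have_clocks, int dist[NNODES])
{
    for (int v = 0; v < NNODES; v++)
        dist[v] = (v == src) ? 0 : INT_MAX;

    for (int pass = 0; pass < NNODES - 1; pass++)
        for (int i = 0; i < NNODES; i++)
            for (int j = 0; j < NNODES; j++) {
                int c = link_cost(delay, i, j, have_clocks);
                if (dist[i] != INT_MAX && c != INT_MAX &&
                    dist[i] + c < dist[j])
                    dist[j] = dist[i] + c;
            }
}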

Based on what I knew about the Arpanet, gleaned by osmosis from the 
activity at the Arpanet NOC down the hall and the Arpanet Group around 
the corner, I didn't think the Source Quench mechanism would work in the 
Internet.  But it also made a good place-holder, to be replaced someday 
when the research community figured out what mechanism would actually 
work for congestion control.

Much of what I knew about the internal structure of the Arpanet was 
available, but I think it's likely that few of the Internet researchers 
ever even saw the Arpanet reports.  The reports were sent to DoD and 
ARPA, but AFAIK never released as IENs or RFCs, or otherwise distributed 
within the "research community".

In addition, there was a prevailing policy from ARPA to avoid using old 
ideas and prefer trying new concepts.    I recall being told by someone 
at ARPA that they needed to promote trying new ideas rather than 
replicating old ones.  If you don't have enough failures, you're not 
following the "Advanced" part of the ARPA name.

Hope this helps explain how we got from there to here...
Jack Haverty





On 10/2/24 15:21, Dave Taht via Internet-history wrote:
> I wish I had had the time and resources to (help) write more papers. (For
> example there isn't much on "drop head queueing")
>
> fq_codel is now a linux-wide default and has the following unique
> properties:
>
> codel queue management, which measures the time a packet spends in a queue
> and gradually attempts to find an optimum point for queueing delay, which is
> 5ms by default. (it has been tested in software below 250us in the DC).
> There is another subsystem, called BQL, which attempts to limit bytes on
> the device txring to one interrupt's worth. (a pretty good explanation of
> modern layers here) [2]
>
> It drops from the head, not the tail of the queue, with a small (BQL or
> HTB) FIFO in front of the lowest bits of the hardware to account
> for interrupt latency.
>
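> (For the curious, a much-simplified sketch of the codel dequeue test,
> with invented names and without the interval/sqrt(count) control law;
> RFC 8289 and the linux sources have the real thing:)
>
> #include <stdint.h>
> #include <stddef.h>
>
> struct pkt   { uint64_t enqueue_us; struct pkt *next; };
> struct queue { struct pkt *head; uint64_t first_over_us; };
>
> static struct pkt *pop_head(struct queue *q)
> {
>     struct pkt *p = q->head;
>     if (p) q->head = p->next;
>     return p;
> }
> static void drop(struct pkt *p) { (void)p; /* free it */ }
>
> #define TARGET_US   5000u    /* 5 ms sojourn-time target       */
> #define INTERVAL_US 100000u  /* must persist this long to drop */
>
> static struct pkt *codel_dequeue(struct queue *q, uint64_t now_us)
> {
>     struct pkt *p = pop_head(q);
>     /* sojourn time: how long the oldest packet sat queued */
>     while (p && now_us - p->enqueue_us > TARGET_US) {
>         if (q->first_over_us == 0)
>             q->first_over_us = now_us;     /* start the clock */
>         if (now_us - q->first_over_us < INTERVAL_US)
>             break;       /* over target, but not for long enough */
>         drop(p);         /* head drop: the oldest packet goes    */
>         p = pop_head(q); /* the newest data keeps flowing        */
>     }
>     if (p && now_us - p->enqueue_us <= TARGET_US)
>         q->first_over_us = 0;              /* back under target */
>     return p;
> }
>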
> (I am kind of curious if a txring existed back in the day and how close an
> application sat to the hardware)
>
> Anecdote: when van and kathy were working on what became codel (january
> 2012), she rang me up one day and asked me just how much overhead there was
> in getting a timestamp from the hardware nowadays. And I explained that it
> was only a few cycles and a pipeline bubble, and the cost of unsynced TSCs
> and so on and so forth, and she said thanks, and hung up. Getting a
> timestamp must have been mighty hard back in the day!
>
> The "flow queueing" mechanism sends packets that have an arrival rate of
> less than the departure rate of all the other flows, out first.[1] This is
> an improvement over prior FQ mechanisms like SFQ and DRR, which always put
> a new flow at the tail of the flow list. It is pretty amazing how often
> this works on real traffic. Also it automatically puts flows that build a
> queue into a queue that is managed by codel.
>
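> (Again a simplified sketch with invented names; the real code is
> net/sched/sch_fq_codel.c in linux:)
>
> #include <stddef.h>
>
> struct flow {
>     int deficit;                 /* DRR credit, in bytes        */
>     struct flow *next;
> };
>
> struct fq {
>     struct flow *new_flows;      /* served first                */
>     struct flow *old_flows;      /* flows with a standing queue */
>     int quantum;                 /* one MTU's worth of credit   */
> };
>
> static void append(struct flow **list, struct flow *f)
> {
>     f->next = NULL;
>     while (*list)
>         list = &(*list)->next;
>     *list = f;
> }
>
> /* On enqueue, a packet's 5-tuple hashes to a flow; a flow that
>  * was idle goes onto new_flows.  Dequeue picks who goes next: */
> static struct flow *next_flow(struct fq *fq)
> {
>     for (;;) {
>         struct flow **src = fq->new_flows ? &fq->new_flows
>                                           : &fq->old_flows;
>         struct flow *f = *src;
>         if (!f)
>             return NULL;         /* nothing queued anywhere     */
>         if (f->deficit <= 0) {
>             /* this flow kept a backlog, so it is bulk traffic:
>              * top up its credit and demote it to old_flows,
>              * where codel manages its standing queue */
>             *src = f->next;
>             f->deficit += fq->quantum;
>             append(&fq->old_flows, f);
>             continue;
>         }
>         return f;                /* sparse flows go out first   */
>     }
> }
>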
> One (eventual) benefit of these approaches, combined, is that they make
> delay-based congestion control more feasible (indeed, BBR spends most of its
> time in this mode), and the flow isolation means most interactive traffic is
> never queued at all.
>
> IMHO the edges of the internet, at least, would have been much better off
> had some form of FQ always been in them (which we kind of got from switched
> networks naturally), but the idea of FQ was roundly rejected in the first
> ietf meeting in 1989, and it's been uphill ever since.
>
> Just to touch upon pacing a bit - pacing is the default for the linux stack
> no matter the overlying qdisc or congestion control algorithm.
> I don't know if anyone has ever attempted to compare pacing w/cubic vs
> pacing w/bbr, and very few, until recently, have
> attempted to also compare the cc-of-the-day vs fq_codel or cake. [3]
>
> [1]https://ieeexplore.ieee.org/document/8469111
> [2]https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9541151
> [3]
> https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0304609&type=printable
>
> Varying the packet pacing to get a pre-congestion notification is an idea
> I'd like more people to pursue.
> https://www.usenix.org/system/files/atc24-han.pdf
> (I so want to believe this paper)
>
> A tiny bit more below....
>
> On Wed, Oct 2, 2024 at 2:31 PM John Day via Internet-history <
> internet-history at elists.isoc.org> wrote:
>
>> The response to bufferbloat has always struck me as looking for your keys
>> under a street light when that wasn't where you dropped them, but there is
>> light there.
>>
>> Initially, bufferbloat was not a problem because memory was expensive and
>> when TCP ran out of buffers (or got low), the connection simply blocked the
>> sending application until buffers were available. This was still true with
>> the advent of NIC cards. Memory was still tight. However, as memory got
>> cheap and NIC cards had oceans of memory, TCP never got low on buffers and
>> no one told the application to slow down or wait, so there was local
>> congestion collapse:  bufferbloat.
>>
>> One part of the solution would be interface flow control between the
>> sending application and TCP (you would have thought that would have
>> occurred to implementers anyway, it is obvious) and/or simply restricting the
>> amount of buffers TCP has available so that it runs out and blocks the
>> sending application before things get bad, and opens up when buffers are
>> available.  But virtually all of the papers I see are on different
>> drop-strategies, and oddly enough they never find their keys.
>>
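>> (A sketch of that buffer-limiting idea, with invented names: give TCP
>> a small fixed send buffer and block the sender, so congestion pushes
>> back on the application instead of pooling in cheap memory:)
>>
>> #include <stddef.h>
>>
>> struct sock { size_t sndbuf_max, sndbuf_used; };
>>
>> /* stubs for the sketch */
>> static void wait_for_acks(struct sock *s) { (void)s; /* sleep */ }
>> static void queue_for_tx(struct sock *s, const char *p, size_t n)
>> { (void)s; (void)p; (void)n; }
>>
>> static size_t sock_send(struct sock *s, const char *p, size_t n)
>> {
>>     size_t sent = 0;
>>     while (sent < n) {
>>         while (s->sndbuf_used >= s->sndbuf_max)
>>             wait_for_acks(s);   /* backpressure: block the app */
>>         size_t room = s->sndbuf_max - s->sndbuf_used;
>>         size_t chunk = (n - sent < room) ? n - sent : room;
>>         queue_for_tx(s, p + sent, chunk);
>>         s->sndbuf_used += chunk;
>>         sent += chunk;
>>     }
>>     return sent;
>> }
>>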
> I don't have a lot of time for papers!  The most modern stuff for tcp is
> using EDF (earliest deadline first) to manage the packet pacing.
> There are virtual and actual physical devices nowadays that take a "time to
> be sent" along with the packet. This paper was highly influential:
>
> https://saeed.github.io/files/carousel-sigcomm17.pdf
>
> the latest commit to the linux kernel about it:
>
> https://lore.kernel.org/netdev/20240930152304.472767-2-edumazet@google.com/T/
>
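> (A toy sketch of the EDF idea, with invented names: packets carry a
> "time to be sent" and a releaser pops whatever is due.  A sorted list
> here, where carousel uses a timing wheel to make it cheap:)
>
> #include <stdint.h>
> #include <stddef.h>
>
> struct tpkt { uint64_t txtime_ns; struct tpkt *next; };
>
> /* sorted insert keyed on the deadline, i.e. the departure time */
> static void edf_enqueue(struct tpkt **q, struct tpkt *p)
> {
>     while (*q && (*q)->txtime_ns <= p->txtime_ns)
>         q = &(*q)->next;
>     p->next = *q;
>     *q = p;
> }
>
> /* called from a timer or the NIC: release what has come due */
> static struct tpkt *edf_dequeue(struct tpkt **q, uint64_t now_ns)
> {
>     struct tpkt *p = *q;
>     if (!p || p->txtime_ns > now_ns)
>         return NULL;             /* nothing due yet */
>     *q = p->next;
>     return p;
> }
>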
> PS IMHO eric dumazet deserves a spot in the internet hall of fame for so
> many things...
>
>
>> Take care,
>> John
>>
>>> On Oct 2, 2024, at 01:48, Barbara Denny via Internet-history <
>> internet-history at elists.isoc.org> wrote:
>>> Just throwing some thoughts out here ......
>>> I can see how this happens in a FIFO queuing world.   However a lot of
>> work has gone into fair queuing starting in the late 80s.  Just wondering
>> if anyone has done work utilizing fair queuing and source quench.  For
>> example, I think I can see how to use fair queuing information to better
>> select who to send a source quench to. At least I can see how to do it with
>> Stochastic Fairness Queueing since I worked on it and I  remember a fair
>> amount about how it was implemented. I wouldn't be able to provide a
>> guarantee that the wrong host would never receive a source quench but the
>> likelihood should be much lower.  Considering whether the use of NAT
>> creates undesirable behavior is also important and I am sure there are
>> probably other cases that need to be checked.
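>>
>> (A sketch of that idea, with invented names: aim the source quench,
>> using the per-flow state SFQ already keeps, at the source of the
>> flow with the largest backlog, rather than at whoever sent the
>> packet that happened to overflow the queue:)
>>
>> #include <stdint.h>
>>
>> struct sfq_flow { int backlog; uint32_t src_addr; };
>>
>> static uint32_t pick_quench_target(const struct sfq_flow *f, int n)
>> {
>>     int worst = 0;
>>     for (int i = 1; i < n; i++)
>>         if (f[i].backlog > f[worst].backlog)
>>             worst = i;
>>     /* still a heuristic, but wrong far less often than "blame
>>      * the last arrival" */
>>     return f[worst].src_addr;
>> }
>>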
>>> Hum,  it might also be interesting to speculate whether this could have
>> any effect on bufferbloat but I fess up I need to learn more about the work
>> done in the area of bufferbloat.  I was involved with other things when
>> this started to appear on my radar screen as a hot topic.  I will admit I
>> wish I had done more work on possible buffering effects from implementation
>> choices at the time I did work on SFQ but there were contractual
>> obligations that restricted how much time I could devote to the SFQ part of
>> the project.
>>> Just curious, ECN (Explicit Congestion Notification) is optional.  Does
>> anyone have any idea about its use in the Internet?
>>> barbara
>>>
>>>     On Tuesday, October 1, 2024 at 07:10:25 PM PDT, Vint Cerf <
>> vint at google.com> wrote:
>>> One basic problem with blaming the "last packet that caused intermediate
>> router congestion" is that it usually blamed the wrong source, among other
>> problems. Van Jacobson was/is the guru of flow control (among others) who
>> might remember more.
>>> v
>>>
>>> On Tue, Oct 1, 2024 at 8:50 PM Barbara Denny via Internet-history <
>> internet-history at elists.isoc.org> wrote:
>>>   In a brief attempt to try to find some information about the early MIT
>> work you mentioned, I ended up tripping on this Final Report from ISI in
>> DTIC.  It does talk a fair amount about congestion control and source
>> quench (plus other things that might interest people). The period of
>> performance is 1987 to 1990 which is much later than I was considering in
>> my earlier message.
>>> https://apps.dtic.mil/sti/tr/pdf/ADA236542.pdf
>>> Even though the report mentions testing on DARTnet, I don't remember
>> anything about this during our DARTnet meetings.  I did join the project
>> after the start so perhaps the work was done before I began to participate.
>> I also couldn't easily find the journal they mention as a place for
>> publishing their findings. I will have more time later to see if I can
>> find something that covers this testing.
>>> barbara
>>>
>>>      On Tuesday, October 1, 2024 at 04:37:47 PM PDT, Scott Bradner via
>> Internet-history<internet-history at elists.isoc.org>  wrote:
>>>   multicast is also an issue but I do not recall if that was one that
>> Craig & I talked about
>>> Scott
>>>
>>>> On Oct 1, 2024, at 7:34 PM, Scott Bradner via Internet-history <
>> internet-history at elists.isoc.org> wrote:
>>>> I remember talking with Craig Partridge (on a flight to somewhere)
>> about source quench
>>>> during the time when 1812 was being written - I do not recall
>>>> the specific issues but I recall that there were more than one issue
>>>>
>>>> (if DoS was not an issue at the time, it should have been)
>>>>
>>>> Scott
>>>>
>>>>> On Oct 1, 2024, at 6:22 PM, Brian E Carpenter via Internet-history <
>> internet-history at elists.isoc.org> wrote:
>>>>> On 02-Oct-24 10:19, Michael Greenwald via Internet-history wrote:
>>>>>> On 10/1/24 1:11 PM, Greg Skinner via Internet-history wrote:
>>>>>>> Forwarded for Barbara
>>>>>>>
>>>>>>> ====
>>>>>>>
>>>>>>> From: Barbara Denny<b_a_denny at yahoo.com>
>>>>>>> Sent: Tuesday, October 1, 2024 at 10:26:16 AM PDT
>>>>>>> I think congestion issues were discussed because I remember an ICMP
>> message type called source quench (now deprecated). It was used for
>> notifying a host to reduce the traffic load to a destination.  I don't
>> remember hearing about any actual congestion experiments using this message
>> type.
>>>>>> Of only academic interest: I believe that, circa 1980 +/- 1-2 years,
>> an
>>>>>> advisee of either Dave Clark or Jerry Saltzer wrote an undergraduate
>>>>>> thesis about the use of Source Quench for congestion control. I
>> believe
>>>>>> it included some experiments (maybe all artificial, or only through
>>>>>> simulation).
>>>>>> I don't think it had much impact on the rest of the world.
>>>>> Source quench is discussed in detail in John Nagle's RFC 896 (dated
>> 1984).
>>>>> A trail of breadcrumbs tells me that he has an MSCS from Stanford, so
>>>>> I guess he probably wasn't an MIT undergrad.
>>>>>
>>>>> Source quench was effectively deprecated by RFC 1812 (dated 1995).
>> People
>>>>> had played around with ideas (e.g. RFC 1016) but it seems that
>> basically
>>>>> it was no use.
>>>>>
>>>>> A bit more Google found this, however:
>>>>>
>>>>> "4.3. Internet Congestion Control
>>>>> Lixia Zhang began a study of network resource allocation techniques
>> suitable for
>>>>> the DARPA Internet. The Internet currently has a simple technique for
>> resource
>>>>> allocation, called "Source Quench."
>>>>> Simple simulations have shown that this technique is not effective,
>> and this work
>>>>> has produced an alternative which seems considerably more workable.
>> Simulation
>>>>> of this new technique is now being performed."
>>>>>
>>>>> [MIT LCS Progress Report to DARPA, July 1983 - June 1984, AD-A158299,
>>>>> https://apps.dtic.mil/sti/pdfs/ADA158299.pdf
>> ]
>>>>> Lixia was then a grad student under Dave Clark. Of course she's at
>> UCLA now. If she isn't on this list, she should be!
>>>>>    Brian Carpenter
>>>
>>>
>>>
>>> --
>>> Please send any postal/overnight deliveries to:
>>> Vint Cerf
>>> Google, LLC
>>> 1900 Reston Metro Plaza, 16th Floor
>>> Reston, VA 20190
>>> +1 (571) 213 1346
>>>
>>> until further notice
>>>
>>>
>>>
>>
>


