[ih] bufferbloat and modern congestion control (was 4004)
John Day
jeanjour at comcast.net
Thu Oct 3 13:04:03 PDT 2024
Okay, thanks. That clarifies the RFNM issue.
What were ‘all of those applications in the 80s’ you were alluding to? Well, some of them. In the 80s, was the ARPANET becoming a smaller and smaller part of the Internet?
Your last comment: The ARPANET did process-to-process flow control in NCP and, I would venture a guess, hop-by-hop flow control host-to-host through IMP-Host to IMP subnet to IMP-Host. I need to dig into the nature of the IMP-IMP flow and congestion control.
Thanks,
John
> On Oct 3, 2024, at 15:25, Jack Haverty <jack at 3kitty.org> wrote:
>
> John,
>
> RFNMs were messages sent from an IMP to one of its attached Host computers - literally "Request For Next Message". AFAIK, RFNMs were not sent internally between IMPs. But there was a lot of internal mechanism to exchange information between IMPs. An IMP could send a RFNM to a Host to indicate that it was OK for that Host to send more data.
>
> If your Host didn't obey the RFNM rules, as a last measure to protect the network, the IMP could shut off that Host by turning off the hardware clock on the cable connecting the Host and IMP. For a while, that Host wouldn't be able to talk to any other Host. But service would be restored when the traffic jam cleared.
>
> But I'm not the right person to ask about IMP internals. When I joined BBN, the Arpanet had already been running for 8 years and had transitioned into "operational" mode, with DCA rather than ARPA in charge. I was in the "research" area associated with then-current ARPA projects such as TCP. But the "IMP Guys" were close by. When I was implementing TCP for Unix, I suspect I learned about RFNMs from one of the IMP programmers, and in particular learned what my TCP had to do in order to avoid ever being blocked by the IMP.
>
> I'm not sure if other TCP implementers knew much about how the Arpanet worked, or what they should do in their TCP implementations. Such stuff was probably described somewhere in reports, but I had the advantage of being able to just go ask an IMP programmer. That was even preferable to looking at the ultimate documentation -- the code itself, which was not easy to understand.
>
> One of the features of the "Arpanet Architecture" was that the internal mechanisms were insulated from the world outside, which was defined by the "1822" interface specifications. So the internal mechanisms could be changed without any need for Host computers to modify their hardware or software. The Host-IMP interface did change sometimes, but very rarely, e.g., to introduce "96-bit leaders". The internal IMP mechanisms could (and did) change with every release of the IMP software. They also changed with everyday "patches" that addressed some current operational problem.
>
> As the Arpanet grew during the 80s, lots of traffic, lots of new applications, and lots of new users surfaced a lot of issues. The Arpanet internal mechanisms were studied as they were in use, simulations were performed, analyses were done, and new mechanisms were implemented and carefully introduced into the active network, instrumented to see how well the theory matched the results in practice.
>
> Congestion control was one such issue. I recall others, e.g., "MultiPath Routing". This was surfaced by the observation that, at the time, there were 3 separate paths through the Arpanet mesh to get data from the East Coast to the West Coast. The "routing algorithm" always had an idea of the "best path", and sent all data along that route. Each route involved 56 kilobits/second circuits. But since all data flowed on the current "best route", it was not possible to attain more than 56 kb/s throughput between any two hosts, even though the cross-country capacity was available for more.
>
> Personally, I learned about these kinds of Arpanet issues mostly from proximity to the Arpanet NOC and IMP Guys. There were lots of reports documenting Arpanet behavior, but they may not have been readily available (no Web yet) or widely distributed, or even of much interest to the researchers pursuing Internet Projects.
>
> The DTIC documents I mentioned earlier are some of those reports that may be not only of historical interest, but also relate to current issues which exist in today's Internet. For example, "MultiPath Routing" is an issue in today's Internet. My cell phone has two paths available to it for using the Internet (Wifi and Cell). But it can only use one at any time. Flow Control was handled by the internal IMP mechanisms such as RFNMs. But it didn't prevent Congestion. Congestion Control was another hot topic back in the 1980s Arpanet.
>
> The "Arpanet Architecture" put mechanisms for congestion control, routing, flow control, instrumentation, et al as internal mechanisms. The "Internet Architecture" places some of those functions into the "switching" fabric of routers, switches, and modems. It places other functions into the "Host" devices where TCP et al are implemented.
> Both Hosts and Switching Fabric contain products developed by many different, and often competing, manufacturers.
>
> IMHO, those two architectures are quite different, yet reflect approaches to the same problem of building a distributed computing infrastructure (Licklider's "Galactic Network" vision).
>
> I don't recall much about the changes that were made to the Arpanet for things like congestion control. That's why I mentioned those reports saved by DTIC (Defense Technical Information Center). There may be some gems of experience still in those reports that might apply in today's world. I'm even listed as the Author of some of them; but that just reflects that, at the time, I was the designated manager of the associated contract. That doesn't mean that I knew anything about the work; I was just responsible for getting the report submitted so the customer would pay the bill.
>
> Andy Malis, who worked with IMPs, may remember more. Also Bob Hinden, who we recruited into the Internet world from the Arpanet group.
>
> Andy? Bob?
>
> Jack Haverty
>
>
> On 10/3/24 08:50, John Day wrote:
>> Jack,
>>
>> Good stuff. I agree, and as I said before, SQ alone is not sufficient unless the action to be taken is also defined: both when to send it and what to do when it arrives. Raj and KK said that ECN should be sent when the average queue length was greater than or equal to 1. This is very early and gives the senders time to back off before packets are dropped (hopefully) and retransmissions are generated. TCP, by using implicit notification, waits until the queue is full and packets are being dropped, and packets will continue to be dropped (it would seem) until the senders notice the lost Acks. This would appear to generate a lot of retransmissions.
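>>
>> Roughly, that rule looks like the following (an illustrative sketch in Python,
>> not the DECbit implementation; the real scheme averages over the previous
>> busy+idle cycle plus the current busy period, which is simplified here to a
>> per-cycle running average):
>>
>> class Packet:
>>     def __init__(self):
>>         self.ecn_marked = False
>>
>> class EcnMarkingQueue:
>>     """Mark packets when the average queue length is >= 1 (DECbit-style)."""
>>     def __init__(self):
>>         self.queue = []      # packets waiting for the output link
>>         self.samples = []    # queue-length samples for the current averaging cycle
>>
>>     def on_arrival(self, packet):
>>         self.samples.append(len(self.queue))
>>         avg_qlen = sum(self.samples) / len(self.samples)
>>         if avg_qlen >= 1.0:
>>             packet.ecn_marked = True   # ask the sender to back off early,
>>                                        # long before the queue overflows
>>         self.queue.append(packet)
>>
>>     def on_link_idle(self):
>>         self.samples = []    # start a new averaging cycle when the queue drains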
>>
>> A question for you: It has been my impression that with the RFNMs between IMPs, congestion would not have occurred within the IMP subnet (or very rarely).* However, there would have been congestion at the gateways as you describe. Is that correct?
>>
>> Take care,
>> John
>>
>> * Early on there were some deadlock conditions caused by the fact that a message could be 8 packets and was reassembled in the IMP before being delivered to the host, but that isn’t congestion.
>>
>>> On Oct 2, 2024, at 20:08, Jack Haverty via Internet-history <internet-history at elists.isoc.org> wrote:
>>>
>>> Re: Source Quench...
>>>
>>> It's been 40+ years, but I remember meetings where Source Quench was first discussed. My reaction was that it was too simplistic and wouldn't be effective. At the time, I was the programmer responsible for the Unix TCP I had written for the PDP-11/40. When I asked what a TCP should do when it received a SQ, no one could provide much of an answer. If the initial datagram you sent out to open a TCP connection resulted in an incoming SQ, exactly how would you "slow down" that connection flow??
>>>
>>> Other implementors had different ideas about how to handle an incoming SQ. One (Dave Mills IIRC) opined that receiving an SQ meant that a gateway somewhere in the path had discarded the datagram you had sent. So the obvious response by the TCP should be to simply retransmit the datagram without waiting for any "retransmission timer" to fire. You knew it had been discarded, so you should retransmit it immediately.
>>>
>>> In my TCP, I think I just incremented a counter when I received a SQ. Could always change it later....
>>>
>>> At the time, there had been a decade's worth of experience in running the Arpanet, and "congestion control" was a well-known, if not well-understood, issue. There's a bunch of old reports available in DTIC that captured a lot of the analysis and experimentation that was done on the Arpanet to change its inner workings as issues were identified during operations - see, for example, DTIC reports accessible as ADA086338 and ADA086340. There are many others describing the Arpanet experience. In particular ADA121350 contains discussions of topics such as "Congestion Control" and "Issues in Internet Gateway Design".
>>>
>>> There were internal mechanisms within the Arpanet that enabled it to provide a "virtual circuit" service to host computers attached to IMPs. Although individual packets were routed and handled separately, they were "reassembled" at the destination IMP before delivering them to the attached computer. The Arpanet was widely characterized as a "packet network", but it had elaborate internal mechanisms to deliver a virtual circuit service to the computers it served.
>>>
>>> Essentially, packets in the Arpanet didn't start travelling toward their destination until the destination confirmed that there was buffer space reserved for them. Internal messages were exchanged to manage buffer allocations - e.g., the "ALLO" message (ALLOcate) was used to reserve space at a destination IMP. Packets would then traverse each circuit between pairs of IMPs, with error-checking and retransmission as needed to keep it intact in its travels. "RFNM" messages were used to indicate, to the sending host computer, that it was OK to send more data.
>>>
>>> The ultimate flow control was available to the IMP as a hardware capability, which could simply stop the clock that controlled the flow of data between Hosts and IMPs. That would effectively block all communications from the blocked host to anywhere else on the Arpanet. By "counting RFNMs", a host could avoid such drastic flow control by not sending any data that would violate the RFNM counter. Any TCP or gateway implementation attached to the Arpanet was subject to such control, and had to implement RFNM counting to avoid it. I have wondered how many implementations actually did.
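>>>
>>> A host-side sketch of that RFNM counting might look like this (hypothetical
>>> Python, assuming a limit of 8 messages outstanding per destination as a
>>> concrete number for illustration; the real rule is in the 1822 spec, and the
>>> real logic lived in each host's NCP or 1822 driver):
>>>
>>> from collections import defaultdict, deque
>>>
>>> MAX_OUTSTANDING = 8   # assumed per-destination limit; check the 1822 spec
>>>
>>> class RfnmCounter:
>>>     """Never exceed the un-RFNM'd message limit, so the IMP never blocks us."""
>>>     def __init__(self, transmit):
>>>         self.transmit = transmit              # function(dest, message): hands data to the IMP
>>>         self.outstanding = defaultdict(int)   # destination -> messages awaiting a RFNM
>>>         self.waiting = defaultdict(deque)     # destination -> messages held back locally
>>>
>>>     def send(self, dest, message):
>>>         if self.outstanding[dest] < MAX_OUTSTANDING:
>>>             self.outstanding[dest] += 1
>>>             self.transmit(dest, message)
>>>         else:
>>>             self.waiting[dest].append(message)   # hold it until a RFNM frees a slot
>>>
>>>     def on_rfnm(self, dest):
>>>         self.outstanding[dest] -= 1              # the IMP accepted one more message
>>>         if self.waiting[dest]:
>>>             self.send(dest, self.waiting[dest].popleft())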
>>>
>>> All of these mechanisms were well-documented in the technical reports, often in excruciating (and likely boring) detail. The ancient IMP code itself is even available online today. As always, the ultimate documentation is the code itself. But it's written in assembly language, and used every programming trick imaginable to make it fast, efficient, and functional in the minicomputer technology of the 1960s. It's not easy to figure out how it worked.
>>>
>>> The IMPs had the hardware necessary to measure time, so routing was based on finding lowest delay routes. In the earliest gateways, "getting a timestamp" from the processor wasn't just hard. It was impossible. The gateway hardware simply didn't have any way to measure time.
>>>
>>> IMPs had clocks, and were interconnected by circuits, so the IMPs could "loop back" any circuit and measure the time to send data and get it back. They could calculate the delay along a route.
>>>
>>> Gateways were interconnected by networks, which were much less stable and variable than a terrestrial or satellite circuit. So Gateway routing was based on "hops" rather than time - as an interim mechanism until a time-based approach was available. That would then enable handling datagrams which needed "low latency" TOS by sending them on a low-delay route.
>>>
>>> Based on what I knew about the Arpanet, gleaned by osmosis from the activity at the Arpanet NOC down the hall and the Arpanet Group around the corner, I didn't think the Source Quench mechanism would work in the Internet. But it also made a good place-holder, to be replaced someday when the research community figured out what mechanism would actually work for congestion control.
>>>
>>> Much of what I knew about the internal structure of the Arpanet was available, but I think it's likely that few of the Internet researchers ever even saw the Arpanet reports. The reports were sent to DoD and ARPA, but AFAIK never released as IENs or RFCs, or otherwise distributed within the "research community".
>>>
>>> In addition, there was a prevailing policy from ARPA to avoid using old ideas and prefer trying new concepts. I recall being told by someone at ARPA that they needed to promote trying new ideas rather than replicating old ones. If you don't have enough failures, you're not following the "Advanced" part of the ARPA name.
>>>
>>> Hope this helps explain how we got from there to here...
>>> Jack Haverty
>>>
>>>
>>>
>>>
>>>
>>> On 10/2/24 15:21, Dave Taht via Internet-history wrote:
>>>> I wish I had had the time and resources to (help) write more papers. (For
>>>> example there isn't much on "drop head queueing")
>>>>
>>>> fq_codel is now a linux-wide default and has the following unique
>>>> properties:
>>>>
>>>> codel queue management, which measures the time a packet spends in a queue
>>>> and gradually attempts to find an optimum operating point for queueing delay,
>>>> which is 5 ms by default. (It has been tested in software down to 250 us in
>>>> the data center.)
>>>> There is another subsystem, called BQL, which attempts to limit bytes on
>>>> the device txring to one interrupt's worth. (a pretty good explanation of
>>>> modern layers here) [2]
>>>>
>>>> It drops from the head, not the tail of the queue, with a small (BQL or
>>>> HTB) FIFO in front of the lowest bits of the hardware to account
>>>> for interrupt latency.
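>>>>
>>>> Stripped to its core, that logic is roughly the following (an illustrative
>>>> Python sketch, not the actual codel code; the control law that spaces
>>>> successive drops by the square root of the drop count is omitted):
>>>>
>>>> import time
>>>> from collections import deque
>>>>
>>>> TARGET = 0.005     # 5 ms default target for time spent in the queue
>>>> INTERVAL = 0.100   # 100 ms: how long we tolerate staying above TARGET
>>>>
>>>> class CodelSketch:
>>>>     def __init__(self):
>>>>         self.q = deque()                 # (enqueue_time, packet)
>>>>         self.first_above_time = None     # when sojourn time first exceeded TARGET
>>>>
>>>>     def enqueue(self, packet):
>>>>         self.q.append((time.monotonic(), packet))   # timestamp on entry
>>>>
>>>>     def dequeue(self):
>>>>         while self.q:
>>>>             enqueue_time, packet = self.q[0]
>>>>             sojourn = time.monotonic() - enqueue_time
>>>>             if sojourn < TARGET:
>>>>                 self.first_above_time = None        # queue is behaving; deliver
>>>>                 return self.q.popleft()[1]
>>>>             if self.first_above_time is None:
>>>>                 self.first_above_time = time.monotonic()
>>>>             elif time.monotonic() - self.first_above_time >= INTERVAL:
>>>>                 self.q.popleft()                    # drop from the HEAD, not the tail
>>>>                 continue                            # and consider the next packet
>>>>             return self.q.popleft()[1]
>>>>         return None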
>>>>
>>>> (I am kind of curious if a txring existed back in the day and how close an
>>>> application sat to the hardware)
>>>>
>>>> Anecdote: when van and kathy were working on what became codel (january
>>>> 2012), she rang me up one day and asked me just how much overhead there was
>>>> in getting a timestamp from the hardware nowadays. And I explained that it
>>>> was only a few cycles and a pipeline bubble, and the cost of unsynced TSCs
>>>> and so on and so forth, and she said thanks, and hung up. Getting a
>>>> timestamp must have been mighty hard back in the day!
>>>>
>>>> The "flow queueing" mechanism sends packets from flows whose arrival rate is
>>>> less than the departure rate of all the other flows out first. [1] This is
>>>> an improvement over prior FQ mechanisms like SFQ and DRR, which always put
>>>> a new flow at the tail of the flow list. It is pretty amazing how often
>>>> this works on real traffic. Also it automatically puts flows that build a
>>>> queue into a queue that is managed by codel.
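>>>>
>>>> In skeleton form, the scheduler is roughly this (illustrative Python; the
>>>> real fq_codel hashes on the 5-tuple, uses byte quantums for DRR, and runs
>>>> the codel logic above on each per-flow queue):
>>>>
>>>> from collections import deque, defaultdict
>>>>
>>>> class FqSketch:
>>>>     def __init__(self):
>>>>         self.flows = defaultdict(deque)   # flow id -> that flow's packet queue
>>>>         self.new_flows = deque()          # flows that arrived with no backlog
>>>>         self.old_flows = deque()          # flows that have built a queue
>>>>
>>>>     def enqueue(self, flow_id, packet):
>>>>         if not self.flows[flow_id]:          # flow had no backlog:
>>>>             self.new_flows.append(flow_id)   # serve it ahead of the old flows
>>>>         self.flows[flow_id].append(packet)
>>>>
>>>>     def dequeue(self):
>>>>         while self.new_flows or self.old_flows:
>>>>             lst = self.new_flows if self.new_flows else self.old_flows
>>>>             flow_id = lst.popleft()
>>>>             if self.flows[flow_id]:
>>>>                 packet = self.flows[flow_id].popleft()
>>>>                 if self.flows[flow_id]:              # still backlogged, so it
>>>>                     self.old_flows.append(flow_id)   # goes to the old-flow list
>>>>                 return packet
>>>>         return None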
>>>>
>>>> One (eventual) benefit of these approaches, combined, is that they make
>>>> delay-based congestion control more feasible (indeed, BBR spends most of its
>>>> time in this mode), while the flow isolation means that most interactive
>>>> traffic is never queued at all.
>>>>
>>>> IMHO the edges of the internet, at least, would have been much better had
>>>> some form of FQ always been in them (which we kind of got from switched
>>>> networks naturally), but the idea of FQ was roundly rejected at the first IETF
>>>> meeting in 1989, and it's been uphill ever since.
>>>>
>>>> Just to touch upon pacing a bit - pacing is the default for the linux stack
>>>> no matter the overlying qdisc or congestion control algorithm.
>>>> I don't know if anyone has ever attempted to compare pacing w/cubic vs
>>>> pacing w/bbr, and very few, until recently, have
>>>> attempted to also compare the cc-of-the-day vs fq_codel or cake. [3]
>>>>
>>>> [1]https://ieeexplore.ieee.org/document/8469111
>>>> [2]https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9541151
>>>> [3]
>>>> https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0304609&type=printable
>>>>
>>>> Varying the packet pacing to get a pre-congestion notification is an idea
>>>> I'd like to see pursued more:
>>>> https://www.usenix.org/system/files/atc24-han.pdf
>>>> (I so want to believe this paper)
>>>>
>>>> A tiny bit more below....
>>>>
>>>> On Wed, Oct 2, 2024 at 2:31 PM John Day via Internet-history <
>>>> internet-history at elists.isoc.org> wrote:
>>>>
>>>>> The response to bufferbloat has always struck me as looking for your keys
>>>>> under a street light when that wasn’t where you dropped them but there is
>>>>> light there.
>>>>>
>>>>> Initially, bufferbloat was not a problem because memory was expensive and
>>>>> when TCP ran out of buffers (or got low), the connection simply blocked the
>>>>> sending application until buffers were available. This was still true with
>>>>> the advent of NIC cards. Memory was still tight. However, as memory got
>>>>> cheap and NIC cards had oceans of memory, TCP never got low on buffers and
>>>>> no one told the application to slow down or wait, so there was local
>>>>> congestion collapse: bufferbloat.
>>>>>
>>>>> One part of the solution would be interface flow control between the
>>>>> sending application and TCP (you would have thought that would have
>>>>> occurred to implementers anyway; it is obvious) and/or simply restricting the
>>>>> amount of buffer space TCP has available, so that it runs out and blocks the
>>>>> sending application before things get bad, and opens up again when buffers are
>>>>> available. But virtually all of the papers I see are on different
>>>>> drop strategies, and oddly enough they never find their keys.
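>>>>>
>>>>> A toy sketch of that kind of interface flow control, assuming nothing more
>>>>> than a bounded send buffer whose put() blocks the writer (illustrative
>>>>> Python, not any particular stack's API):
>>>>>
>>>>> import queue
>>>>>
>>>>> SEND_BUFFER_PACKETS = 64   # a small, fixed buffer budget (illustrative number)
>>>>>
>>>>> send_buffer = queue.Queue(maxsize=SEND_BUFFER_PACKETS)
>>>>>
>>>>> def app_send(data):
>>>>>     # Blocks the application when the transport's buffer is full:
>>>>>     # back-pressure instead of an unbounded pile of queued memory.
>>>>>     send_buffer.put(data, block=True)
>>>>>
>>>>> def on_ready_to_transmit(wire_send):
>>>>>     # Called when the network/NIC can take another packet; taking an
>>>>>     # item frees a slot and implicitly unblocks a waiting sender.
>>>>>     wire_send(send_buffer.get())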
>>>>>
>>>> I don't have a lot of time for papers! The most modern stuff for TCP is
>>>> using EDF (earliest deadline first) to manage the packet pacing.
>>>> There are virtual and actual physical devices nowadays that take a "time to
>>>> be sent" along with each packet. This paper was highly influential:
>>>>
>>>> https://saeed.github.io/files/carousel-sigcomm17.pdf
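>>>>
>>>> The heart of an EDF pacer is tiny (an illustrative Python sketch; Carousel
>>>> uses a timing wheel rather than a heap, and a real stack hands the release
>>>> time down to the qdisc or the NIC):
>>>>
>>>> import heapq
>>>> import time
>>>>
>>>> class EdfPacer:
>>>>     """Release each packet at its assigned send time, earliest deadline first."""
>>>>     def __init__(self):
>>>>         self.heap = []    # (release_time, sequence, packet)
>>>>         self.seq = 0      # tie-breaker so the heap never compares packets
>>>>
>>>>     def submit(self, packet, release_time):
>>>>         heapq.heappush(self.heap, (release_time, self.seq, packet))
>>>>         self.seq += 1
>>>>
>>>>     def poll(self, send):
>>>>         # Send every packet whose release time has arrived, in deadline order.
>>>>         now = time.monotonic()
>>>>         while self.heap and self.heap[0][0] <= now:
>>>>             _, _, packet = heapq.heappop(self.heap)
>>>>             send(packet)
>>>>
>>>> A sender pacing at rate R just submits each packet with a release time equal
>>>> to the previous packet's release time plus len(packet)/R.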
>>>>
>>>> the latest commit to the linux kernel about it:
>>>>
>>>> https://lore.kernel.org/netdev/20240930152304.472767-2-edumazet@google.com/T/
>>>>
>>>> PS IMHO eric dumazet deserves a spot in the internet hall of fame for so
>>>> many things...
>>>>
>>>>
>>>>> Take care,
>>>>> John
>>>>>
>>>>>> On Oct 2, 2024, at 01:48, Barbara Denny via Internet-history <
>>>>> internet-history at elists.isoc.org> wrote:
>>>>>> Just throwing some thoughts out here ......
>>>>>> I can see how this happens in a FIFO queuing world. However a lot of
>>>>> work has gone into fair queuing starting in the late 80s. Just wondering
>>>>> if anyone has done work utilizing fair queuing and source quench. For
>>>>> example, I think I can see how to use fair queuing information to better
>>>>> select who to send a source quench to. At least I can see how to do it with
>>>>> Stochastic Fairness Queueing since I worked on it and I remember a fair
>>>>> amount about how it was implemented. I wouldn't be able to provide a
>>>>> guarantee that the wrong host would never receive a source quench but the
>>>>> likelihood should be much lower. Considering whether the use of NAT
>>>>> creates undesirable behavior is also important and I am sure there are
>>>>> probably other cases that need to be checked.
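>>>>>
>>>>> Something along these lines, as a very rough sketch: hash flows into buckets
>>>>> SFQ-style and send the quench to the source behind the deepest bucket
>>>>> (hypothetical Python, not the original SFQ code, and it leaves out the
>>>>> periodic hash perturbation):
>>>>>
>>>>> from collections import deque
>>>>>
>>>>> NUM_BUCKETS = 128   # illustrative; real SFQ sizes and perturbs this differently
>>>>>
>>>>> class SfqQuenchPicker:
>>>>>     def __init__(self):
>>>>>         self.buckets = [deque() for _ in range(NUM_BUCKETS)]
>>>>>
>>>>>     def bucket_for(self, src, dst):
>>>>>         return hash((src, dst)) % NUM_BUCKETS   # stand-in for SFQ's perturbed hash
>>>>>
>>>>>     def enqueue(self, src, dst, packet):
>>>>>         self.buckets[self.bucket_for(src, dst)].append((src, packet))
>>>>>
>>>>>     def pick_quench_target(self):
>>>>>         # On congestion, blame the flow with the deepest bucket rather than
>>>>>         # whichever source happened to send the most recent packet.
>>>>>         deepest = max(self.buckets, key=len)
>>>>>         if deepest:
>>>>>             src, _ = deepest[0]
>>>>>             return src        # the host to send the ICMP source quench to
>>>>>         return None
>>>>>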
>>>>>> Hum, it might also be interesting to speculate whether this could have
>>>>> any effect on bufferbloat but I fess up I need to learn more about the work
>>>>> done in the area of bufferbloat. I was involved with other things when
>>>>> this started to appear on my radar screen as a hot topic. I will admit I
>>>>> wish I had done more work on possible buffering effects from implementation
>>>>> choices at the time I did work on SFQ but there were contractual
>>>>> obligations that restricted how much time I could devote to the SFQ part of
>>>>> the project.
>>>>>> Just curious, ECN (Explicit Congestion Notification) is optional. Does
>>>>> anyone have any idea about its use in the Internet?
>>>>>> barbara
>>>>>>
>>>>>> On Tuesday, October 1, 2024 at 07:10:25 PM PDT, Vint Cerf <
>>>>> vint at google.com> wrote:
>>>>>> One basic problem with blaming the "last packet that caused intermediate
>>>>> router congestion" is that it usually blamed the wrong source, among other
>>>>> problems. Van Jacobson was/is the guru of flow control (among others) who
>>>>> might remember more.
>>>>>> v
>>>>>>
>>>>>> On Tue, Oct 1, 2024 at 8:50 PM Barbara Denny via Internet-history <
>>>>> internet-history at elists.isoc.org> wrote:
>>>>>> In a brief attempt to try to find some information about the early MIT
>>>>> work you mentioned, I ended up tripping on this Final Report from ISI in
>>>>> DTIC. It does talk a fair amount about congestion control and source
>>>>> quench (plus other things that might interest people). The period of
>>>>> performance is 1987 to 1990 which is much later than I was considering in
>>>>> my earlier message.
>>>>>> https://apps.dtic.mil/sti/tr/pdf/ADA236542.pdf
>>>>>> Even though the report mentions testing on DARTnet, I don't remember
>>>>> anything about this during our DARTnet meetings. I did join the project
>>>>> after the start so perhaps the work was done before I began to participate.
>>>>> I also couldn't easily find the journal they mention as a place for
>>>>> publishing their findings. I will have more time later to see if I can find
>>>>> something that covers this testing.
>>>>>> barbara
>>>>>>
>>>>>> On Tuesday, October 1, 2024 at 04:37:47 PM PDT, Scott Bradner via
>>>>> Internet-history <internet-history at elists.isoc.org> wrote:
>>>>>> multicast is also an issue but I do not recall if that was one that
>>>>> Craig & I talked about
>>>>>> Scott
>>>>>>
>>>>>>> On Oct 1, 2024, at 7:34 PM, Scott Bradner via Internet-history <
>>>>> internet-history at elists.isoc.org> wrote:
>>>>>>> I remember talking with Craig Partridge (on a flight to somewhere)
>>>>> about source quench
>>>>>>> during the time when 1812 was being written - I do not recall
>>>>>>> the specific issues but I recall that there was more than one issue
>>>>>>>
>>>>>>> (if DoS was not an issue at the time, it should have been)
>>>>>>>
>>>>>>> Scott
>>>>>>>
>>>>>>>> On Oct 1, 2024, at 6:22 PM, Brian E Carpenter via Internet-history <
>>>>> internet-history at elists.isoc.org> wrote:
>>>>>>>> On 02-Oct-24 10:19, Michael Greenwald via Internet-history wrote:
>>>>>>>>> On 10/1/24 1:11 PM, Greg Skinner via Internet-history wrote:
>>>>>>>>>> Forwarded for Barbara
>>>>>>>>>>
>>>>>>>>>> ====
>>>>>>>>>>
>>>>>>>>>> From: Barbara Denny <b_a_denny at yahoo.com>
>>>>>>>>>> Sent: Tuesday, October 1, 2024 at 10:26:16 AM PDT
>>>>>>>>>> I think congestion issues were discussed because I remember an ICMP
>>>>> message type called source quench (now deprecated). It was used for
>>>>> notifying a host to reduce the traffic load to a destination. I don't
>>>>> remember hearing about any actual congestion experiments using this message
>>>>> type.
>>>>>>>>> Of only academic interest: I believe that, circa 1980 +/- 1-2 years,
>>>>> an
>>>>>>>>> advisee of either Dave Clark or Jerry Saltzer, wrote an undergraduate
>>>>>>>>> thesis about the use of Source Quench for congestion control. I
>>>>> believe
>>>>>>>>> it included some experiments (maybe all artificial, or only through
>>>>>>>>> simulation).
>>>>>>>>> I don't think it had much impact on the rest of the world.
>>>>>>>> Source quench is discussed in detail in John Nagle's RFC 896 (dated
>>>>> 1984).
>>>>>>>> A trail of breadcrumbs tells me that he has an MSCS from Stanford, so
>>>>>>>> I guess he probably wasn't an MIT undergrad.
>>>>>>>>
>>>>>>>> Source quench was effectively deprecated by RFC 1812 (dated 1995).
>>>>> People
>>>>>>>> had played around with ideas (e.g. RFC 1016) but it seems that
>>>>> basically
>>>>>>>> it was no use.
>>>>>>>>
>>>>>>>> A bit more Google found this, however:
>>>>>>>>
>>>>>>>> "4.3. Internet Congestion Control
>>>>>>>> Lixia Zhang began a study of network resource allocation techniques
>>>>> suitable for
>>>>>>>> the DARPA Internet. The Internet currently has a simple technique for
>>>>> resource
>>>>>>>> allocation, called "Source Quench."
>>>>>>>> Simple simulations have shown that this technique is not effective,
>>>>> and this work
>>>>>>>> has produced an alternative which seems considerably more workable.
>>>>> Simulation
>>>>>>>> of this new technique is now being performed."
>>>>>>>>
>>>>>>>> [MIT LCS Progress Report to DARPA, July 1983 - June 1984, AD-A158299,
>>>>>>>> https://apps.dtic.mil/sti/pdfs/ADA158299.pdf
>>>>> ]
>>>>>>>> Lixia was then a grad student under Dave Clark. Of course she's at
>>>>> UCLA now. If she isn't on this list, she should be!
>>>>>>>> Brian Carpenter
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Please send any postal/overnight deliveries to:
>>>>>> Vint Cerf
>>>>>> Google, LLC
>>>>>> 1900 Reston Metro Plaza, 16th Floor
>>>>>> Reston, VA 20190
>>>>>> +1 (571) 213 1346
>>>>>> until further notice
>>>>>>
>>>>>>
>>>>>>
>>>>>
>