[ih] Question on Flow Control

Wed Dec 31 08:20:25 PST 2025

Totally agree.

> On Dec 31, 2025, at 11:18, Vint Cerf via Internet-history <internet-history at elists.isoc.org> wrote:
> 
> That's a very crisp summary, Steve. Thanks!
> 
> V
> 
> Please send any postal/overnight deliveries to:
> Vint Cerf
> Google, LLC
> 1900 Reston Metro Plaza, 16th Floor
> Reston, VA 20190
> +1 (571) 213 1346
> 
> 
> until further notice
> 
> 
> 
> 
> On Wed, Dec 31, 2025, 11:01 Steve Crocker via Internet-history <
> internet-history at elists.isoc.org> wrote:
> 
>> Len,
>> 
>> Thanks for mentioning me.  In the design of the Arpanet protocols flow
>> control was indeed a major concern.  However, there were some key
>> differences between designing flow control for the Arpanet and flow control
>> for the Internet.
>> 
>> The initial version of the Arpanet was designed, implemented and deployed
>> with the conviction that no messages would ever be lost.  Hence there was
>> no reason to include retransmission in the Host-Host protocol.  (For those
>> not familiar with the original nomenclature, I used Host-Host protocol as
>> the name of the abstract bitstream.  Telnet and FTP were built on top of
>> it.   I used the term Network Control Program to refer to the software that
>> had to be added to each time-shared computer's operating system to support
>> interactions between user process and the IMP.  Over time, the abbreviation
>> "NCP" became repurposed to mean Network Control Protocol as the name of the
>> Host-Host protocol.)
>> 
>> Even though we didn't expect the Arpanet to drop messages, we anticipated
>> there might be congestion in the receiving host, and thus we needed a way
>> for the receiving host to have some control over the quantity or rate of
>> data the sending host.  The resulting design, allocations of both messages
>> and bits by the receiving host reflected a best guess.  We left it to the
>> implementers, operators and future researchers to work out quantitative
>> details.  (N.B. I said "bits."  Eight bit bytes were not yet the universal
>> quantity of exchange.  This changed by the time the Internet protocols were
>> being designed.)
>> 
>> Thus, when the Internet protocols were being designed there were two
>> significant differences.  First, it was clear there had to be a way to
>> retransmit messages that had been lost.  Second, the community had gained
>> some experience with the performance of the protocol.  And, of course, with
>> the Arpanet in operation, it was possible to try out different designs.
>> Retransmission strategies added a lot of complexity to the design problem.
>> But even the "simple" problem of controlling congestion without considering
>> lost messages was surprisingly complex.  In the early days, memory was very
>> limited.  When memory became plentiful, allocating too much space brought
>> forth the phenomenon of bufferbloat.
>> 
>> Returning to the relationship between the early work on flow control in the
>> Arpanet NCP and the later work on flow control and retransmission in the
>> Internet, I'd say the main contribution from the Arpanet initial period was
>> the identification of the need for flow control and an initial design that
>> provided a basis for measurement and experimentation.
>> 
>> Steve
>> 
>> 
>> 
>> 
>> On Mon, Dec 29, 2025 at 11:40 PM Leonard Kleinrock via Internet-history <
>> internet-history at elists.isoc.org> wrote:
>> 
>>> As this discussion group has been reaching back in time to the early
>>> RFC’s, the early Host-Host protocol, the NCP, other early protocols and
>>> rathole history, and how they informed TCP and its many improvements,
>> let’s
>>> not forget that Steve Crocker was a key contributor (e.g, RFC 1 and much
>>> more).  I may have missed mention of Steve, but surely we should be
>>> including his name in our discussions  about those early protocol and
>>> system developers.
>>> 
>>> Len
>>> 
>>> 
>>> 
>>>> On Dec 29, 2025, at 4:05 PM, Jack Haverty via Internet-history <
>>> internet-history at elists.isoc.org> wrote:
>>>> 
>>>> A little more rathole history...
>>>> 
>>>> In 1977/78, I implemented TCPV2 for Unix on a PDP-11/40.  It was based
>>> on the TCPV2 code which Jim Mathis at SRI had already created for the
>>> LSI-11.   So most of the "state diagram", buffer management, and datagram
>>> handling were compatible with a PDP-11, with a lot of shoehorning to get
>> it
>>> into the Unix environment (not easy on an 11/40).
>>>> 
>>>> Jim's code set the Retransmission timer at 3 seconds.  When I asked
>> why,
>>> the answers revealed that no one really knew what it should be.  It also
>>> didn't matter much at the time, since the underlying ARPANET which
>> carried
>>> most traffic delivered everything sent, in order, intact, and with no
>>> duplicates.  Gateways might drop datagrams, and did -- especially the
>> ones
>>> interconnecting ARPANET to SATNET for intercontinental traffic.
>>>> 
>>>> SATNET involved a geosynchronous satellite, with delays of perhaps a
>>> good fraction of a second even under no load.  So 3 seconds seemed
>>> reasonable for RTO.   I left the RTO in my Unix implementation set to 3
>>> seconds.   We also closely monitored the "core gateways" to detect
>>> situations with high loss rates of datagrams; gateways had no choice but
>>> discarding packets when no buffers were available.  It happened a lot in
>>> the intercontinental path.
>>>> 
>>>> A lot of us TCPV2 implementers just picked 3 seconds, while waiting for
>>> further research to produce a better answer.  Subsequently VanJ and
>> others
>>> thought about it a lot and invented schemes for adjusting TCP behavior,
>>> documented in numerous RFCs.
>>>> 
>>>> ...
>>>> 
>>>> More than a decade later, I was involved in operating a corporate
>>> network, using TCPV4 and 100+ Cisco routers.  We used SNMP to monitor the
>>> network behavior.  Since we were also responsible for many of the "host"
>>> computers, we also monitored TCP behavior in the hosts, also by using
>>> SNMP.  Not all TCP implementations implemented that capability, but for
>>> some we could watch retransmissions, duplicates, checksum errors, and
>>> collect such data from inside the hosts' TCPs.
>>>> 
>>>> It became obvious that there was a wide range of implementation
>>> decisions that the various TCP implementers had made.  At one point,
>> before
>>> Microsoft embraced TCP, there were more than 30 separate TCP
>>> implementations available just for use in PCs.  All sorts of companies
>> were
>>> also marketing workstations to attach to the proliferating Ethernets.
>>>> 
>>>> We had to test our own software with each of these.   They exhibited
>>> quite varied behavior.  Some were optimized for fastest network transfers
>>> -- including one that accomplished that by violating part of the Ethernet
>>> specifications for timing, effectively stealing service from others on
>> the
>>> LAN.  Others were optimized for minimizing load on the PC, either CPU or
>>> memory resources or both. Some were optimized for simplicity -- I recall
>>> one which only accepted the "next" datagram for its current window,
>>> discarding anything else.  It was simple and took advantage of the fact
>>> that out-of-order datagrams it discarded would be retransmitted anyway.
>>>> 
>>>> All of these implementations "worked", in the sense that TCP traffic
>>> would flow.  We could observe their behavior by monitoring both the
>>> gateways (called routers by that time) and the TCPs in computers attached
>>> to our intranet.
>>>> 
>>>> Whether or not they were "legal" and conformed to the specifications
>> and
>>> standards was unclear.   Marketing literature might say lots of things,
>> but
>>> independent certification labs were scarce or non-existent.   Caveat
>> emptor.
>>>> 
>>>> ...
>>>> 
>>>> Fast forward to today.  My home LAN now has 50+ devices on it.   All of
>>> them presumably have implemented TCP.  I don't watch any of them.  I have
>>> no idea which algorithms, RFCs, standards, or optimizations each has
>> chosen
>>> to implement.  Or if their implementation is correct.  Or "legal" in
>>> conforming to whatever the specifications are today.
>>>> 
>>>> Does anybody monitor the behavior of the Internet today at the host
>>> computers and their TCPs?   How does anyone know that the TCP in their
>>> device today is operating as expected and as the mathematical analyses
>>> promised?
>>>> 
>>>> /Jack Haverty
>>>> 
>>>> 
>>>> On 12/29/25 13:23, John Day via Internet-history wrote:
>>>>> 
>>>>>> On Dec 29, 2025, at 12:57, Craig Partridge <craig at tereschau.net>
>>> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Mon, Dec 29, 2025 at 12:07 PM John Day <jeanjour at comcast.net
>>> <mailto:jeanjour at comcast.net>> wrote:
>>>>>>> As for TCP initially using Selective-repeat or SACK, do you remember
>>> what the TCP retransmission time out was at that time? It makes a
>>> difference.  The nominal value in the textbooks is RTT + 4D, where D is
>> the
>>> mean variation. There is an RFC that says if 4D < 1 sec, set it to 1 sec.
>>> which seems high, but that is what it says.
>>>>>>> 
>>>>>>> Take care,
>>>>>>> John
>>>>>> Serious study of what the RTO should be didn't happen until the late
>>> 1980s.  Before that, it was rather ad hoc.
>>>>> I only brought up RTO because of the comment about SACK. For SACK to
>> be
>>> useful, RTO can’t be too short. 3 seconds sounds like plenty of time.
>>>>> 
>>>>>> RFC 793 says min(upper bound,  beta * min(lower bound, SRTT)). where
>>> SRTT was an incremental moving average, SRTT = (alpha * SRTT) +
>>> (1-alpha)(measured RTT).  But this leaves open all sorts of questions
>> such
>>> as: what should alpha and beta be (RFC 793 suggests alpha of .8 or so and
>>> beta of 1.3 to 2), and do you measure an RTT once per window (BSD's
>>> approach) or once per segment (I think TENEX's approach).  Not to mention
>>> the retransmission ambiguity problem, which Lixia Z. and Raj Jain
>>> discovered in 1985-6.  \\
>>>>> Yes, this is pretty much what the textbooks say these days. Although
>>> RFC 6298 has an equation for calculating RTO, the RFC says that if
>> equation
>>> yields a value less than 1 sec, then set it to 1 sec. It also says that
>> the
>>> previous value was 3 sec and there is no problem continuing to use that.
>>> So it would seem RTO should be between 1 and 3 seconds.  This seems to
>> be a
>>> long time.
>>>>> 
>>>>>> (If you are wondering why we didn't use variance -- it required a
>>> square root which was strictly a no-no in kernels of that era;  Van J.
>>> solved part of this issue by finding a variance calculation that could be
>>> done without a square root).
>>>>> Yes, it was clear why variance wasn’t used. It required by both
>> squares
>>> and square root. I tell students that in Operating Systems,
>> multiplication
>>> is higher math. ;-)
>>>>> 
>>>>>> This is an improvement on TCP v2 (which is silent on the topic) and
>>> IEN 15 (1976) which says use 2 * RTT estimate.
>>>>> For RTO? Yea, that would something to start with.
>>>>>> Ethernet and ALOHA were more explicit about this process but both had
>>> far easier problems, with well bounded prop delay (and in ALOHA's case, a
>>> prop delay so long it swamped queueing times).
>>>>>> 
>>>>>> Part of the reason TCP was slow to realize the issues, I think, were
>>> (1) the expectation loss would be low (Dave Clark used to say that in the
>>> 1970s, the notion was loss was below 1%, which, in a time when windows
>> were
>>> often 4, mean the RTO was used about 4% of the time); and (2) failure to
>>> realize congestion collapse was an issue (when loss rates soar to 80% or
>>> more and your RTO estimator really needs to be good or you make
>> congestion
>>> worse).  It is not chance that RTO issues came to a head as the Internet
>>> was suffering congestion collapse.  I got pulled into the issues (and
>>> helped Phil Karn solve retransmission ambiguity) because I was playing
>> with
>>> RDP, which had selective acks, and was seeing also sorts of strange holes
>>> in my windows (as out of order segments were being acked) and trying to
>>> figure out what to retransmit and when.
>>>>> It doesn’t help that the Internet adopted what is basically CUTE+AIMD.
>>>>> 
>>>>> But back to the flow control issue. This is a digression on a rat
>> hole.
>>> ;-)
>>>>> 
>>>>> But also a useful discussion.  ;-)
>>>>> 
>>>>> The question remains was dynamic window an enhancement of static
>> window
>>> or were they independently developed?
>>>>> 
>>>>> Take care,
>>>>> John
>>>>>> Craig
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> *****
>>>>>> Craig Partridge's email account for professional society activities
>>> and mailing lists.
>>>> 
>>>> --
>>>> Internet-history mailing list
>>>> Internet-history at elists.isoc.org
>>>> https://elists.isoc.org/mailman/listinfo/internet-history
>>>> -
>>>> Unsubscribe:
>>> 
>> https://app.smartsheet.com/b/form/9b6ef0621638436ab0a9b23cb0668b0b?The%20list%20to%20be%20unsubscribed%20from=Internet-history
>>> 
>>> --
>>> Internet-history mailing list
>>> Internet-history at elists.isoc.org
>>> https://elists.isoc.org/mailman/listinfo/internet-history
>>> -
>>> Unsubscribe:
>>> 
>> https://app.smartsheet.com/b/form/9b6ef0621638436ab0a9b23cb0668b0b?The%20list%20to%20be%20unsubscribed%20from=Internet-history
>>> 
>> 
>> 
>> --
>> Sent by a Verified
>> 
>> sender
>> --
>> Internet-history mailing list
>> Internet-history at elists.isoc.org
>> https://elists.isoc.org/mailman/listinfo/internet-history
>> -
>> Unsubscribe:
>> https://app.smartsheet.com/b/form/9b6ef0621638436ab0a9b23cb0668b0b?The%20list%20to%20be%20unsubscribed%20from=Internet-history
>> 
> -- 
> Internet-history mailing list
> Internet-history at elists.isoc.org
> https://elists.isoc.org/mailman/listinfo/internet-history
> -
> Unsubscribe: https://app.smartsheet.com/b/form/9b6ef0621638436ab0a9b23cb0668b0b?The%20list%20to%20be%20unsubscribed%20from=Internet-history