[ih] TCP RTT Estimator
Jack Haverty
jack at 3kitty.org
Tue Mar 18 16:16:42 PDT 2025
Hi Len,
Thanks for the pointers. They fill in a bit more of the history. In
particular, I've seen little written about the early days of SATNET,
AlohaNet, and such. Also, in those days (the 1970s, give or take) there
was no Web, no Internet, no search engines, and no easy way to access
such papers except by attending the conferences.
I wasn't involved with SATNET in its early days. It came onto my radar
when Vint put "make the core gateways a 24x7 operational service" onto
an ARPA contract I was managing. I think it was fall 1978. By that
time, SATNET was running CPODA and was in "operational" mode, monitored
by the BBN NOC, which also managed the ARPANET. The technology was
pretty stable by then. MATNET had also been deployed as a clone of
SATNET, with installations at Navy sites including the USS Carl
Vinson. It was the next step in the progression from research to
operations: "technology transfer" into the "real world" of the DoD.
From the papers you highlighted, it seems that the experiments were
carried out before CPODA was introduced. I'm a bit confused about
exactly what was involved. There was SATNET, with sites in West
Virginia (US) and Goonhilly Downs (UK). There was also an ARPANET IMP
(actually the UCL TIP, IIRC) linked to IMPs in the US by satellite. I
always thought those were two separate networks, but maybe somehow the
ARPANET IMP-to-IMP "circuit" used the SATNET satellite channel? The
paper references RFNMs on SATNET, but I don't remember whether those
were part of the SATNET mechanisms (CPODA?) or somehow part of the
ARPANET's internal mechanisms. I don't recall ever hearing of RFNMs
being part of SATNET while I was responsible for it.
In any event, I studied quite a bit of queueing theory and other
branches of mathematics (e.g., statistics, operations research, etc.)
while a student at MIT. It was all very enlightening to understand how
things work, and to be able to use the techniques to compare possible
internal algorithms.
But I also learned that there can be large differences between theory
and practice.
One example came while I had a student job programming a PDP-8 for
data collection in a lab where inertial navigation equipment was
developed for use in Apollo, Minuteman, and similar systems. I had
studied lots of mathematical techniques for engineering design, e.g.,
the use of Karnaugh maps to minimize logic circuit components.
My desk happened to be next to the desk of one of the career engineers
(an actual "rocket scientist"). So I asked him what kinds of tools he
had found most useful for his work. His answer -- none of them. By
analyzing enormous amounts of data, they had discovered that almost all
failures were caused by some kind of metal-to-metal connector problem.
So their engineering principle was to minimize the number of such
connections in a design. There were no tools for that.
Another example occurred at BBN, when the ARPANET was being transformed
into the Defense Data Network, to become a DoD-wide operational
infrastructure. Someone (can't remember who) had produced a scientific
paper proving that the ARPANET algorithms would "lock up" and the entire
network would crash. That understandably caused significant concern in
the DoD. The DDN couldn't be allowed to crash.
After BBN investigated, we discovered that the analysis was correct,
but that assumptions had been made to keep it tractable. In
particular, the analysis assumed that every IMP in the network ran at
exactly the same speed and was started at exactly the same instant, so
that all the programs ran in perfect synchrony, with instructions
being executed simultaneously in every IMP. That assumption made the
analysis mathematically feasible.
The analysis was still accurate, but without that assumption it became
irrelevant. We advised the DoD not to worry, explaining that the
probability of such an occurrence was infinitesimal; even if we had
wanted to make that behavior happen, we wouldn't have known how. They
agreed, and the DDN continued to be deployed.
So my personal conclusion has been that scientific analysis is important
and useful, but has to be viewed in the context of real-world
conditions. The Internet in particular is a real-world environment that
seems, to me at least, to be mathematically intractable. There are many
components in use, even within a single TCP connection, where some of
the mechanisms (retransmissions, error detection, queue management,
timing, etc.) are in the switches, some are in the hosts'
implementations of TCP, and some are in the particular operating systems
involved.
There is a quote, attributed to Yogi Berra, which captures the situation:
"In theory, there is no difference between theory and practice. In
practice, there is."
While I was involved in designing the internals of The Internet,
generally between 1972 and 1997, I don't recall much, if any,
"analysis" of the Internet as a whole communications system, including
TCP, IP, and UDP as well as the mechanisms inside each of the
underlying network technologies. Mostly, design decisions were driven
by intuition and/or experience. Perhaps there was some comprehensive
analysis, but I missed it.
Perhaps The Internet as a whole is just too complex for the existing
capabilities of mathematical tools?
Jack
On 3/17/25 21:46, Leonard Kleinrock wrote:
> Hi Jack,
>
> There were some queueing theory papers in those early days that did
> indeed shed some light on the phenomena and performance of the Arpanet
> and of Satnet. Here are a couple of references where analysis and
> measurement were both of value in providing understanding:
>
> https://www.lk.cs.ucla.edu/data/files/Naylor/On%20Measured%20Behavior%20of%20the%20ARPA%20Network.pdf
>
> and
>
> https://www.lk.cs.ucla.edu/data/files/Kleinrock/packet_satellite_multiple_access.pdf
>
> and this last paper even showed the "capture" effect with the SIMPs.
> In particular, one phenomenon was that if site A at one end of the
> Satnet was sending traffic to site B at the other end, each message
> traveling from A to B forced an RFNM reply from B to A, which
> prevented B from sending its own messages to A since the RFNMs
> hogged the B-to-A channel. Lots more was observed; these are just
> some of the performance papers that used measurement and queueing
> models in those early days.
>
> Len
>
>
>
>> On Mar 11, 2025, at 1:42 PM, Jack Haverty via Internet-history
>> <internet-history at elists.isoc.org> wrote:
>>
>> On 3/11/25 07:05, David Finnigan via Internet-history wrote:
>>> It looks like staff at RSRE (Royal Signals and Radar Establishment) took
>>> the lead in experimenting with formulae and methods for dynamic
>>> estimation of round trip times in TCP. Does anyone here have any further
>>> insight or recollection into these experiments for estimating RTT, and
>>> the development of the RTT formula?
>>>
>>
>> IMHO the key factor was the state of the Internet at that time
>> (1980ish). The ARPANET was the primary "backbone" of The Internet,
>> in what I think of as the "fuzzy peach" stage of Internet evolution:
>> the ARPANET was the peach, and sites on the ARPANET were adding LANs
>> of some type and connecting them to their ARPANET IMP through some
>> kind of gateway.
>>
>> The exception to that structure was Europe, especially Peter
>> Kirstein's group at UCL and John Laws' group at RSRE. They were
>> interconnected somehow within the UK, but their access to the
>> Internet was through a connection to a SATNET node (aka SIMP) at
>> Goonhilly Downs.
>>
>> SATNET was connected to the ARPANET through one of the "core
>> gateways" that we at BBN were responsible for running as a 24x7
>> operational service.
>>
>> The ARPANET was a packet network, but it presented a virtual
>> circuit service to its users: everything that went in one end came
>> out the other end, in order, with nothing missing and nothing
>> duplicated. A TCP at one US site talking to a TCP at another US
>> site didn't have much work to do, since everything it sent would be
>> received intact. So RTT timer values could be set very high; I
>> recall one common choice was 3 seconds.
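>>
>> (Aside: the "RTT formula" the question refers to is, I believe, the
>> smoothed estimator that ended up in the TCP spec. Here is a minimal
>> sketch in C of the RFC 793 version, purely for illustration; the
>> constants are the ranges the spec suggests, and the sample values
>> are made up:
>>
>>     /* RFC 793 smoothed RTT estimator and retransmission timeout. */
>>     #include <stdio.h>
>>
>>     #define ALPHA  0.875  /* smoothing gain; spec suggests 0.8-0.9  */
>>     #define BETA   2.0    /* variance factor; spec suggests 1.3-2.0 */
>>     #define UBOUND 60.0   /* upper bound on the timeout, seconds    */
>>     #define LBOUND 1.0    /* lower bound on the timeout, seconds    */
>>
>>     static double srtt = 3.0;  /* start from the old 3-second guess */
>>
>>     /* Fold one measured round-trip sample into the running
>>      * estimate and return the new retransmission timeout. */
>>     double update_rto(double measured_rtt)
>>     {
>>         srtt = ALPHA * srtt + (1.0 - ALPHA) * measured_rtt;
>>         double rto = BETA * srtt;
>>         if (rto > UBOUND) rto = UBOUND;
>>         if (rto < LBOUND) rto = LBOUND;
>>         return rto;
>>     }
>>
>>     int main(void)
>>     {
>>         /* hypothetical samples from a long satellite path */
>>         double samples[] = { 1.8, 1.6, 2.1, 1.7, 1.9 };
>>         for (int i = 0; i < 5; i++)
>>             printf("rtt=%.2f -> rto=%.2f\n",
>>                    samples[i], update_rto(samples[i]));
>>         return 0;
>>     }
>>
>> Van Jacobson's 1988 refinement later added a mean-deviation term,
>> roughly rto = srtt + 4 * rttvar, but the EWMA above is the original
>> formula.)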
>>
>> For the UK users, however, things were quite different. The "core
>> gateways" of that era were very limited by their hardware
>> configurations. They didn't have much buffer space, so they did
>> drop datagrams, which of course then had to be retransmitted by the
>> sending TCP. IIRC, at one point the ARPANET/SATNET gateway had
>> exactly one datagram of buffer space.
>>
>> I don't recall anyone ever saying it, but I suspect that situation
>> caused the UCL and RSRE crews to pay a lot of attention to TCP
>> behavior, and try to figure out how best to deal with their skinny
>> pipe across the Atlantic.
>>
>> At one point, someone (from UCL or RSRE, I can't remember which)
>> reported an unexpected measurement. They did frequent file
>> transfers, often trying to "time" their transfers to happen at a
>> time of day when UK and US traffic flows would be lowest. But they
>> observed that their transfers during "busy times" went much faster
>> than similar transfers during "quiet times". That made little
>> sense, of course.
>>
>> After digging around with XNET, SNMP, etc., we discovered the cause.
>> That ARPANET/SATNET gateway had very few buffers. The LANs at users'
>> sites and the ARPANET path could deliver datagrams to that gateway
>> faster than SATNET could take them. So the buffers filled up and
>> datagrams were discarded -- just as expected.
>>
>> During "quiet times", the TCP connection would deliver datagrams to
>> the gateway in bursts (whatever the TCPs negotiated as a Window
>> size). Buffers in the gateway would overflow and some of those
>> datagrams were lost. The sending TCP would retransmit, but only after
>> the RTT timer expired, which was often set to 3 seconds. Result -
>> slow FTPs.
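>>
>> (A back-of-the-envelope model of that failure mode, in C. The
>> numbers (an 8-segment window, a single-datagram buffer, a 3-second
>> timer) follow the story above; the round-by-round model itself is a
>> deliberately crude assumption, not a description of the real
>> gateway:
>>
>>     #include <stdio.h>
>>
>>     int main(void)
>>     {
>>         int window = 8;       /* segments sent per burst            */
>>         int buffers = 1;      /* gateway buffer space, in datagrams */
>>         double rto = 3.0;     /* retransmission timeout, seconds    */
>>
>>         int remaining = window;
>>         double waiting = 0.0;
>>         int rounds = 0;
>>
>>         /* Each round: the sender (re)sends everything still
>>          * outstanding, the gateway forwards what fits in its
>>          * buffers and drops the rest, and the sender then waits
>>          * a full timeout before retransmitting. */
>>         while (remaining > 0) {
>>             int forwarded = remaining < buffers ? remaining : buffers;
>>             remaining -= forwarded;
>>             rounds++;
>>             if (remaining > 0)
>>                 waiting += rto;
>>         }
>>
>>         /* prints: 8 segments took 8 rounds and ~21 s of waiting */
>>         printf("%d segments took %d rounds and ~%.0f s of waiting\n",
>>                window, rounds, waiting);
>>         return 0;
>>     }
>>
>> In this crude model, almost every segment in a quiet-time burst ends
>> up waiting out a full 3-second timer.)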
>>
>> Conversely, during "busy times", the traffic through the ARPANET
>> would be spread out in time. With other users' traffic flows
>> present, the chances were better that someone else's datagram would
>> be dropped instead. Result: faster FTP transfers.
>>
>> AFAIK, none of this behavior was ever analyzed mathematically. The
>> mathematical model of an Internet seemed beyond the capability of
>> queuing theory et al. Progress was very much driven by
>> experimentation and "let's try this" activity.
>>
>> The solution, or really the workaround, was to improve the
>> gateway's hardware: more memory meant more buffering was available.
>> That principle seems to have persisted even to today, but it has
>> caused other problems. Google "buffer bloat" if you're curious.
>>
>> As far as I remember, there weren't any such problems reported with
>> the various Packet Radio networks. They tended to be used only
>> occasionally, for tests and demos, whereas the SATNET link was used
>> almost daily.
>>
>> The Laws and Kirstein groups in the UK were, IMHO, the first "real"
>> users of TCP on The Internet, exploring paths not protected by
>> ARPANET mechanisms.
>>
>> Jack Haverty
>>
>