[ih] TCP RTT Estimator

Fri Apr 18 13:38:07 PDT 2025

That was while I was still an undergraduate, so before I encountered 
networking.   But I think too few people have thought about operations 
when designing systems.  Of course, the whole arena of "distributed 
multi-processing" (aka "networking") was only just emerging as an 
alternative to the then more common use of computers via punch-card 
decks and printouts.

Taking advantage of the geographical proximity of the ARPANET work, we 
Internet Guys tried hard to adapt the operations tools proven by years 
of ARPANET operations into the Internet environment.  IMPs sent "traps" 
to the NOC; SNMP provided similar functionality in the Internet.  IMPs 
had "fake hosts" that included functions like a remote debugger (DDT) 
that could be used to examine and modify remote IMPs while they were 
running.  XNET provided a similar capability.  Fake hosts that sent test 
traffic, reflected incoming data back to the sender, and the like were 
replicated in the gateways.

When DoD first declared TCP and IP as DoD Standards, we didn't notice, 
until some non-research implementations began appearing, that the 
Standards didn't include "supporting protocols" such as ICMP, which was 
crucial for network operations, or SNMP.  Or "fake host" functions.  
Since those weren't part of the Specification, big government contractors 
didn't implement them.  It took some effort to get that changed.

When I was later involved in running a large corporate internet, we used 
as much of the existing "instrumentation" as we could, e.g., SNMP, to 
monitor our own network and find and fix problems.  TCP is good at 
hiding errors from its end users, but equally good at hiding problems 
that should be fixed by whoever, if anyone, is managing the system.
Personally I've found 50+ devices on my home LAN; I don't manage any of 
them.

It's easy for someone not familiar with operating a network service to 
see some mechanism as superfluous and either not implement it or perhaps 
remove it from the next iteration of design.  I've often wondered how 
"internet operations" has evolved over the years, and how problems that 
are hidden by TCP are noticed and fixed -- if they are.

Jack

On 4/18/25 12:55, Steve Crocker wrote:
> Jack,
>
> In our early design discussions, late 1968 and early 1969, Jeff 
> Rulifson from SRI pushed for a checksum for exactly the reason you 
> described, viz., it would catch software errors between layers.
>
> Steve
>
>
> On Fri, Apr 18, 2025 at 3:49 PM Jack Haverty <jack at 3kitty.org> wrote:
>
>     Hi Steve,
>
>     The IMPs actually had quite a bit of internal mechanism to address
>     reliability, which evolved as the network grew and more problems
>     occurred.  I joined Frank Heart's group in 1977, to work on
>     TCP-related contracts, but was surrounded by the ARPANET group so
>     a lot of that DNA transferred over.  The IMP subnet tried real
>     hard to hide the vagaries of packet switching from the end users'
>     hosts.  In effect there was an end-end TCP-like function between
>     two hosts' attached IMPs that made sure no packets were ever lost,
>     duplicated, or reordered so that the hosts saw only the behavior
>     of a virtual circuit.
>
>     When Vint asked me to take over the gateway work and make the
>     "core" Internet a 24x7 service, we congealed a "Gateway Group" to
>     do the work.  They lived literally down the hall from the ARPANET
>     world.  One of that crew was Mike Brescia.  We were working on
>     making the Internet work like the ARPANET, but until the NOC was
>     in the loop, Mike watched over the Internet behavior to catch and
>     fix problems.
>
>     One day some gateway was reporting lots of checksum errors.  Some
>     tools were already available for remote debugging.  So Mike
>     investigated and noticed that all of the errors were for datagrams
>     from one particular host somewhere in the US Midwest.  So not a
>     gateway or IMP problem, but Mike captured a few failed datagrams,
>     and noticed that bytes in the TCP header were out-of-order, which
>     was causing the checksum errors.
>
>     That was a problem I had already struggled with in implementing my
>     Unix TCP.  We knew about it.  So Mike looked up the host info in
>     the NIC, and sent an email to the technical contact - something
>     like "You need to swap the address bytes in your source address." 
>     Shortly after, he got a reply, something like "Thanks!  That was
>     the problem."
>
>     Shortly after that he got another email -- "How the *^&%&^% did
>     you know that?!"
>
>     Checksumming was a useful debugging tool, even for remote
>     debugging.   I've learned over the years that researchers don't
>     think enough about how their designs will be operated and managed.
>
>
>     Jack
>
>
>     On 4/18/25 12:12, Steve Crocker wrote:
>>     We tried to include a lightweight checksum in the original
>>     host-host protocol.  (Later it was called the Network Control
>>     Protocol or NCP.  Same protocol.)  The checksum was designed to
>>     be reasonably easy to compute.  It was a 16-bit ones complement
>>     sum with one bit of rotation every thousand or so bits.  (The
>>     rotation was intended to catch packets out of order, an error
>>     that we imagined might be possible but that never occurred.)  Frank Heart
>>     argued vehemently against it, saying it would make his network
>>     look slow.  I tried to push back and asked about the Host-IMP
>>     interface.  "As reliable as your accumulator," he roared.
>>
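A plain 16-bit ones' complement sum is commutative, so it cannot tell whether blocks of a message were delivered in a different order; folding in a one-bit rotation every so often makes the result depend on position. The Python sketch below illustrates that idea under stated assumptions: the helper names, big-endian word order, zero padding of an odd final byte, the absence of a final complement, and the default of one rotation every 63 words (1008 bits, chosen only to match "every thousand or so bits") are all hypothetical, not the original NCP definition.

```python
# Hypothetical sketch of a 16-bit ones' complement checksum, with and without
# a periodic one-bit rotation. Word order, padding, chunk size, and the lack
# of a final complement are assumptions, not the historical NCP definition.


def ones_complement_add(a: int, b: int) -> int:
    """Add two 16-bit values, folding any carry back in (end-around carry)."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def rotl16(x: int) -> int:
    """Rotate a 16-bit value left by one bit."""
    return ((x << 1) | (x >> 15)) & 0xFFFF


def to_words(data: bytes) -> list[int]:
    """Split data into big-endian 16-bit words, zero-padding an odd final byte."""
    if len(data) % 2:
        data += b"\x00"
    return [(data[i] << 8) | data[i + 1] for i in range(0, len(data), 2)]


def plain_sum(data: bytes) -> int:
    """Ones' complement sum: reordering the words does not change the result."""
    acc = 0
    for w in to_words(data):
        acc = ones_complement_add(acc, w)
    return acc


def rotating_sum(data: bytes, words_per_rotation: int = 63) -> int:
    """Ones' complement sum that rotates the accumulator left one bit after
    every `words_per_rotation` words. The default of 63 words (1008 bits) is
    an assumption chosen to match "every thousand or so bits"."""
    acc = 0
    for i, w in enumerate(to_words(data), start=1):
        acc = ones_complement_add(acc, w)
        if i % words_per_rotation == 0:
            acc = rotl16(acc)
    return acc
```

For example, with two 4-byte blocks `a = bytes([0, 1, 0, 0])` and `b = bytes([0, 2, 0, 0])`, `plain_sum` gives 3 for both `a + b` and `b + a`, while `rotating_sum(..., words_per_rotation=2)` gives 8 and 10 respectively, so the swapped order is detected.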
>>     We removed the checksum from our design, a mistake I've rued ever
>>     since.  And, of course, it turned out there were indeed a few
>>     cases where it would have made a difference.  As has been pointed
>>     out, there was a major memory error in one of the IMPs that
>>     caused that IMP to look like it was zero distance to every IMP. 
>>     But even before that error, when Lincoln Lab first connected its
>>     host to its IMP, their hardware interface had a problem.  There
>>     was some crosstalk between the interface and the disk (or drum)
>>     controller.  When the disk (or drum) was operating at the same
>>     time as the Host-IMP interface, some bits got scrambled.  It
>>     apparently took them some time to track down.  I think they would
>>     have found it faster if the checksum had been part of the design.
>>
>>     Steve
>>
>>
>>     On Fri, Apr 18, 2025 at 1:25 PM Andrew G. Malis via
>>     Internet-history <internet-history at elists.isoc.org> wrote:
>>
>>         Jack,
>>
>>         > Thinking back, I can't recall the reason for including
>>         > checksums in TCP at all.
>>
>>         It was primarily to catch memory errors, which were a real
>>         thing back in the core memory days. Errors during transmission
>>         were generally caught by the lower layers.
>>
>>         Cheers,
>>         Andy
>>         -- 
>>         Internet-history mailing list
>>         Internet-history at elists.isoc.org
>>         https://elists.isoc.org/mailman/listinfo/internet-history
>>
>>
>>
>
