[ih] TCP RTT Estimator

Fri Apr 18 12:55:43 PDT 2025

Jack,

In our early design discussions, late 1968 and early 1969, Jeff Rulifson
from SRI pushed for a checksum for exactly the reason you described, viz., it
would catch software errors between layers.

Steve

Sent by a Verified sender

On Fri, Apr 18, 2025 at 3:49 PM Jack Haverty <jack at 3kitty.org> wrote:

> Hi Steve,
>
> The IMPs actually had quite a bit of internal mechanism to address
> reliability, which evolved as the network grew and more problems occurred.
> I joined Frank Heart's group in 1977, to work on TCP-related contracts, but
> was surrounded by the ARPANET group so a lot of that DNA transferred over.
> The IMP subnet tried real hard to hide the vagaries of packet switching
> from the end users' hosts.  In effect there was an end-end TCP-like
> function between two hosts' attached IMPs that made sure no packets were
> ever lost, duplicated, or reordered so that the hosts saw only the behavior
> of a virtual circuit.
>
> When Vint asked me to take over the gateway work and make the "core"
> Internet a 24x7 service, we congealed a "Gateway Group" to do the work.
> They lived literally down the hall from the ARPANET world.  One of that
> crew was Mike Brescia.  We were working on making the Internet work like
> the ARPANET, but until the NOC was in the loop, Mike watched over the
> Internet behavior to catch and fix problems.
>
> One day some gateway was reporting lots of checksum errors.  Some tools
> were already available for remote debugging.  So Mike investigated and
> noticed that all of the errors were for datagrams from one particular host
> somewhere in the US Midwest.  So not a gateway or IMP problem, but Mike
> captured a few failed datagrams, and noticed that bytes in the TCP header
> were out-of-order, which was causing the checksum errors.
>
> That was a problem I had already struggled with in implementing my Unix
> TCP.  We knew about it.  So Mike looked up the host info in the NIC, and
> sent an email to the technical contact - something like "You need to swap
> the address bytes in your source address."  Shortly after, he got a reply,
> something like "Thanks!  That was the problem."
>
> Shortly after that he got another email -- "How the *^&%&^% did you know
> that?!"
>
> Checksumming was a useful debugging tool, even for remote debugging.
> I've learned over the years that researchers don't think enough about how
> their designs will be operated and managed.
>
>
> Jack
>
>
> On 4/18/25 12:12, Steve Crocker wrote:
>
> We tried to include a lightweight checksum in the original host-host
> protocol.  (Later it was called the Network Control Protocol or NCP.  Same
> protocol.)  The checksum was designed to be reasonably easy to compute.  It
> was a 16-bit ones' complement sum with one bit of rotation every thousand or
> so bits.  (The rotation was intended to catch packets out of order, an error
> that we imagined might be possible but that never occurred.)  Frank Heart
> argued vehemently against it, saying it would make his network look slow.
> I tried to push back and asked about the Host-IMP interface.  "As reliable
> as your accumulator," he roared.
>
> We removed the checksum from our design, a mistake I've rued ever since.
> And, of course, it turned out there were indeed a few cases where it would
> have made a difference.  As has been pointed out, there was a major memory
> error in one of the IMPs that caused that IMP to look like it was zero
> distance to every IMP.  But even before that error, when Lincoln Lab first
> connected its host to its IMP, their hardware interface had a problem.
> There was some crosstalk between the interface and the disk (or drum)
> controller.  When the disk (or drum) was operating at the same time as the
> Host-IMP interface, some bits got scrambled.  It apparently took them some
> time to track down.  I think they would have found it faster if the
> checksum had been part of the design.
>
> Steve
>
>
> On Fri, Apr 18, 2025 at 1:25 PM Andrew G. Malis via Internet-history <
> internet-history at elists.isoc.org> wrote:
>
>> Jack,
>>
>> > Thinking back, I can't recall the reason for including checksums in TCP
>> at all.
>>
>> It was primarily to catch memory errors, which were a real thing back in
>> the core memory days. Errors during transmission were generally caught by
>> the lower layers.
>>
>> Cheers,
>> Andy
>> --
>> Internet-history mailing list
>> Internet-history at elists.isoc.org
>> https://elists.isoc.org/mailman/listinfo/internet-history
>>
>
>
>
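For readers who want to see the idea concretely, the host-host checksum
described in the thread (a 16-bit ones' complement sum with one bit of
rotation every thousand or so bits) might be sketched roughly as below.
This is only an illustration, not the original design: the rotation
interval of 64 words (1024 bits), the rotation direction, the zero padding
of odd-length data, and all function names are assumptions, since the
thread gives no such details.

```python
# Illustrative sketch only; not the original host-host protocol checksum.
# Assumption: rotate after every 64 sixteen-bit words (1024 bits),
# standing in for "every thousand or so bits".
ROTATE_EVERY_WORDS = 64


def ones_complement_add(a: int, b: int) -> int:
    """Add two 16-bit values, folding any carry back in (end-around carry)."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def rotate_left_16(x: int) -> int:
    """Rotate a 16-bit value left by one bit (direction is an assumption)."""
    return ((x << 1) & 0xFFFF) | (x >> 15)


def rotating_checksum(data: bytes, rotate_every: int = ROTATE_EVERY_WORDS) -> int:
    """16-bit ones' complement sum, rotated one bit every `rotate_every` words."""
    if len(data) % 2:
        data += b"\x00"  # assumption: pad odd-length data with a zero byte
    acc = 0
    for i in range(0, len(data), 2):
        word = (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
        acc = ones_complement_add(acc, word)
        if (i // 2 + 1) % rotate_every == 0:
            acc = rotate_left_16(acc)
    return acc


if __name__ == "__main__":
    block_a = b"\x00\x01" * 64  # one 1024-bit block
    block_b = b"\x00\x02" * 64  # another 1024-bit block
    # With rotation, swapping the two blocks changes the result.
    print(hex(rotating_checksum(block_a + block_b)))
    print(hex(rotating_checksum(block_b + block_a)))
```

A plain ones' complement sum is the same whatever order the words arrive
in, so it cannot detect reordering. In this sketch the periodic rotation
makes the result depend on which 1024-bit block came first, which matches
the purpose given for the rotation in the thread.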