[ih] Checksums in Host-Host protocol

Sat Apr 19 14:00:16 PDT 2025

Steve,

It sounds so simple, but the devil is in the details.  Given the plethora
of word sizes of the computers connected (or scheduled to be connected) to
ARPAnet in 1971, I would bet that for ANY checksum length proposed, over 50%
of the Hosts would have to engage in mask and shift operations on every
word in a message in order to calculate even a checksum like the one you
describe. This would indeed have somewhat slowed the effective network
speed.  Enough to matter? Who knows, but at that time maximum effective
bandwidth was of real concern to prospective users (remember the motivation
for the design of the Tinker-McClellan experiment).  Recall that at that
time a majority of the communications community viewed packet switching as
foolish, and ARPAnet as an experiment about to fail. Yes, a simple
end-to-end checksum would have sometimes been of diagnostic help, but both
Frank Heart and Larry Roberts had a real reason to worry about anything
that would negatively affect perceived performance of this brand new
technology. Maybe we should have done checksumming for debugging in TELNET,
where performance was irrelevant, and left File Transfer alone.
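
To make the mask-and-shift cost concrete, here is a rough sketch of what a
host whose word size does not match the checksum width would have to do
before it could add anything up. The 36-bit word size, the 16-bit unit, the
zero padding of the final unit, and the function name repack are all
assumptions chosen for illustration; nothing here is taken from an actual
host implementation of the period.

```python
def repack(words, word_bits, unit_bits=16):
    """Re-slice a stream of word_bits-wide words into unit_bits-wide units.

    Hypothetical helper: every input word goes through a shift and a mask,
    which is the per-word overhead discussed above.
    """
    units = []
    buf = 0      # bit buffer; the most recently read bits sit at the low end
    nbits = 0    # number of valid bits currently held in buf
    word_mask = (1 << word_bits) - 1
    unit_mask = (1 << unit_bits) - 1
    for w in words:
        buf = (buf << word_bits) | (w & word_mask)
        nbits += word_bits
        while nbits >= unit_bits:
            nbits -= unit_bits
            units.append((buf >> nbits) & unit_mask)
        buf &= (1 << nbits) - 1          # keep only the leftover bits
    if nbits:
        # Assumption: pad the final partial unit with zero bits on the right.
        units.append((buf << (unit_bits - nbits)) & unit_mask)
    return units


# One 36-bit word yields two full 16-bit units plus a padded remainder.
print([hex(u) for u in repack([0x123456789], word_bits=36)])
```

A host with 16-bit words gets its units back unchanged, so only the
mismatched machines pay for the extra shifting and masking.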

Cheers,
Alex

On Friday, April 18, 2025 at 03:12:39 PM EDT, Steve Crocker via
Internet-history <internet-history at elists.isoc.org> wrote:

We tried to include a lightweight checksum in the original host-host
protocol.  (Later it was called the Network Control Protocol or NCP.  Same
protocol.)  The checksum was designed to be reasonably easy to compute.  It
was a 16-bit ones complement sum with one bit of rotation every thousand or
so bits.  (The rotation was intended to catch packets out of order, an error
which we imagined might be possible but never occurred.)  Frank Heart
argued vehemently against it, saying it would make his network look slow.
I tried to push back and asked about the Host-IMP interface.  "As reliable
as your accumulator," he roared.
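
For readers who want to see the idea, a checksum of the kind described above
might be sketched roughly as follows. Only the 16-bit ones complement sum and
the one-bit rotation "every thousand or so bits" come from the description;
the exact interval (64 words, i.e. 1024 bits), the direction of rotation, and
the choice to rotate the running sum are assumptions made for this
illustration, as are all the function names.

```python
MASK16 = 0xFFFF


def ones_complement_add(a, b):
    """Add two 16-bit values with end-around carry (ones complement sum)."""
    s = a + b
    return (s & MASK16) + (s >> 16)


def rotl16(x):
    """Rotate a 16-bit value left by one bit."""
    return ((x << 1) & MASK16) | (x >> 15)


def rotating_checksum(words, words_per_block=64):
    """Ones complement sum of 16-bit words, rotating the running sum by one
    bit at each block boundary (assumed: every 64 words = 1024 bits)."""
    acc = 0
    for i, w in enumerate(words):
        if i > 0 and i % words_per_block == 0:
            acc = rotl16(acc)
        acc = ones_complement_add(acc, w & MASK16)
    return acc


block_a = [1] * 64
block_b = [2] * 64
in_order = rotating_checksum(block_a + block_b)
swapped = rotating_checksum(block_b + block_a)
print(in_order, swapped)   # the two values differ
```

A plain sum cannot tell the two orderings apart, because addition is
commutative. With the rotation, the same two blocks in the opposite order
give a different result, which is the out-of-order case the rotation was
meant to catch.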

We removed the checksum from our design, a mistake I've rued ever since.
And, of course, it turned out there were indeed a few cases where it would
have made a difference.  As has been pointed out, there was a major memory
error in one of the IMPs that caused that IMP to look like it was zero
distance to every IMP.  But even before that error, when Lincoln Lab first
connected its host to its IMP, their hardware interface had a problem.
There was some crosstalk between the interface and the disk (or drum)
controller.  When the disk (or drum) was operating at the same time as the
Host-IMP interface, some bits got scrambled.  It apparently took them some
time to track down.  I think they would have found it faster if the
checksum had been part of the design.

Steve