[ih] Checksums in Host-Host protocol

Steve Crocker steve at shinkuro.com
Sun Apr 20 16:21:27 PDT 2025


Jack,

Yes, the big computer(s) on a university campus operated on a cost recovery
basis and OMB Circular A-21 (I think) imposed some rules on charging
computer time to government funded projects.  But I'm quite sure that was
not relevant in this instance.  Most of the hosts on the Arpanet were part
of the research projects that ARPA was supporting.  In a few cases, e.g.
UCLA's main computer, a 360/91, the main computer was connected to the
Arpanet.  Anyone who used that computer needed an account and had to pay
for its use.  (I accompanied Larry Roberts on a visit to UCLA when he
negotiated for a large amount of computer time to run climate modelling
research using weekend and evening times for a reduced rate.)

Frank's response struck me as a mixture of engineering instinct and pride.
The IMPs used a very strong error detection code to detect packets that
had been damaged in transit.  There was no parity checking in the memory,
but the Honeywell 516 was ruggedized -- though I'm not sure that
ruggedization extended to reliability.  I tried to push back by asking
about the 100 kb/s
line between the host and the IMP.  "As reliable as your accumulator,"
roared Frank.  For those on this list who aren't familiar with computer
architecture before the entire CPU fit on one chip, the accumulator was the
primary register in the CPU.  If it failed, the computer was completely out
of commission.

To his credit, Frank took personal pride in the operation of the Arpanet.
In a certain sense he viewed it as "his" network.  It wasn't, of course,
but his dedication to it was admirable and undoubtedly contributed to its
success.

Meanwhile, another performance detail was involved in the interaction
between the host and the IMP.  The arriving packet ended with padding
that consisted of a 1 and some number of 0's appended to the message.
The receiving host had to trim off these bits.  Finding the location of
the last 1 is time-consuming if you test each bit, working backward from
the end.  Some machines have instructions that make it easy to find that
bit, but most do not.

I wrote RFC 70, A Note on Padding, to show how to find that bit quickly
using some logical operations and division by a suitably chosen small
number.  I don't know if anyone used the idea in their code.

Regarding programming techniques changing over time, yes indeed!  The IMPs
had relatively little memory compared to today's machines.  Lick gave BBN
trouble when the TIP responded "Trying.." when someone initiated a
connection.  He wanted three dots.  BBN said it would take another half
word.

Going back much further, I spent some time learning to program the SWAC,
Standards Western Automatic Computer, the sister machine to the better
known SEAC.  256 words of memory, so addresses were eight bits.
Instructions were 36 bits wide: a four-bit opcode and four eight-bit addresses.  A
typical instruction was "Add alpha to beta, store in gamma and transfer to
delta if overflow."

There was no divide instruction, no index registers, no floating point and
no subroutine calls.  Division was done by successive approximation.  Loops
used self-modifying code(!).  Amazing work was done on that machine.

Fun fact: The usual computation of square root uses the iteration x' = (x +
A/x)/2.  Without a divide instruction, this iteration is very expensive.
However, the iteration for the *reciprocal* of square root, x' =
(3x-Ax^3)/2, uses three multiplications but no division.
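To make the fun fact concrete, here is a minimal sketch.  The fixed iteration count and the assumption that A has been scaled into (0, 3) -- so the starting guess x = 1 converges -- are mine, not how SWAC code would actually have been organized.

```python
def rsqrt(A: float, iters: int = 40) -> float:
    """1/sqrt(A) via the iteration x' = (3x - A*x**3)/2: three
    multiplications and a halving per step, no division.
    Assumes 0 < A < 3 so the starting guess x = 1.0 converges;
    in practice one would scale A into range first."""
    x = 1.0
    for _ in range(iters):
        x = x * (3.0 - A * x * x) * 0.5
    return x

def sqrt_no_div(A: float) -> float:
    """sqrt(A) = A * (1/sqrt(A)) -- one more multiplication, still no divide."""
    return A * rsqrt(A)
```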

Getting back to the cost of computing the checksum, I thought further about
the approach I described.  I think I could do better and save perhaps 20%
or so.  It would have been interesting to have a contest to see who could
write the fastest code.
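For concreteness, here is a sketch of the kind of checksum we had proposed (described in my earlier message below): a 16-bit ones-complement sum with the running sum rotated one bit periodically.  The design said only "every thousand or so bits," so the interval of 64 words (1024 bits) below is my choice for illustration.

```python
def hosthost_checksum(data: bytes) -> int:
    """Sketch of the proposed Host-Host checksum: 16-bit ones-complement
    sum with end-around carry, rotating the running sum left one bit
    every 64 words (1024 bits) to catch out-of-order packets.  The
    rotation interval is illustrative; the design specified only
    'every thousand or so bits'."""
    total = 0
    for i in range(0, len(data), 2):
        hi = data[i]
        lo = data[i + 1] if i + 1 < len(data) else 0  # pad a trailing odd byte
        total += (hi << 8) | lo
        total = (total & 0xFFFF) + (total >> 16)      # end-around carry
        if (i // 2 + 1) % 64 == 0:                    # every 64th 16-bit word
            total = ((total << 1) | (total >> 15)) & 0xFFFF  # rotate left 1
    return total
```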

On Sun, Apr 20, 2025 at 1:45 PM Jack Haverty via Internet-history <
internet-history at elists.isoc.org> wrote:

> Hi Steve,
>
> IIRC there were concerns other than the Engineering ones.
>
> Computers in 1970 were big, expensive to purchase, and expensive to
> operate.  Projects using computers were charged by the "CPU-second" for
> the users' computing, but time consumed by the OS was overhead, and its
> costs spread across all users, along with the costs for power, HVAC,
> floor space, and such.


> ARPA was into "resource sharing" but organizations with computers
> (mostly universities and research sites IIRC) were sometimes reluctant
> to "share" their hard-won computing power.   ARPA was paying the bills
> though, so it could pressure them to connect to the ARPANET.
>
> Still, there was concern to minimize that overhead in the OS, e.g., RFC
> 425, titled "But my NCP costs $500 a day...", written by Bob Bressler,
> in Frank Heart's ARPANET group at BBN. (In 1977 I migrated from MIT to
> Frank's group at BBN, to implement TCP for Unix)
>
> It's easy to forget the computing world of 50+ years ago, now that we
> have computers (aka "devices")  lying around everywhere, much more
> powerful than any machine on the ARPANET in the 1970s, but sitting idle
> most of the time.
>
> Even if the Engineering analysis showed a bit of additional OS overhead
> was minimal, it was still a thorn in the sides of IT managers fifty
> years ago.
>
> But yes, I agree that the PDP-10 had some great instructions.  I spent a
> lot of time in the 70s writing assembler code in Lick's group at MIT, as
> we all tried feverishly to make our PDP-10 do more than it was capable
> of doing.   We did things like using JFCL as the best NO-OP instruction,
> simply because it was the fastest such instruction to execute.  The
> "official" NOP instruction took a bit longer.  That and other such
> techniques were considered common practice of us "bit twiddlers", all to
> save a few CPU-microseconds.
>
> BTW, the ARPANET IMP code is the most amazing example of such "bit
> twiddling" that I have ever encountered.  I did a "deep dive" into the
> old IMP code as a consultant in a patent battle, to figure out exactly
> how the IMP did some of the things that I remembered it did.  I was
> astonished at the techniques used to get every bit of power out of the
> Honeywell minicomputer that was the core of the IMP, often by violating
> today's rules of good program design.  It's just what you did in those
> days of too little CPU, too little memory, and everything too expensive.
>
> Frank Heart's focus was on Engineering - getting the system to work.  I
> can still hear his voice as he argued for that.  You knew he was serious
> when the pitch of his voice approached ultrasonic.
>
> Jack
>
>
> On 4/19/25 20:14, Steve Crocker via Internet-history wrote:
> > Alex,
> >
> > Thanks for your note.  I understand the point you're making, but I don't
> > think the effect would have been very large.  I did a mental exercise to
> > estimate the time to compute the checksum on PDP-10s.  The PDP-10 was a
> 36
> > bit machine, so it definitely would have required the extraction and
> > shifting you're referring to.  However, it also had a very powerful
> > instruction set.  One of its instructions was a Load Byte instruction.
> See
> > http://pdp10.nocrew.org/docs/instruction-set/Byte.html  .  It uses a
> byte
> > pointer which contains the size and offset of the byte, so the
> instruction
> > takes 3 cycles.  Once the byte is loaded into one of the several
> registers,
> > it can be added to another register in one instruction, and because it
> does
> > not need to load data from memory, I believe that Add instruction would
> > take only one cycle.  Hence, a total of 4 cycles to extract a byte and
> add
> > it to a register.
> >
> > The machine has enough registers to permit assigning one for each of the
> > four offsets, thus deferring the shifting to the end.
> >
> > A typical word will have three bytes, so it will take 12 cycles per 36
> bit
> > word.  But we can do slightly better.  Four words is 144 bits, which is 9
> > 16-bit bytes.  In two of those four words, there will be 32 bits that
> have
> > two complete 16-bit bytes.  These can be treated as a single byte in the
> > inner loop and subdivided at the end.  Therefore, within each group of
> four
> > words, there will be only 10 Load Byte and Add instructions, i.e. 40
> cycles
> > for every four words, or 40/9 cycles per 16 bit byte.
> >
> > An 8000 bit message has 500 16-bit bytes, so the inner loop to add all
> the
> > 16 bit-bytes is roughly 500 * 40 / 9 cycles, approximately 2222 cycles.
> > Round this up to 2500 cycles to accommodate the loop management and the
> > assembling the pieces at the end.
> >
> > According to
> >
> https://archive.computerhistory.org/resources/text/DEC/pdp-10/dec.pdp-10.the_evolution_of_the_DECsystem_10.1978.102630382.pdf
> > , the cycle time on the KA-10 was 5 microseconds, so the computation time
> > for the checksum over a full 8000 bit message would have been about 12.5
> > ms.  Double this to account for creating the checksum on the sending side
> > and checking it on the receiving side, so 25 ms overall.
> >
> > For short messages, e.g. typical interactive messages, this figure would
> be
> > *much* smaller.
> >
> > The Arpanet spec was to deliver a message from end to end in under a half
> > second, i.e. 500 ms.  Thus the checksum would have added a little bit of
> > time, but only about 5%.
> >
> > Apologies if there are errors in the above.  Corrections welcome.
> >
> > Steve
> >
> > On Sat, Apr 19, 2025 at 5:00 PM Alexander McKenzie via Internet-history <
> > internet-history at elists.isoc.org> wrote:
> >
> >>   Steve,
> >>
> >> It sounds so simple, but the devil is in the details.  Given the
> plethora
> >> of word sizes of the computers connected (or scheduled to be connected)
> to
> >> ARPAnet in 1971, I would bet that for ANY checksum length proposed over
> 50%
> >> of the Hosts would have to engage in mask and shift operations on every
> >> word in a message in order to calculate even a checksum like the one you
> >> describe. This would indeed have somewhat slowed the effective network
> >> speed.  Enough to matter? Who knows, but at that time maximum effective
> >> bandwidth was of real concern to prospective users (remember the
> motivation
> >> for the design of the Tinker-McClellan experiment).  Recall that at that
> >> time a majority of the communications community viewed packet switching
> as
> >> foolish, and ARPAnet as an experiment about to fail. Yes, a simple
> >> end-to-end checksum would have sometimes been of diagnostic help, but
> both
> >> Frank Heart and Larry Roberts had a real reason to worry about anything
> >> that would negatively affect perceived performance of this brand new
> >> technology. Maybe we should have done checksumming for debugging in
> TELNET,
> >> where performance was irrelevant, and left File Transfer alone.
> >>
> >> Cheers,
> >> Alex
> >>
> >> On Friday, April 18, 2025 at 03:12:39 PM EDT, Steve Crocker via
> >> Internet-history<internet-history at elists.isoc.org>  wrote:
> >>
> >>
> >> We tried to include a lightweight checksum in the original host-host
> >> protocol.  (Later it was called the Network Control Protocol or NCP.
> Same
> >> protocol.)  The checksum was designed to be reasonably easy to
> compute.  It
> >> was a 16-bit ones complement sum with one bit of rotation every
> thousand or
> >> so bits.  (The rotation was intended to catch packets out of order,
> error
> >> which we imagined might be possible but never occurred.)  Frank Heart
> >> argued vehemently against it, saying it would make his network look
> slow.
> >> I tried to push back and asked about the Host-IMP interface.  "As
> reliable
> >> as your accumulator," he roared.
> >>
> >> We removed the checksum from our design, a mistake I've rued ever since.
> >> And, of course, it turned out there were indeed a few cases where it
> would
> >> have made a difference.  As has been pointed out, there was a major
> memory
> >> error in one of the IMPs that caused that IMP to look like it was zero
> >> distance to every IMP.  But even before that error, when Lincoln Lab
> first
> >> connected its host to its IMP, their hardware interface had a problem.
> >> There was some crosstalk between the interface and the disk (or drum)
> >> controller.  When the disk (or drum) was operating at the same time as
> the
> >> Host-IMP interface, some bits got scrambled.  It apparently took them
> some
> >> time to track down.  I think they would have found it faster if the
> >> checksum had been part of the design.
> >>
> >> Steve
> >> --
> >> Internet-history mailing list
> >> Internet-history at elists.isoc.org
> >> https://elists.isoc.org/mailman/listinfo/internet-history
> >>
> >
>
> --
> Internet-history mailing list
> Internet-history at elists.isoc.org
> https://elists.isoc.org/mailman/listinfo/internet-history
>


