[ih] early networking: "the solution"

vinton cerf vgcerf at gmail.com
Mon Apr 22 00:01:11 PDT 2024


On Mon, Apr 22, 2024 at 2:07 AM Jack Haverty via Internet-history <
internet-history at elists.isoc.org> wrote:

> Steve,
>
> You were right, and checksum issues did cause troubles in the Arpanet.
> The IMPs *did* have errors.  The details have become fuzzy, but IIRC
> there was a routing failure at one point that took down the entire
> Arpanet.   The cause was traced to a bad memory in some IMP that was
> corrupting packets if they happened to use that memory.   Some of the
> packets were internal packets disseminating routing information ... and
> the bad data resulted in the net locking up in perpetual routing
> confusion.   Checksums caught errors on circuits, but not errors inside
> the IMP memory.
>
There were at least two cases. In one, all the bits of the distance vector
were zeroed, making it look like the IMP at Harvard (?) was zero hops from
all other IMPs. In a second case, a flaky memory bit caused a routing
packet to be sent repeatedly everywhere in the network as each successive
packet looked like an "update" and was sent on all circuits between the
IMPs. At least, that's my hazy recollection. The memory of the IMPs was not
error-checked the way memory in current-day computers is.

>
> Checksums were useful as debugging tools in Internet operation just like
> in the Arpanet.  When we got the task of making the core Internet into a
> 24x7 service, we took the easiest route and applied the same techniques that had
> been developed for the Arpanet NOC.   One of those techniques was
> "traps", which were essentially error reports from remote switches to
> NOC operators.   So the core gateways quickly acquired the ability to
> report errors back to our NOC, just like IMPs had been doing for about a
> decade.
>
> Mike Brescia was one of the "Internet gang" and he, with Bob Hinden and
> Alan Sheltzer, watched over the neonatal Internet core to keep it as
> close as possible to a 24x7 service.  One day Mike noticed that a
> particular router was reporting lots of checksum errors.  He
> investigated and saw that a new host was trying to come up on the
> Internet and apparently someone was debugging their TCP.  The checksum
> reports revealed the problem -- IIRC, the 4 bytes of IP addresses were
> misordered.   That was easy to do with 16-bit CPUs holding two 8-bit bytes
> in each word.
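
A minimal C sketch of the kind of bug described above (illustrative only: the
address 10.0.0.82 and the packing scheme are assumptions, not the code Mike
was actually looking at). Packing an IP address two bytes per 16-bit word and
then serializing each word low byte first puts the four bytes on the wire
misordered:

    /* Hypothetical sketch of the byte-swap bug: an IP address packed two
     * bytes per 16-bit word and then serialized low byte first comes out
     * misordered on the wire.  Address and packing scheme are examples. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint8_t addr[4] = {10, 0, 0, 82};          /* example: 10.0.0.82 */
        uint16_t words[2];

        /* pack high byte first into each 16-bit word */
        words[0] = (uint16_t)(addr[0] << 8) | addr[1];
        words[1] = (uint16_t)(addr[2] << 8) | addr[3];

        /* buggy serialization: low byte of each word goes out first */
        for (int i = 0; i < 2; i++)
            printf("%02X %02X ", words[i] & 0xFF, words[i] >> 8);
        printf(" <- 00 0A 52 00 on the wire instead of 0A 00 00 52\n");
        return 0;
    }
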
>
> So Mike looked up the host information at the NIC, found the email
> address of the likely responsible human, and sent an email, something
> like "FYI, you need to swap the bytes in your IP addresses".   A short
> while later he got back an answer, something like "Hey, thanks!  That
> was it."   Not long after that he got another email -- "How the &^&^%
> did you know that???"   The TCP developer somewhere in the Midwest IIRC
> had realized that someone in Boston had been looking over their shoulder
> from a thousand miles away.
>
> Remote debugging.  Checksums were potent debugging tools.  The technique
> started in the Arpanet (I think), and we just moved it into the Internet.
>
> Fun times.
> Jack Haverty
>
> On 4/21/24 15:56, Steve Crocker wrote:
> > There was a bit of checksum history earlier.  In our early thinking
> > about protocols for the Arpanet, Jeff Rulifson pointed out the utility
> > of checksums to detect possible errors in implementations.  By "early
> > thinking" I mean the period between August 1968 and February 1969.
> > This was well before BBN issued Report 1822 with the details of the
> > message format, etc.  We knew the IMPs would accept messages of
> > roughly 8000 bits, break them into packets of roughly 1000 bits, and
> > then reassemble them at the receiving IMP.  We also knew the IMPs
> > would be using very strong checksums to detect transmission errors
> > between the IMPs.
> >
> > We decided to use a 16 bit checksum that was simply the ones
> > complement sum of the message, with one added wrinkle of rotating the
> > sum by one bit every thousand bits.  We included this wrinkle to catch
> > the possible error of misordering the packets during reassembly.
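
A minimal C sketch of such a scheme, reconstructed from the description above
rather than from the NWG's actual design; the 64-words-per-packet figure is an
assumed stand-in for "roughly 1000 bits":

    /* Sketch of a ones-complement sum with the running sum rotated one bit
     * at each packet boundary, so that reordering packets during reassembly
     * changes the result.  Reconstructed from the description above;
     * WORDS_PER_PACKET is an assumption, not a historical value. */
    #include <stdint.h>
    #include <stddef.h>

    #define WORDS_PER_PACKET 64    /* 64 * 16 bits = 1024 bits, assumed */

    uint16_t rotating_checksum(const uint16_t *words, size_t nwords) {
        uint32_t sum = 0;
        for (size_t i = 0; i < nwords; i++) {
            if (i > 0 && i % WORDS_PER_PACKET == 0)
                sum = ((sum << 1) | (sum >> 15)) & 0xFFFF;  /* rotate left 1 */
            sum += words[i];
            sum = (sum & 0xFFFF) + (sum >> 16);             /* end-around carry */
        }
        return (uint16_t)sum;
    }
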
> >
> > On 14 Feb 1969, a few of us in the Network Working Group met with the
> > IMP team at BBN for the first time.  When we described our thinking
> > about a checksum, Frank Heart hit the roof.  "You'll make my network
> > look slow!" he said, his voice reaching its trademark high pitch when
> > he was exercised.  He pointed out they had very strong checksums for the
> > transmissions between the IMPs.
> >
> > I tried to counter him.  "What about the path between the host and the
> > IMP?" I asked.  "As reliable as your accumulator," he roared.  (For
> > those young enough not to understand what this referred to, in those
> > days the central processing unit of a computer consisted of separate
> > components.  The accumulator was a separate piece of hardware that
> > held one word of data. It was involved in almost every instruction, so
> > if it broke, the computer was broken.)
> >
> > To my everlasting embarrassment, I yielded.  We didn't challenge
> > whether the IMPs might ever have an error, and we didn't insist that
> > it wouldn't really cost very much to have a lightweight checksum.  We
> > dropped the idea of including a checksum.
> >
> > Unlike the Arpanet, the Internet included a wide variety of computing
> > and transmission environments, so the need for checksums was far more
> > evident.  But we didn't have to wait that long.  When Lincoln Lab
> > connected the TX-2 to the IMP 10 sometime in 1970 or early 1971, they
> > had intermittent errors that took a while to track down.  It turned
> > out when their drum was operating, there was hardware interference
> > with their IMP interface.  A simple checksum might have helped narrow
> > down where to look ;)
> >
> > Steve
> >
> >
> > On Sun, Apr 21, 2024 at 6:14 PM Jack Haverty via Internet-history
> > <internet-history at elists.isoc.org> wrote:
> >
> >     Probably not many people know the story behind the IP checksum.   I
> >     don't think anyone's ever written it down.  While I still
> >     remember...:
> >
> >     The checksum algorithm was selected not for its ability to catch
> >     errors, but rather for its simplicity, given our overworked and
> >     inadequate computing power.  There was significant concern at the
> >     time, especially at the sites running the big host computers, about
> >     the use of scarce computing power as "overhead" involved in using
> >     the network.  See for example: https://www.rfc-editor.org/rfc//rfc425
> >
> >     Besides, at the time all TCP traffic was through the Arpanet, and the
> >     IMPs did their own checksums so any circuit problems would be caught
> >     there.  So as we were defining the details of the new TCP4 mechanisms,
> >     the checksum algorithm was kept intentionally simple, to be replaced
> >     in some future version of TCP when computers would be more capable and
> >     the error characteristics of pathways through the Internet were better
> >     understood by experience.   The checksum algorithm was a placeholder
> >     for a future improved version, like many other mechanisms of TCP/IP4.
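
For reference, the checksum that survives in TCP and IP is the 16-bit
ones-complement sum, later documented in RFC 1071. A C sketch along those
lines (illustrative, not any particular historical implementation) shows how
little computation it demands:

    /* Sketch of the Internet checksum as later documented in RFC 1071:
     * the ones complement of the ones-complement sum of the data taken
     * as 16-bit words. */
    #include <stdint.h>
    #include <stddef.h>

    uint16_t internet_checksum(const uint8_t *data, size_t len) {
        uint32_t sum = 0;

        while (len > 1) {                    /* sum successive 16-bit words */
            sum += ((uint32_t)data[0] << 8) | data[1];
            data += 2;
            len  -= 2;
        }
        if (len == 1)                        /* pad a trailing odd byte */
            sum += (uint32_t)data[0] << 8;

        while (sum >> 16)                    /* fold carries back in */
            sum = (sum & 0xFFFF) + (sum >> 16);

        return (uint16_t)~sum;               /* ones complement of the sum */
    }
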
> >
> >     The actual details of the checksum computation were nailed down on
> >     January 27, 1979.  That was the date of the first TCP Bakeoff,
> >     organized by Jon Postel.   I think of it as possibly the first ever
> >     "Hackathon".
> >
> >     The group of TCP implementers assembled on a weekend at USC-ISI and
> >     commandeered a bunch of offices with terminals that we could use to
> >     connect to our computers back home.   At first, we could all talk to
> >     ourselves fine.   However, no one could talk to any other
> >     implementation.  Everybody was getting checksum errors.
> >
> >     Since we could all hear each other, a discussion quickly reached a
> >     consensus.   We turned off the checksum verification code in all of
> >     our implementations, so our TCPs would simply assume every incoming
> >     message/packet/datagram/segment (you pick your favorite term...) was
> >     error-free.
> >
> >     It seems strange now, but computing in the 1970s was a lot different
> >     from today.  In addition to the scarcity of CPU power and memory,
> >     there was little consensus about how bits were used inside each
> >     computer, and how they were transferred onto wires by network
> >     interface hardware.  Computers didn't agree on the number of bits in
> >     a byte, how bytes were ordered into computer words, how arithmetic
> >     calculations were performed, or how to take the bits in and out of
> >     your computer's memory and transfer them serially over an I/O
> >     interface.  If you think the confusion of today's USB connectors is
> >     bad, it was much worse 50 years ago!
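
A small C sketch of the byte-order half of that confusion and the eventual
remedy, a single big-endian "network byte order" (the 32-bit value here is
just an example):

    /* Sketch of the byte-order problem: the same 32-bit value is laid out
     * differently in memory on little- and big-endian machines, so hosts
     * convert to a common big-endian "network order" before putting it on
     * the wire. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>

    int main(void) {
        uint32_t host = 0x0A000052;        /* 10.0.0.82 as a 32-bit value */
        uint32_t wire = htonl(host);       /* convert to network byte order */
        uint8_t bytes[4];

        memcpy(bytes, &wire, sizeof bytes);
        printf("on the wire: %02X %02X %02X %02X\n",
               bytes[0], bytes[1], bytes[2], bytes[3]);  /* 0A 00 00 52 */
        return 0;
    }
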
> >
> >     Danny Cohen later published a great "plea for peace" that reveals
> >     some of the confusion - see https://www.rfc-editor.org/ien/ien137.txt
> >
> >     So it wasn't a surprise that each TCP implementer had somehow failed
> >     in translating the specification, simple as it was, into code.
> >
> >     Disabling checksums enabled us to debug all this and slowly (it took
> >     two days IIRC) get implementations to talk to other implementations.
> >     Then we re-enabled checksumming and tried all the tests again.  TCP4
> >     worked!  Jon Postel took on the task of figuring out how the
> >     now-working checksums actually were doing the computations and
> >     revised the specifications accordingly.   Rough consensus and running
> >     code had failed; instead we had running code and then rough consensus.
> >
> >     My most memorable recollection of that weekend was late on Sunday.
> >     Jon had set up the Bakeoff with a "scoring scheme" which gave each
> >     participant a number of points for passing each test.   His score
> >     rules are here:
> >
> >     https://drive.google.com/file/d/1NNc9tJTEQsVq-knCCWLeJ3zVrL2Xd25g/view?usp=sharing
> >
> >     We were all getting tired, and Bill Plummer (Tenex TCP) shouted down
> >     the hall to Dave Clark (Multics TCP) -- "Hey Dave, can you turn off
> >     your checksumming again?"  Dave replied "OK, it's off".  Bill hit a
> >     key on his terminal.  Dave yelled "Hey, Multics just crashed!"  Bill
> >     gloated "KO! Ten points for me!"
> >
> >     Such was how checksumming made it into TCP/IP4.
> >
> >     Jack Haverty
> >
> >
> >
> >     On 4/21/24 12:27, John Day via Internet-history wrote:
> >     > So I wasn’t dreaming!  ;-)
> >     >
> >     > CRCs also have problems in HDLC if there are a lot of 1s in the
> >     > data.  (The bit stuffing is not included in the checksum
> >     > calculation.)
> >     >
> >     >> On Apr 21, 2024, at 15:22, touch at strayalpha.com wrote:
> >     >>
> >     >> I think it was this one:
> >     >> http://ccr.sigcomm.org/archive/1995/conf/partridge.pdf
> >     >>
> >     >> Joe
> >     >>
> >     >> —
> >     >> Dr. Joe Touch, temporal epistemologist
> >     >> www.strayalpha.com <http://www.strayalpha.com>
> >     >>
> >     >>> On Apr 21, 2024, at 12:20 PM, Scott Bradner via
> >     Internet-history<internet-history at elists.isoc.org> wrote:
> >     >>>
> >     >>> maybe in conjunction with the Pac Bell NAP
> >     >>>
> >     >>> https://www.cnet.com/tech/mobile/pac-bell-adds-network-access/
> >     >>>
> >     >>> https://mailman.nanog.org/pipermail/nanog/1998-March/127113.html
> >     >>>
> >     >>> Scott
> >     >>>
> >     >>>> On Apr 21, 2024, at 3:00 PM, John Day<jeanjour at comcast.net>
> >     wrote:
> >     >>>>
> >     >>>> I have a vague recollection of a paper (possibly by Craig
> >     >>>> Partridge) that talked about ATM dropping cells (and possibly
> >     >>>> other forms of errors) and how IP and other protocols were not
> >     >>>> built to detect such losses.
> >     >>>>
> >     >>>> Am I dreaming?
> >     >>>>
> >     >>>> John
> >     >>>>
> >     >>>>> On Apr 21, 2024, at 09:10, Scott Bradner via
> >     Internet-history<internet-history at elists.isoc.org> wrote:
> >     >>>>>
> >     >>>>> yes but...
> >     >>>>>
> >     >>>>> the ATM Forum people felt that ATM should replace TCP and most
> >     >>>>> of IP i.e. become the new IP and that new applications should
> >     >>>>> assume they were running over ATM and directly make use of ATM
> >     >>>>> features (e.g., ABR)
> >     >>>>>
> >     >>>>> ATM as yet another wire was just fine (though a bit choppy)
> >     >>>>>
> >     >>>>> Scott
> >     >>>>>
> >     >>>>>
> >     >>>>>
> >     >>>>>> On Apr 21, 2024, at 9:02 AM, Andrew G.
> >     Malis<agmalis at gmail.com> wrote:
> >     >>>>>>
> >     >>>>>> Scott,
> >     >>>>>>
> >     >>>>>> ATM could carry any protocol that you could carry over
> >     >>>>>> Ethernet, see RFCs 2225, 2492, and 2684.
> >     >>>>>>
> >     >>>>>> Cheers,
> >     >>>>>> Andy
> >     >>>>>>
> >     >>>>>>
> >     >>>>>> On Sat, Apr 20, 2024 at 8:15 PM Scott Bradner via
> >     Internet-history<internet-history at elists.isoc.org> wrote:
> >     >>>>>>
> >     >>>>>>
> >     >>>>>>> On Apr 20, 2024, at 8:11 PM, John Gilmore via
> >     Internet-history<internet-history at elists.isoc.org> wrote:
> >     >>>>>>>
> >     >>>>>>> John Day via
> >     Internet-history<internet-history at elists.isoc.org> wrote:
> >     >>>>>>>> In the early 70s, people were trying to figure out how to
> >     >>>>>>>> interwork multiple networks of different technologies. What
> >     >>>>>>>> was the solution that was arrived at that led to the current
> >     >>>>>>>> Internet?
> >     >>>>>>>> I conjectured yesterday that the fundamental solution must
> >     >>>>>>>> have been in hand by the time Cerf and Kahn published their
> >     >>>>>>>> paper.
> >     >>>>>>>> Are you conjecturing that the solution was gateways? and
> >     >>>>>>>> hence protocol translation at the gateways?
> >     >>>>>>> Maybe it's too obvious in retrospect.  But the "solution"
> >     >>>>>>> that I see was that everyone had to move to using a protocol
> >     >>>>>>> that was independent of their physical medium.
> >     >>>>>> and ATM was an example of the reverse - it was a protocol & a
> >     >>>>>> network - OK as long as you did not build applications that
> >     >>>>>> knew they were running over ATM (or if ATM had been the last
> >     >>>>>> networking protocol)
> >     >>>>>>
> >     >>>>>> Scott
> >     >>>>>> --
> >     >>>>>> Internet-history mailing list
> >     >>>>>> Internet-history at elists.isoc.org
> >     >>>>>> https://elists.isoc.org/mailman/listinfo/internet-history
> >     >>>>> --
> >     >>>>> Internet-history mailing list
> >     >>>>> Internet-history at elists.isoc.org
> >     >>>>> https://elists.isoc.org/mailman/listinfo/internet-history
> >     >>> --
> >     >>> Internet-history mailing list
> >     >>> Internet-history at elists.isoc.org
> >     >>> https://elists.isoc.org/mailman/listinfo/internet-history
> >
> >     --
> >     Internet-history mailing list
> >     Internet-history at elists.isoc.org
> >     https://elists.isoc.org/mailman/listinfo/internet-history
> >
> >
> >
> > --
> > Sent by a Verified sender
> > <https://wallet.unumid.co/authenticate?referralCode=tcp16fM4W47y>
>
> --
> Internet-history mailing list
> Internet-history at elists.isoc.org
> https://elists.isoc.org/mailman/listinfo/internet-history
>


More information about the Internet-history mailing list