[ih] early networking: "the solution"
Andrew G. Malis
agmalis at gmail.com
Mon Apr 22 06:43:42 PDT 2024
Vint,
A memory error caused the upper bit of a routing update sequence number to
flip, making the sequence number "greater than" the previously circulating
sequence number by exactly half the sequence number space. Routing updates
half the sequence space apart kept chasing each other. We fixed it by
instituting a window on the sequence space so that only updates "greater
than" the previous update but within the window were accepted.
Cheers,
Andy
On Mon, Apr 22, 2024 at 3:01 AM vinton cerf via Internet-history <
internet-history at elists.isoc.org> wrote:
> On Mon, Apr 22, 2024 at 2:07 AM Jack Haverty via Internet-history <
> internet-history at elists.isoc.org> wrote:
>
> > Steve,
> >
> > You were right, and checksum issues did cause troubles in the Arpanet.
> > The IMPs *did* have errors. The details have become fuzzy, but IIRC
> > there was a routing failure at one point that took down the entire
> > Arpanet. The cause was traced to a bad memory in some IMP that was
> > corrupting packets if they happened to use that memory. Some of the
> > packets were internal packets disseminating routing information ... and
> > the bad data resulted in the net locking up in perpetual routing
> > confusion. Checksums caught errors on circuits, but not errors inside
> > the IMP memory.
> >
> there were at least two cases. In one, all the bits of the distance vector
> were zeroed making it look like the IMP at Harvard (?) was zero hops from
> all other IMPs. In a second case, a flaky memory bit caused a routing
> packet to be sent repeatedly everywhere in the network as each successive
> packet looked like an "update" and was sent on all circuits between the
> IMPs. At least, that's my hazy recollection. The memory of the IMPs was not
> error checked the way current-day computer memory is.
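>
> A toy illustration (not BBN's actual routing code) of why a zeroed distance
> vector is so destructive, assuming simple hop-count routing; the table
> values are made up:
>
>     #include <stdio.h>
>
>     #define NODES 4
>
>     /* An IMP updates its hop count to each destination from a neighbor's
>      * advertised vector: the cost via that neighbor is the advertised
>      * distance plus one hop. An all-zero advertisement therefore makes the
>      * advertising IMP look one hop away from everything, and traffic for
>      * the whole network is drawn toward it. */
>     void apply_vector(int my_dist[NODES], const int neighbor_dist[NODES])
>     {
>         for (int d = 0; d < NODES; d++)
>             if (neighbor_dist[d] + 1 < my_dist[d])
>                 my_dist[d] = neighbor_dist[d] + 1;
>     }
>
>     int main(void)
>     {
>         int my_dist[NODES] = {0, 3, 5, 7};   /* 0 is this IMP itself */
>         int zeroed[NODES]  = {0, 0, 0, 0};   /* corrupted advertisement */
>         apply_vector(my_dist, zeroed);
>         for (int d = 0; d < NODES; d++)
>             printf("dest %d: %d hops\n", d, my_dist[d]);
>         return 0;   /* every non-local destination now shows 1 hop */
>     }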
>
> >
> > Checksums were useful as debugging tools in Internet operation just like
> > in the Arpanet. When we got the task of making the core Internet into a
> > 24x7 service, we took the easiest route and applied the same techniques that had
> > been developed for the Arpanet NOC. One of those techniques was
> > "traps", which were essentially error reports from remote switches to
> > NOC operators. So the core gateways quickly acquired the ability to
> > report errors back to our NOC, just like IMPs had been doing for about a
> > decade.
> >
> > Mike Brescia was one of the "Internet gang", and he, along with Bob Hinden
> > and Alan Sheltzer, watched over the neonatal Internet core to keep it as
> > close as possible to a 24x7 service. One day Mike noticed that a
> > particular router was reporting lots of checksum errors. He
> > investigated and saw that a new host was trying to come up on the
> > Internet and apparently someone was debugging their TCP. The checksum
> > reports revealed the problem -- IIRC, the 4 bytes of IP addresses were
> > misordered. That was easy to do with 16-bit CPUs holding 2 8-bit bytes
> > in each word.
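> >
> > A sketch of how that kind of bug happens, assuming the address was built up
> > in 16-bit words and copied out without regard to byte order (the actual
> > host and code involved are unknown):
> >
> >     #include <stdint.h>
> >     #include <stdio.h>
> >     #include <string.h>
> >
> >     int main(void)
> >     {
> >         /* Intended address 10.0.0.5, held as two 16-bit words on a
> >          * word-oriented machine. */
> >         uint16_t words[2] = { (10 << 8) | 0, (0 << 8) | 5 };
> >
> >         /* Copying the words out byte-for-byte in the machine's native
> >          * order instead of network (big-endian) order swaps each pair of
> >          * bytes. */
> >         uint8_t wire[4];
> >         memcpy(wire, words, sizeof wire);
> >
> >         printf("%d.%d.%d.%d\n", wire[0], wire[1], wire[2], wire[3]);
> >         /* On a little-endian host this prints 0.10.5.0, not 10.0.0.5. */
> >         return 0;
> >     }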
> >
> > So Mike looked up the host information at the NIC, found the email
> > address of the likely responsible human, and sent an email, something
> > like "FYI, you need to swap the bytes in your IP addresses". A short
> > while later he got back an answer, something like "Hey, thanks! That
> > was it." Not long after that he got another email -- "How the &^&^%
> > did you know that???" The TCP developer somewhere in the midwest IIRC
> > had realized that someone in Boston had been looking over their shoulder
> > from a thousand miles away.
> >
> > Remote debugging. Checksums were potent debugging tools. Started in
> > the Arpanet (I think), and we just moved it into the Internet.
> >
> > Fun times.
> > Jack Haverty
> >
> > On 4/21/24 15:56, Steve Crocker wrote:
> > > There was a bit of checksum history earlier. In our early thinking
> > > about protocols for the Arpanet, Jeff Rulifson pointed out the utility
> > > of checksums to detect possible errors in implementations. By "early
> > > thinking" I mean the period between August 1968 and February 1969.
> > > This was well before BBN issued Report 1822 with the details of the
> > > message format, etc. We knew the IMPs would accept messages of
> > > roughly 8000 bits, break them into packets of roughly 1000 bits, and
> > > then reassemble them at the receiving IMP. We also knew the IMPs
> > > would be using very strong checksums to detect transmission errors
> > > between the IMPs.
> > >
> > > We decided to use a 16 bit checksum that was simply the ones
> > > complement sum of the message, with one added wrinkle of rotating the
> > > sum by one bit every thousand bits. We included this wrinkle to catch
> > > the possible error of misordering the packets during reassembly.
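> > >
> > > A minimal sketch of that scheme in C, with the ones-complement sum
> > > rotated at a caller-chosen packet boundary (the original counted roughly
> > > every thousand bits; the exact boundary here is an assumption):
> > >
> > >     #include <stddef.h>
> > >     #include <stdint.h>
> > >
> > >     /* Ones-complement add: fold any carry out of bit 15 back into bit 0. */
> > >     static uint16_t oc_add(uint16_t a, uint16_t b)
> > >     {
> > >         uint32_t s = (uint32_t)a + b;
> > >         return (uint16_t)((s & 0xFFFF) + (s >> 16));
> > >     }
> > >
> > >     /* Rotate left by one bit within 16 bits. */
> > >     static uint16_t rotl1(uint16_t x)
> > >     {
> > >         return (uint16_t)((x << 1) | (x >> 15));
> > >     }
> > >
> > >     /* Ones-complement sum of the whole message, with the running sum
> > >      * rotated one bit at each packet boundary so that swapping two
> > >      * packets during reassembly changes the result. */
> > >     uint16_t rotating_checksum(const uint16_t *words, size_t n_words,
> > >                                size_t words_per_packet)
> > >     {
> > >         uint16_t sum = 0;
> > >         for (size_t i = 0; i < n_words; i++) {
> > >             if (i != 0 && words_per_packet != 0 && i % words_per_packet == 0)
> > >                 sum = rotl1(sum);
> > >             sum = oc_add(sum, words[i]);
> > >         }
> > >         return sum;
> > >     }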
> > >
> > > On 14 Feb 1969, a few of us in the Network Working Group met with the
> > > IMP team at BBN for the first time. When we described our thinking
> > > about a checksum, Frank Heart hit the roof. "You'll make my network
> > > look slow!" his voice reaching his trademarked high pitch when he was
> > > exercised. He pointed out they had very strong checksums for the
> > > transmissions between the IMPs.
> > >
> > > I tried to counter him. "What about the path between the host and the
> > > IMP?" I asked. "As reliable as your accumulator," he roared. (For
> > > those young enough not to understand what this referred to, in those
> > > days the central processing unit of a computer consisted of separate
> > > components. The accumulator was a separate piece of hardware that
> > > held one word of data. It was involved in almost every instruction, so
> > > if it broke, the computer was broken.)
> > >
> > > To my everlasting embarrassment, I yielded. We didn't challenge
> > > whether the IMPs might ever have an error, and we didn't insist that
> > > it wouldn't really cost very much to have a lightweight checksum. We
> > > dropped the idea of including a checksum.
> > >
> > > Unlike the Arpanet, the Internet included a wide variety of computing
> > > and transmission environments, so the need for checksums was far more
> > > evident. But we didn't have to wait that long. When Lincoln Lab
> > > connected the TX-2 to the IMP 10 sometime in 1970 or early 1971, they
> > > had intermittent errors that took a while to track down. It turned out
> > > that when their drum was operating, there was hardware interference
> > > with their IMP interface. A simple checksum might have helped narrow
> > > down where to look ;)
> > >
> > > Steve
> > >
> > >
> > > On Sun, Apr 21, 2024 at 6:14 PM Jack Haverty via Internet-history
> > > <internet-history at elists.isoc.org> wrote:
> > >
> > > Probably not many people know the story behind the IP checksum. I
> > > don't think anyone's ever written it down. While I still remember...:
> > >
> > > The checksum algorithm was selected not for its capabilities to catch
> > > errors, but rather for its simplicity for our overworked and inadequate
> > > computing power. There was significant concern at the time, especially
> > > in the sites running the big host computers, about the use of scarce
> > > computing power as "overhead" involved in using the network. See for
> > > example: https://www.rfc-editor.org/rfc//rfc425
> > >
> > > Besides, at the time all TCP traffic was through the Arpanet, and the
> > > IMPs did their own checksums so any circuit problems would be caught
> > > there. So as we were defining the details of the new TCP4 mechanisms,
> > > the checksum algorithm was kept intentionally simple, to be replaced in
> > > some future version of TCP when computers would be more capable and the
> > > error characteristics of pathways through the Internet were better
> > > understood by experience. The checksum algorithm was a placeholder for
> > > a future improved version, like many other mechanisms of TCP/IP4.
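> > >
> > > For comparison, a minimal sketch of the "intentionally simple" algorithm
> > > that survived into the standard (RFC 1071 later wrote up its computation),
> > > with the pseudo-header and any odd trailing byte omitted for brevity:
> > >
> > >     #include <stddef.h>
> > >     #include <stdint.h>
> > >
> > >     /* Internet checksum: ones-complement sum of the data taken as
> > >      * 16-bit words, carries folded back in, then complemented. */
> > >     uint16_t internet_checksum(const uint16_t *words, size_t n_words)
> > >     {
> > >         uint32_t sum = 0;
> > >         for (size_t i = 0; i < n_words; i++)
> > >             sum += words[i];
> > >         while (sum >> 16)                 /* fold carries back in */
> > >             sum = (sum & 0xFFFF) + (sum >> 16);
> > >         return (uint16_t)~sum;
> > >     }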
> > >
> > > The actual details of the checksum computation were nailed down on
> > > January 27, 1979. That was the date of the first TCP Bakeoff, organized
> > > by Jon Postel. I think of it as possibly the first ever "Hackathon".
> > >
> > > The group of TCP implementers assembled on a weekend at USC-ISI and
> > > commandeered a bunch of offices with terminals that we could use to
> > > connect to our computers back home. At first, we could all talk to
> > > ourselves fine. However, no one could talk to any other implementation.
> > > Everybody was getting checksum errors.
> > >
> > > Since we could all hear each other, a discussion quickly reached a
> > > consensus. We turned off the checksum verification code in all of our
> > > implementations, so our TCPs would simply assume every incoming
> > > message/packet/datagram/segment (you pick your favorite term...) was
> > > error-free.
> > >
> > > It seems strange now, but computing in the 1970s was a lot different
> > > from today. In addition to the scarcity of CPU power and memory, there
> > > was little consensus about how bits were used inside of each computer,
> > > and how they were transferred onto wires by network interface hardware.
> > > Computers didn't agree on the number of bits in a byte, how bytes were
> > > ordered into computer words, how arithmetic calculations were performed,
> > > or how to take the bits in and out of your computer's memory and
> > > transfer them serially over an I/O interface. If you think the confusion
> > > of today's USB connectors is bad, it was much worse 50 years ago!
> > >
> > > Danny Cohen later published a great "plea for peace" that reveals some
> > > of the confusion - see https://www.rfc-editor.org/ien/ien137.txt
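> > >
> > > The byte-order half of that confusion is easy to demonstrate today; a
> > > tiny C program showing how one 32-bit value sits in memory on whatever
> > > machine runs it:
> > >
> > >     #include <stdint.h>
> > >     #include <stdio.h>
> > >     #include <string.h>
> > >
> > >     int main(void)
> > >     {
> > >         uint32_t value = 0x0A0B0C0D;
> > >         uint8_t bytes[4];
> > >         memcpy(bytes, &value, sizeof bytes);
> > >
> > >         /* A big-endian machine stores 0a 0b 0c 0d; a little-endian one
> > >          * stores 0d 0c 0b 0a. Two hosts dumping the same word straight
> > >          * onto the wire therefore disagree about what was sent. */
> > >         printf("%02x %02x %02x %02x\n",
> > >                bytes[0], bytes[1], bytes[2], bytes[3]);
> > >         return 0;
> > >     }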
> > >
> > > So it wasn't a surprise that each TCP implementer had somehow failed in
> > > translating the specification, simple as it was, into code.
> > >
> > > The disabling of checksums enabled us to debug all this and slowly (it
> > > took two days IIRC) get implementations to talk to other
> > > implementations. Then we re-enabled checksumming and tried all the
> > > tests again. TCP4 worked! Jon Postel took on the task of figuring out
> > > how the now-working checksums actually were doing the computations and
> > > revised the specifications accordingly. Rough consensus and running
> > > code had failed; instead we had running code and then rough consensus.
> > >
> > > My most memorable recollection of that weekend was late on Sunday. Jon
> > > had set up the Bakeoff with a "scoring scheme" which gave each
> > > participant a number of points for passing each test. His score rules
> > > are here:
> > >
> > > https://drive.google.com/file/d/1NNc9tJTEQsVq-knCCWLeJ3zVrL2Xd25g/view?usp=sharing
> > >
> > > We were all getting tired, and Bill Plummer (Tenex TCP) shouted down
> > > the hall to Dave Clark (Multics TCP) -- "Hey Dave, can you turn off
> > > your checksumming again?" Dave replied "OK, it's off". Bill hit a key
> > > on his terminal. Dave yelled "Hey, Multics just crashed!" Bill gloated
> > > "KO! Ten points for me!"
> > >
> > > Such was how checksumming made it into TCP/IP4.
> > >
> > > Jack Haverty
> > >
> > >
> > >
> > > On 4/21/24 12:27, John Day via Internet-history wrote:
> > > > So I wasn’t dreaming! ;-)
> > > >
> > > > CRCs also have problems in HDLC if there are a lot of 1s in the
> > > > data. (The bit stuffing is not included in the checksum calculation.)
> > > >
> > > >> On Apr 21, 2024, at 15:22, touch at strayalpha.com wrote:
> > > >>
> > > >> I think it was this one:
> > > >> http://ccr.sigcomm.org/archive/1995/conf/partridge.pdf
> > > >>
> > > >> Joe
> > > >>
> > > >> —
> > > >> Dr. Joe Touch, temporal epistemologist
> > > >> www.strayalpha.com <http://www.strayalpha.com>
> > > >>
> > > >>> On Apr 21, 2024, at 12:20 PM, Scott Bradner via
> > > Internet-history<internet-history at elists.isoc.org> wrote:
> > > >>>
> > > >>> maybe in conjunction with the Pac Bell NAP
> > > >>>
> > > >>> https://www.cnet.com/tech/mobile/pac-bell-adds-network-access/
> > > >>>
> > > >>>
> > > >>> https://mailman.nanog.org/pipermail/nanog/1998-March/127113.html
> > > >>>
> > > >>> Scott
> > > >>>
> > > >>>> On Apr 21, 2024, at 3:00 PM, John Day<jeanjour at comcast.net>
> > > wrote:
> > > >>>>
> > > >>>> I have a vague recollection of a paper (possibly by Craig
> > > Partridge) that talked about ATM dropping cells (and possibly
> > > other different forms of errors) and how IP and other protocols
> > > were not built to detect such losses.
> > > >>>>
> > > >>>> Am I dreaming?
> > > >>>>
> > > >>>> John
> > > >>>>
> > > >>>>> On Apr 21, 2024, at 09:10, Scott Bradner via
> > > Internet-history<internet-history at elists.isoc.org> wrote:
> > > >>>>>
> > > >>>>> yes but...
> > > >>>>>
> > > >>>>> the ATM Forum people felt that ATM should replace TCP and most of
> > > >>>>> IP, i.e. become the new IP, and that new applications should assume
> > > >>>>> they were running over ATM and directly make use of ATM features
> > > >>>>> (e.g., ABR)
> > > >>>>>
> > > >>>>> ATM as yet another wire was just fine (though a bit choppy)
> > > >>>>>
> > > >>>>> Scott
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>> On Apr 21, 2024, at 9:02 AM, Andrew G. Malis<agmalis at gmail.com> wrote:
> > > >>>>>>
> > > >>>>>> Scott,
> > > >>>>>>
> > > >>>>>> ATM could carry any protocol that you could carry over Ethernet,
> > > >>>>>> see RFCs 2225, 2492, and 2684.
> > > >>>>>>
> > > >>>>>> Cheers,
> > > >>>>>> Andy
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On Sat, Apr 20, 2024 at 8:15 PM Scott Bradner via
> > > Internet-history<internet-history at elists.isoc.org> wrote:
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>> On Apr 20, 2024, at 8:11 PM, John Gilmore via
> > > Internet-history<internet-history at elists.isoc.org> wrote:
> > > >>>>>>>
> > > >>>>>>> John Day via
> > > Internet-history<internet-history at elists.isoc.org> wrote:
> > > >>>>>>>> In the early 70s, people were trying to figure out how to
> > > >>>>>>>> interwork multiple networks of different technologies. What was
> > > >>>>>>>> the solution that was arrived at that led to the current
> > > >>>>>>>> Internet?
> > > >>>>>>>> I conjectured yesterday that the fundamental solution must have
> > > >>>>>>>> been in hand by the time Cerf and Kahn published their paper.
> > > >>>>>>>> Are you conjecturing that the solution was gateways? and hence
> > > >>>>>>>> protocol translation at the gateways?
> > > >>>>>>> Maybe it's too obvious in retrospect. But the "solution" that I
> > > >>>>>>> see was that everyone had to move to using a protocol that was
> > > >>>>>>> independent of their physical medium.
> > > >>>>>> and ATM was an example of the reverse - it was a protocol & a
> > > >>>>>> network - OK as long as you did not build applications that knew
> > > >>>>>> they were running over ATM (or if ATM had been the last networking
> > > >>>>>> protocol)
> > > >>>>>>
> > > >>>>>> Scott
> >
> --
> Internet-history mailing list
> Internet-history at elists.isoc.org
> https://elists.isoc.org/mailman/listinfo/internet-history
>