[ih] early networking: "the solution"

Vint Cerf vint at google.com
Mon Apr 22 06:50:34 PDT 2024


thanks for that background, Andy!
v


On Mon, Apr 22, 2024 at 9:44 AM Andrew G. Malis via Internet-history <
internet-history at elists.isoc.org> wrote:

> Vint,
>
> A memory error caused the upper bit of a routing update sequence number to
> flip, making the sequence number "greater than" the previously circulating
> sequence number by exactly half the sequence number space. Routing updates
> half the sequence space apart kept chasing each other. We fixed it by
> instituting a window on the sequence space so that only updates "greater
> than" the previous update but within the window were accepted.
>
> Cheers,
> Andy
>
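Andy's fix can be sketched in a few lines of Python. The sketch assumes a 6-bit sequence space and a window of 8, both chosen only for illustration (the IMPs' actual field width and window size are not stated here), and the function names are hypothetical. The naive rule treats a number as newer if it lies anywhere up to half the space ahead of the current one. When a flipped upper bit puts two updates exactly half the space apart, each looks newer than the other, so they keep replacing each other. The windowed rule accepts an update only if it is a small step ahead.

```python
SEQ_BITS = 6                 # assumption: small space, for illustration only
SEQ_SPACE = 1 << SEQ_BITS    # 64 possible sequence numbers
HALF = SEQ_SPACE // 2        # 32
WINDOW = 8                   # assumption: hypothetical acceptance window


def newer_naive(new: int, old: int) -> bool:
    """'Greater than' as a circular comparison: anywhere up to half the space ahead."""
    diff = (new - old) % SEQ_SPACE
    return 0 < diff <= HALF


def newer_windowed(new: int, old: int, window: int = WINDOW) -> bool:
    """Accept only updates that are ahead of the current one by less than `window`."""
    diff = (new - old) % SEQ_SPACE
    return 0 < diff < window


if __name__ == "__main__":
    current = 5
    corrupted = current ^ HALF   # flip the upper bit: 5 -> 37
    # Naive rule: each update looks newer than the other, so they chase each other.
    print(newer_naive(corrupted, current), newer_naive(current, corrupted))        # True True
    # Windowed rule: the corrupted update is rejected in both directions.
    print(newer_windowed(corrupted, current), newer_windowed(current, corrupted))  # False False
```

The cost of the window is that a legitimate update more than WINDOW steps ahead would also be rejected, so the window size has to be chosen with that in mind.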
>
> On Mon, Apr 22, 2024 at 3:01 AM vinton cerf via Internet-history <
> internet-history at elists.isoc.org> wrote:
>
> > On Mon, Apr 22, 2024 at 2:07 AM Jack Haverty via Internet-history <
> > internet-history at elists.isoc.org> wrote:
> >
> > > Steve,
> > >
> > > You were right, and checksum issues did cause troubles in the Arpanet.
> > > The IMPs *did* have errors.  The details have become fuzzy, but IIRC
> > > there was a routing failure at one point that took down the entire
> > > Arpanet.   The cause was traced to a bad memory in some IMP that was
> > > corrupting packets if they happened to use that memory.   Some of the
> > > packets were internal packets disseminating routing information ... and
> > > the bad data resulted in the net locking up in perpetual routing
> > > confusion.   Checksums caught errors on circuits, but not errors inside
> > > the IMP memory.
> > >
> > there were at least two cases. In one, all the bits of the distance vector
> > were zeroed, making it look like the IMP at Harvard (?) was zero hops from
> > all other IMPs. In a second case, a flaky memory bit caused a routing
> > packet to be sent repeatedly everywhere in the network, as each successive
> > packet looked like an "update" and was sent on all circuits between the
> > IMPs. At least, that's my hazy recollection. The memory of the IMPs was not
> > error-checked the way memory in present-day computers is.
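Why an all-zero distance vector is so damaging can be seen in a toy distance-vector route selection. This is a simplified Bellman-Ford-style sketch, not the IMP routing code; it assumes every link costs one hop, and the neighbor and destination names are hypothetical. A node adds the link cost to each neighbor's advertised distance and keeps the minimum, so a neighbor advertising zero to everything wins every destination.

```python
from typing import Dict, Tuple

Vector = Dict[str, int]          # destination -> advertised hop count


def select_routes(neighbor_vectors: Dict[str, Vector],
                  link_cost: int = 1) -> Dict[str, Tuple[str, int]]:
    """Pick, for each destination, the neighbor giving the smallest total distance."""
    routes: Dict[str, Tuple[str, int]] = {}
    for neighbor, vector in neighbor_vectors.items():
        for dest, advertised in vector.items():
            total = link_cost + advertised
            if dest not in routes or total < routes[dest][1]:
                routes[dest] = (neighbor, total)
    return routes


if __name__ == "__main__":
    healthy = {"X": 3, "Y": 2, "Z": 4}
    zeroed = {dest: 0 for dest in healthy}   # every bit of the vector cleared
    routes = select_routes({"A": healthy, "B": zeroed})
    print(routes)   # every destination now routes via B at cost 1
```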
> >
> > >
> > > Checksums were useful as debugging tools in Internet operation, just like
> > > in the Arpanet.  When we got the task of making the core Internet into a
> > > 24x7 service, we took the easiest route and applied the same techniques
> > > that had been developed for the Arpanet NOC.   One of those techniques
> > > was "traps", which were essentially error reports from remote switches to
> > > NOC operators.   So the core gateways quickly acquired the ability to
> > > report errors back to our NOC, just as IMPs had been doing for about a
> > > decade.
> > >
> > > Mike Brescia was one of the "Internet gang" and he with Bob Hinden and
> > > Alan Sheltzer watched over the neonatal Internet core to keep it as
> > > close as possible to a 24x7 service.  One day Mike noticed that a
> > > particular router was reporting lots of checksum errors.  He
> > > investigated and saw that a new host was trying to come up on the
> > > Internet and apparently someone was debugging their TCP.  The checksum
> > > reports revealed the problem -- IIRC, the 4 bytes of IP addresses were
> > > misordered.   That was easy to do with 16-bit CPUs holding two 8-bit
> > > bytes in each word.
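A minimal sketch of how that kind of misordering arises, assuming a host that keeps each pair of address octets in one 16-bit word and then writes those words to the wire in the wrong byte order. Python's struct module stands in for the interface hardware, and the address 10.0.0.5 and the function names are hypothetical. The wrong order swaps the octets within each word, turning 10.0.0.5 into 0.10.5.0.

```python
import struct
from typing import List


def address_to_words(octets: List[int]) -> List[int]:
    """Pack a 4-octet IP address into two 16-bit words, first octet in the high byte."""
    return [(octets[0] << 8) | octets[1], (octets[2] << 8) | octets[3]]


def words_to_wire(words: List[int], little_endian: bool) -> bytes:
    """Serialize 16-bit words; the wrong byte order swaps the octets in each pair."""
    fmt = ("<" if little_endian else ">") + "%dH" % len(words)
    return struct.pack(fmt, *words)


if __name__ == "__main__":
    addr = [10, 0, 0, 5]                                     # hypothetical 10.0.0.5
    words = address_to_words(addr)
    print(list(words_to_wire(words, little_endian=False)))  # [10, 0, 0, 5]
    print(list(words_to_wire(words, little_endian=True)))   # [0, 10, 5, 0]
```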
> > >
> > > So Mike looked up the host information at the NIC, found the email
> > > address of the likely responsible human, and sent an email, something
> > > like "FYI, you need to swap the bytes in your IP addresses".   A short
> > > while later he got back an answer, something like "Hey, thanks!  That
> > > was it."   Not long after that he got another email -- "How the &^&^%
> > > did you know that???"   The TCP developer, somewhere in the Midwest IIRC,
> > > had realized that someone in Boston had been looking over their shoulder
> > > from a thousand miles away.
> > >
> > > Remote debugging.  Checksums were potent debugging tools.  Started in
> > > the Arpanet (I think), and we just moved it into the Internet.
> > >
> > > Fun times.
> > > Jack Haverty
> > >
> > > On 4/21/24 15:56, Steve Crocker wrote:
> > > > There was a bit of checksum history earlier.  In our early thinking
> > > > about protocols for the Arpanet, Jeff Rulifson pointed out the utility
> > > > of checksums to detect possible errors in implementations.  By "early
> > > > thinking" I mean the period between August 1968 and February 1969.
> > > > This was well before BBN issued report 1822 with the details of the
> > > > message format, etc.  We knew the IMPs would accept messages of
> > > > roughly 8000 bits, break them into packets of roughly 1000 bits, and
> > > > then reassemble them at the receiving IMP.  We also knew the IMPs
> > > > would be using very strong checksums to detect transmission errors
> > > > between the IMPs.
> > > >
> > > > We decided to use a 16-bit checksum that was simply the ones'
> > > > complement sum of the message, with one added wrinkle of rotating the
> > > > sum by one bit every thousand bits.  We included this wrinkle to catch
> > > > the possible error of misordering the packets during reassembly.
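A minimal sketch of the checksum Steve describes, assuming the message is processed as 16-bit words and that "every thousand bits" is approximated by a hypothetical block_words parameter (64 words, i.e. 1024 bits, by default). A plain ones' complement sum is commutative, so it cannot notice packets reassembled out of order; rotating the running sum at each block boundary makes the result depend on where each block sits.

```python
from typing import List

MASK16 = 0xFFFF


def ones_complement_add(a: int, b: int) -> int:
    """16-bit ones' complement addition: fold the carry back into the low bit."""
    s = a + b
    return (s & MASK16) + (s >> 16)


def rotl16(x: int) -> int:
    """Rotate a 16-bit value left by one bit."""
    return ((x << 1) | (x >> 15)) & MASK16


def plain_sum(words: List[int]) -> int:
    """Ones' complement sum with no rotation (insensitive to word order)."""
    total = 0
    for w in words:
        total = ones_complement_add(total, w)
    return total


def rotating_sum(words: List[int], block_words: int = 64) -> int:
    """Ones' complement sum, rotated left one bit at the end of every block.

    block_words=64 (1024 bits) is an assumption approximating "every thousand bits".
    """
    total = 0
    for i, w in enumerate(words):
        total = ones_complement_add(total, w)
        if (i + 1) % block_words == 0:
            total = rotl16(total)
    return total


if __name__ == "__main__":
    first, second = [0x0001, 0x0000], [0x0000, 0x0000]   # two tiny "packets"
    print(plain_sum(first + second), plain_sum(second + first))    # 1 1
    print(rotating_sum(first + second, block_words=2),
          rotating_sum(second + first, block_words=2))             # 4 2
```

With tiny two-word blocks the plain sum gives the same value for both packet orders, while the rotating sum tells them apart.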
> > > >
> > > > On 14 Feb 1969, a few of us in the Network Working Group met with the
> > > > IMP team at BBN for the first time.  When we described our thinking
> > > > about a checksum, Frank Heart hit the roof.  "You'll make my network
> > > > look slow!" his voice reaching his trademarked high pitch when he was
> > > > exercised.  He pointed out they had very strong checksums for the
> > > > transmissions between the IMPs.
> > > >
> > > > I tried to counter him.  "What about the path between the host and the
> > > > IMP?" I asked.  "As reliable as your accumulator," he roared.  (For
> > > > those young enough not to understand what this referred to, in those
> > > > days the central processing unit of a computer consisted of separate
> > > > components.  The accumulator was a separate piece of hardware that
> > > > held one word of data. It was involved in almost every instruction, so
> > > > if it broke, the computer was broken.)
> > > >
> > > > To my everlasting embarrassment, I yielded.  We didn't challenge
> > > > whether the IMPs might ever have an error, and we didn't insist that
> > > > it wouldn't really cost very much to have a lightweight checksum.  We
> > > > dropped the idea of including a checksum.
> > > >
> > > > Unlike the Arpanet, the Internet included a wide variety of computing
> > > > and transmission environments, so the need for checksums was far more
> > > > evident.  But we didn't have to wait that long.  When Lincoln Lab
> > > > connected the TX-2 to IMP 10 sometime in 1970 or early 1971, they
> > > > had intermittent errors that took a while to track down.  It turned
> > > > out when their drum was operating, there was hardware interference
> > > > with their IMP interface.  A simple checksum might have helped narrow
> > > > down where to look ;)
> > > >
> > > > Steve
> > > >
> > > >
> > > > On Sun, Apr 21, 2024 at 6:14 PM Jack Haverty via Internet-history
> > > > <internet-history at elists.isoc.org> wrote:
> > > >
> > > >     Probably not many people know the story behind the IP checksum.  I
> > > >     don't think anyone's ever written it down.  While I still remember...:
> > > >
> > > >     The checksum algorithm was selected not for its ability to catch
> > > >     errors, but rather for its simplicity, given our overworked and
> > > >     inadequate computing power.  There was significant concern at the
> > > >     time, especially at the sites running the big host computers, about
> > > >     the use of scarce computing power as "overhead" involved in using
> > > >     the network.  See for example: https://www.rfc-editor.org/rfc/rfc425
> > > >
> > > >     Besides, at the time all TCP traffic went through the Arpanet, and
> > > >     the IMPs did their own checksums, so any circuit problems would be
> > > >     caught there.  So as we were defining the details of the new TCP4
> > > >     mechanisms, the checksum algorithm was kept intentionally simple, to
> > > >     be replaced in some future version of TCP when computers would be
> > > >     more capable and the error characteristics of pathways through the
> > > >     Internet were better understood by experience.   The checksum
> > > >     algorithm was a placeholder for a future improved version, like many
> > > >     other mechanisms of TCP/IP4.
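For readers who have not seen it, here is a minimal Python sketch of the checksum as it ended up in IP and TCP (RFC 791/793, with implementation notes later collected in RFC 1071): the 16-bit ones' complement of the ones' complement sum of big-endian 16-bit words. It shows how little work it needs: one add per word plus a final carry fold. It is an illustration, not code from any of the Bakeoff implementations, and for TCP the caller would also have to include the pseudo-header in the data.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"          # pad an odd trailing byte with zero
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:           # fold carries back in (end-around carry)
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


if __name__ == "__main__":
    msg = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
    csum = internet_checksum(msg)
    print(hex(csum))                                          # 0x220d
    # Receiver check: data plus transmitted checksum sums to zero if no error is seen.
    print(internet_checksum(msg + csum.to_bytes(2, "big")))   # 0
```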
> > > >
> > > >     The actual details of the checksum computation were nailed down on
> > > >     January 27, 1979.  That was the date of the first TCP Bakeoff,
> > > >     organized by Jon Postel.   I think of it as possibly the first ever
> > > >     "Hackathon".
> > > >
> > > >     The group of TCP implementers assembled on a weekend at USC-ISI and
> > > >     commandeered a bunch of offices with terminals that we could use to
> > > >     connect to our computers back home.   At first, we could all talk to
> > > >     ourselves fine.   However, no one could talk to any other
> > > >     implementation.  Everybody was getting checksum errors.
> > > >
> > > >     Since we could all hear each other, a discussion quickly reached a
> > > >     consensus.   We turned off the checksum verification code in all of
> > > >     our implementations, so our TCPs would simply assume every incoming
> > > >     message/packet/datagram/segment (you pick your favorite term...) was
> > > >     error-free.
> > > >
> > > >     It seems strange now, but computing in the 1970s was a lot different
> > > >     from today.  In addition to the scarcity of CPU power and memory,
> > > >     there was little consensus about how bits were used inside each
> > > >     computer, and how they were transferred onto wires by network
> > > >     interface hardware.  Computers didn't agree on the number of bits in
> > > >     a byte, how bytes were ordered into computer words, how arithmetic
> > > >     calculations were performed, or how to take the bits in and out of
> > > >     your computer's memory and transfer them serially over an I/O
> > > >     interface.  If you think the confusion of today's USB connectors is
> > > >     bad, it was much worse 50 years ago!
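One property of the ones' complement sum speaks directly to this byte-order confusion, and RFC 1071 later spelled it out: the sum is byte-order independent. If every 16-bit word is byte-swapped before adding, the result is simply the byte-swapped sum, so a machine can add words in whatever order its hardware holds them and swap once at the end. A minimal Python sketch with arbitrary sample words; it makes no claim about how any 1979 implementation actually worked.

```python
from typing import List


def ones_complement_sum(words: List[int]) -> int:
    """Ones' complement sum of 16-bit words, folding each carry immediately."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)
    return total


def swap16(w: int) -> int:
    """Exchange the two bytes of a 16-bit word."""
    return ((w & 0xFF) << 8) | (w >> 8)


if __name__ == "__main__":
    words = [0x0001, 0xF203, 0xF4F5, 0xF6F7]            # arbitrary sample words
    native = ones_complement_sum(words)
    swapped = ones_complement_sum([swap16(w) for w in words])
    print(hex(native), hex(swapped), hex(swap16(swapped)))  # 0xddf2 0xf2dd 0xddf2
```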
> > > >
> > > >     Danny Cohen later published a great "plea for peace" that reveals
> > > >     some of the confusion - see https://www.rfc-editor.org/ien/ien137.txt
> > > >
> > > >     So it wasn't a surprise that each TCP implementer had somehow failed
> > > >     in translating the specification, simple as it was, into code.
> > > >
> > > >     Disabling checksums enabled us to debug all this, and we slowly (it
> > > >     took two days IIRC) got implementations to talk to other
> > > >     implementations.  Then we re-enabled checksumming and tried all the
> > > >     tests again.  TCP4 worked!  Jon Postel took on the task of figuring
> > > >     out how the now-working checksums were actually doing the
> > > >     computations, and revised the specifications accordingly.   Rough
> > > >     consensus and running code had failed; instead we had running code
> > > >     and then rough consensus.
> > > >
> > > >     My most memorable recollection of that weekend was late on Sunday.
> > > >     Jon had set up the Bakeoff with a "scoring scheme" which gave each
> > > >     participant a number of points for passing each test.   His score
> > > >     rules are here:
> > > >
> > > >     https://drive.google.com/file/d/1NNc9tJTEQsVq-knCCWLeJ3zVrL2Xd25g/view?usp=sharing
> > > >
> > > >     We were all getting tired, and Bill Plummer (Tenex TCP) shouted down
> > > >     the hall to Dave Clark (Multics TCP) -- "Hey Dave, can you turn off
> > > >     your checksumming again?"  Dave replied "OK, it's off".  Bill hit a
> > > >     key on his terminal.  Dave yelled "Hey, Multics just crashed!"  Bill
> > > >     gloated "KO! Ten points for me!"
> > > >
> > > >     Such was how checksumming made it into TCP/IP4.
> > > >
> > > >     Jack Haverty
> > > >
> > > >
> > > >
> > > >     On 4/21/24 12:27, John Day via Internet-history wrote:
> > > >     > So I wasn’t dreaming!  ;-)
> > > >     >
> > > >     > CRCs also have problems in HDLC if there are a lot of 1s in the
> > > >     > data.  (The bit stuffing is not included in the checksum
> > > >     > calculation.)
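John's parenthetical can be illustrated with a toy model of HDLC bit stuffing: the sender inserts a 0 after every five consecutive 1s so data never looks like the 01111110 flag, the receiver removes it, and the CRC covers only the unstuffed bits. This sketch works on plain lists of bits with no flags or CRC, purely for illustration, and the function names are hypothetical. It shows one way errors interact with stuffing: a single bit error on the line can create a false five-1s pattern, so the receiver deletes a data bit and the CRC is handed a frame that is shorter and shifted, not merely one bit different.

```python
from typing import List

Bits = List[int]


def stuff(bits: Bits) -> Bits:
    """Sender side: insert a 0 after every run of five consecutive 1s."""
    out: Bits = []
    run = 0
    for b in bits:
        out.append(b)
        if b == 1:
            run += 1
            if run == 5:
                out.append(0)   # stuffed bit, not covered by the CRC
                run = 0
        else:
            run = 0
    return out


def unstuff(bits: Bits) -> Bits:
    """Receiver side: after five 1s, drop the following 0 (a 1 would mean flag/abort)."""
    out: Bits = []
    run = 0
    expect_stuffed = False
    for b in bits:
        if expect_stuffed:
            if b == 1:
                raise ValueError("six 1s in a row: flag or abort, not data")
            expect_stuffed = False  # discard the stuffed 0
            run = 0
            continue
        out.append(b)
        if b == 1:
            run += 1
            if run == 5:
                expect_stuffed = True
        else:
            run = 0
    return out


if __name__ == "__main__":
    data = [1, 1, 0, 1, 1, 0, 1, 0]
    sent = stuff(data)          # no run of five 1s, so nothing is stuffed
    received = sent.copy()
    received[2] = 1             # one bit error on the line
    print(unstuff(sent))        # [1, 1, 0, 1, 1, 0, 1, 0]
    print(unstuff(received))    # [1, 1, 1, 1, 1, 1, 0]  (one bit shorter)
```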
> > > >     >
> > > >     >> On Apr 21, 2024, at 15:22, touch at strayalpha.com wrote:
> > > >     >>
> > > >     >> I think it was this one:
> > > >     >> http://ccr.sigcomm.org/archive/1995/conf/partridge.pdf
> > > >     >>
> > > >     >> Joe
> > > >     >>
> > > >     >> —
> > > >     >> Dr. Joe Touch, temporal epistemologist
> > > >     >> www.strayalpha.com
> > > >     >>
> > > >     >>> On Apr 21, 2024, at 12:20 PM, Scott Bradner via
> > > >     >>> Internet-history <internet-history at elists.isoc.org> wrote:
> > > >     >>>
> > > >     >>> maybe in conjunction with the Pac Bell NAP
> > > >     >>>
> > > >     >>>
> > > >     >>> https://www.cnet.com/tech/mobile/pac-bell-adds-network-access/
> > > >     >>>
> > > >     >>>
> > > >     >>> https://mailman.nanog.org/pipermail/nanog/1998-March/127113.html
> > > >     >>>
> > > >     >>> Scott
> > > >     >>>
> > > >     >>>> On Apr 21, 2024, at 3:00 PM, John Day <jeanjour at comcast.net>
> > > >     >>>> wrote:
> > > >     >>>>
> > > >     >>>> I have a vague recollection of a paper (possibly by Craig
> > > >     >>>> Partridge) that talked about ATM dropping cells (and possibly
> > > >     >>>> other different forms of errors) and how IP and other protocols
> > > >     >>>> were not built to detect such losses.
> > > >     >>>>
> > > >     >>>> Am I dreaming?
> > > >     >>>>
> > > >     >>>> John
> > > >     >>>>
> > > >     >>>>> On Apr 21, 2024, at 09:10, Scott Bradner via
> > > >     >>>>> Internet-history <internet-history at elists.isoc.org> wrote:
> > > >     >>>>>
> > > >     >>>>> yes but...
> > > >     >>>>>
> > > >     >>>>> the ATM Forum people felt that ATM should replace TCP and most
> > > >     >>>>> of IP, i.e. become the new IP, and that new applications should
> > > >     >>>>> assume they were running over ATM and directly make use of ATM
> > > >     >>>>> features (e.g., ABR)
> > > >     >>>>>
> > > >     >>>>> ATM as yet another wire was just fine (though a bit choppy)
> > > >     >>>>>
> > > >     >>>>> Scott
> > > >     >>>>>
> > > >     >>>>>
> > > >     >>>>>
> > > >     >>>>>> On Apr 21, 2024, at 9:02 AM, Andrew G. Malis
> > > >     >>>>>> <agmalis at gmail.com> wrote:
> > > >     >>>>>>
> > > >     >>>>>> Scott,
> > > >     >>>>>>
> > > >     >>>>>> ATM could carry any protocol that you could carry over
> > > >     >>>>>> Ethernet, see RFCs 2225, 2492, and 2684.
> > > >     >>>>>>
> > > >     >>>>>> Cheers,
> > > >     >>>>>> Andy
> > > >     >>>>>>
> > > >     >>>>>>
> > > >     >>>>>> On Sat, Apr 20, 2024 at 8:15 PM Scott Bradner via
> > > >     >>>>>> Internet-history <internet-history at elists.isoc.org> wrote:
> > > >     >>>>>>
> > > >     >>>>>>
> > > >     >>>>>>> On Apr 20, 2024, at 8:11 PM, John Gilmore via
> > > >     >>>>>>> Internet-history <internet-history at elists.isoc.org> wrote:
> > > >     >>>>>>>
> > > >     >>>>>>> John Day via Internet-history
> > > >     >>>>>>> <internet-history at elists.isoc.org> wrote:
> > > >     >>>>>>>> In the early 70s, people were trying to figure out how to
> > > >     >>>>>>>> interwork multiple networks of different technologies. What
> > > >     >>>>>>>> was the solution that was arrived at that led to the current
> > > >     >>>>>>>> Internet?
> > > >     >>>>>>>> I conjectured yesterday that the fundamental solution must
> > > >     >>>>>>>> have been in hand by the time Cerf and Kahn published their
> > > >     >>>>>>>> paper.
> > > >     >>>>>>>> Are you conjecturing that the solution was gateways? and
> > > >     >>>>>>>> hence protocol translation at the gateways?
> > > >     >>>>>>> Maybe it's too obvious in retrospect.  But the "solution" that
> > > >     >>>>>>> I see was that everyone had to move to using a protocol that
> > > >     >>>>>>> was independent of their physical medium.
> > > >     >>>>>> and ATM was an example of the reverse - it was a protocol &
> > > >     >>>>>> a network - OK as long as you did not build applications that
> > > >     >>>>>> knew they were running over ATM (or if ATM had been the last
> > > >     >>>>>> networking protocol)
> > > >     >>>>>>
> > > >     >>>>>> Scott
> > > >     >>>>>> --
> > > >     >>>>>> Internet-history mailing list
> > > >     >>>>>> Internet-history at elists.isoc.org
> > > >     >>>>>> https://elists.isoc.org/mailman/listinfo/internet-history
> > > >
> > > >
> > > >
> > > >
> > >
>


-- 
Please send any postal/overnight deliveries to:
Vint Cerf
Google, LLC
1900 Reston Metro Plaza, 16th Floor
Reston, VA 20190
+1 (571) 213 1346


until further notice