[ih] early networking: "the solution"
Andrew G. Malis
agmalis at gmail.com
Mon Apr 22 06:43:42 PDT 2024
Vint,
A memory error caused the upper bit of a routing update sequence number to
flip, making the sequence number "greater than" the previously circulating
sequence number by exactly half the sequence number space. Routing updates
half the sequence space apart kept chasing each other. We fixed it by
instituting a window on the sequence space so that only updates "greater
than" the previous update but within the window were accepted.
Cheers,
Andy
On Mon, Apr 22, 2024 at 3:01 AM vinton cerf via Internet-history <
internet-history at elists.isoc.org> wrote:
> On Mon, Apr 22, 2024 at 2:07 AM Jack Haverty via Internet-history <
> internet-history at elists.isoc.org> wrote:
>
> > Steve,
> >
> > You were right, and checksum issues did cause troubles in the Arpanet.
> > The IMPs *did* have errors. The details have become fuzzy, but IIRC
> > there was a routing failure at one point that took down the entire
> > Arpanet. The cause was traced to a bad memory in some IMP that was
> > corrupting packets if they happened to use that memory. Some of the
> > packets were internal packets disseminating routing information ... and
> > the bad data resulted in the net locking up in perpetual routing
> > confusion. Checksums caught errors on circuits, but not errors inside
> > the IMP memory.
> >
> there were at least two cases. In one, all the bits of the distance vector
> were zeroed making it look like the IMP at Harvard (?) was zero hops from
> all other IMPs. In a second case, a flaky memory bit caused a routing
> packet to be sent repeatedly everywhere in the network as each successive
> packet looked like an "update" and was sent on all circuits between the
> IMPs. At least, that's my hazy recollection. The memory of the IMPs was not
> error checked the way current-day computer memory is.
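>
> A toy illustration (not BBN's actual routing code) of why a zeroed distance
> vector is so destructive, assuming simple hop-count routing; the table
> values are made up:
>
>     #include <stdio.h>
>
>     #define NODES 4
>
>     /* An IMP updates its hop count to each destination from a neighbor's
>      * advertised vector: the cost via that neighbor is the advertised
>      * distance plus one hop. An all-zero advertisement therefore makes the
>      * advertising IMP look one hop away from everything, and traffic for
>      * the whole network is drawn toward it. */
>     void apply_vector(int my_dist[NODES], const int neighbor_dist[NODES])
>     {
>         for (int d = 0; d < NODES; d++)
>             if (neighbor_dist[d] + 1 < my_dist[d])
>                 my_dist[d] = neighbor_dist[d] + 1;
>     }
>
>     int main(void)
>     {
>         int my_dist[NODES] = {0, 3, 5, 7};   /* 0 is this IMP itself */
>         int zeroed[NODES]  = {0, 0, 0, 0};   /* corrupted advertisement */
>         apply_vector(my_dist, zeroed);
>         for (int d = 0; d < NODES; d++)
>             printf("dest %d: %d hops\n", d, my_dist[d]);
>         return 0;   /* every non-local destination now shows 1 hop */
>     }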
>
> >
> > Checksums were useful as debugging tools in Internet operation just like
> > in the Arpanet. When we got the task of making the core Internet into a
> > 24x7 service, we took the easiest route and applied the same techniques that had
> > been developed for the Arpanet NOC. One of those techniques was
> > "traps", which were essentially error reports from remote switches to
> > NOC operators. So the core gateways quickly acquired the ability to
> > report errors back to our NOC, just like IMPs had been doing for about a
> > decade.
> >
> > Mike Brescia was one of the "Internet gang", and he, along with Bob Hinden
> > and Alan Sheltzer, watched over the neonatal Internet core to keep it as
> > close as possible to a 24x7 service. One day Mike noticed that a
> > particular router was reporting lots of checksum errors. He
> > investigated and saw that a new host was trying to come up on the
> > Internet and apparently someone was debugging their TCP. The checksum
> > reports revealed the problem -- IIRC, the 4 bytes of IP addresses were
> > misordered. That was easy to do with 16-bit CPUs holding 2 8-bit bytes
> > in each word.
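> >
> > A sketch of how that kind of bug happens, assuming the address was built up
> > in 16-bit words and copied out without regard to byte order (the actual
> > host and code involved are unknown):
> >
> >     #include <stdint.h>
> >     #include <stdio.h>
> >     #include <string.h>
> >
> >     int main(void)
> >     {
> >         /* Intended address 10.0.0.5, held as two 16-bit words on a
> >          * word-oriented machine. */
> >         uint16_t words[2] = { (10 << 8) | 0, (0 << 8) | 5 };
> >
> >         /* Copying the words out byte-for-byte in the machine's native
> >          * order instead of network (big-endian) order swaps each pair of
> >          * bytes. */
> >         uint8_t wire[4];
> >         memcpy(wire, words, sizeof wire);
> >
> >         printf("%d.%d.%d.%d\n", wire[0], wire[1], wire[2], wire[3]);
> >         /* On a little-endian host this prints 0.10.5.0, not 10.0.0.5. */
> >         return 0;
> >     }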
> >
> > So Mike looked up the host information at the NIC, found the email
> > address of the likely responsible human, and sent an email, something
> > like "FYI, you need to swap the bytes in your IP addresses". A short
> > while later he got back an answer, something like "Hey, thanks! That
> > was it." Not long after that he got another email -- "How the &^&^%
> > did you know that???" The TCP developer somewhere in the midwest IIRC
> > had realized that someone in Boston had been looking over their shoulder
> > from a thousand miles away.
> >
> > Remote debugging. Checksums were potent debugging tools. Started in
> > the Arpanet (I think), and we just moved it into the Internet.
> >
> > Fun times.
> > Jack Haverty
> >
> > On 4/21/24 15:56, Steve Crocker wrote:
> > > There was a bit of checksum history earlier. In our early thinking
> > > about protocols for the Arpanet, Jeff Rulifson pointed out the utility
> > > of checksums to detect possible errors in implementations. By "early
> > > thinking" I mean the period between August 1968 and February 1969.
> > > This was well before BBN issued Report 1822 with the details of the
> > > message format, etc. We knew the IMPs would accept messages of
> > > roughly 8000 bits, break them into packets of roughly 1000 bits, and
> > > then reassemble them at the receiving IMP. We also knew the IMPs
> > > would be using very strong checksums to detect transmission errors
> > > between the IMPs.
> > >
> > > We decided to use a 16 bit checksum that was simply the ones
> > > complement sum of the message, with one added wrinkle of rotating the
> > > sum by one bit every thousand bits. We included this wrinkle to catch
> > > the possible error of misordering the packets during reassembly.
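> > >
> > > A minimal sketch of that scheme in C, with the ones-complement sum
> > > rotated at a caller-chosen packet boundary (the original counted roughly
> > > every thousand bits; the exact boundary here is an assumption):
> > >
> > >     #include <stddef.h>
> > >     #include <stdint.h>
> > >
> > >     /* Ones-complement add: fold any carry out of bit 15 back into bit 0. */
> > >     static uint16_t oc_add(uint16_t a, uint16_t b)
> > >     {
> > >         uint32_t s = (uint32_t)a + b;
> > >         return (uint16_t)((s & 0xFFFF) + (s >> 16));
> > >     }
> > >
> > >     /* Rotate left by one bit within 16 bits. */
> > >     static uint16_t rotl1(uint16_t x)
> > >     {
> > >         return (uint16_t)((x << 1) | (x >> 15));
> > >     }
> > >
> > >     /* Ones-complement sum of the whole message, with the running sum
> > >      * rotated one bit at each packet boundary so that swapping two
> > >      * packets during reassembly changes the result. */
> > >     uint16_t rotating_checksum(const uint16_t *words, size_t n_words,
> > >                                size_t words_per_packet)
> > >     {
> > >         uint16_t sum = 0;
> > >         for (size_t i = 0; i < n_words; i++) {
> > >             if (i != 0 && words_per_packet != 0 && i % words_per_packet == 0)
> > >                 sum = rotl1(sum);
> > >             sum = oc_add(sum, words[i]);
> > >         }
> > >         return sum;
> > >     }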
> > >
> > > On 14 Feb 1969, a few of us in the Network Working Group met with the
> > > IMP team at BBN for the first time. When we described our thinking
> > > about a checksum, Frank Heart hit the roof. "You'll make my network
> > > look slow!" his voice reaching his trademarked high pitch when he was
> > > exercised. He pointed out they had very strong checksums for the
> > > transmissions between the IMPs.
> > >
> > > I tried to counter him. "What about the path between the host and the
> > > IMP?" I asked. "As reliable as your accumulator," he roared. (For
> > > those young enough not to understand what this referred to, in those
> > > days the central processing unit of a computer consisted of separate
> > > components. The accumulator was a separate piece of hardware that
> > > held one word of data. It was involved in almost every instruction, so
> > > if it broke, the computer was broken.)
> > >
> > > To my everlasting embarrassment, I yielded. We didn't challenge
> > > whether the IMPs might ever have an error, and we didn't insist that
> > > it wouldn't really cost very much to have a lightweight checksum. We
> > > dropped the idea of including a checksum.
> > >
> > > Unlike the Arpanet, the Internet included a wide variety of computing
> > > and transmission environments, so the need for checksums was far more
> > > evident. But we didn't have to wait that long. When Lincoln Lab
> > > connected the TX-2 to the IMP 10 sometime in 1970 or early 1971, they
> > > had intermittent errors that took a while to track down. It turned out
> > > that when their drum was operating, there was hardware interference
> > > with their IMP interface. A simple checksum might have helped narrow
> > > down where to look ;)
> > >
> > > Steve
> > >
> > >
> > > On Sun, Apr 21, 2024 at 6:14 PM Jack Haverty via Internet-history
> > > <internet-history at elists.isoc.org> wrote:
> > >
> > > Probably not many people know the story behind the IP checksum. I
> > > don't think anyone's ever written it down. While I still remember...:
> > >
> > > The checksum algorithm was selected not for its capabilities to catch
> > > errors, but rather for its simplicity for our overworked and inadequate
> > > computing power. There was significant concern at the time, especially
> > > in the sites running the big host computers, about the use of scarce
> > > computing power as "overhead" involved in using the network. See for
> > > example: https://www.rfc-editor.org/rfc//rfc425
> > >
> > > Besides, at the time all TCP traffic was through the Arpanet, and the
> > > IMPs did their own checksums so any circuit problems would be caught
> > > there. So as we were defining the details of the new TCP4 mechanisms,
> > > the checksum algorithm was kept intentionally simple, to be replaced in
> > > some future version of TCP when computers would be more capable and the
> > > error characteristics of pathways through the Internet were better
> > > understood by experience. The checksum algorithm was a placeholder for
> > > a future improved version, like many other mechanisms of TCP/IP4.
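> > >
> > > For comparison, a minimal sketch of the "intentionally simple" algorithm
> > > that survived into the standard (RFC 1071 later wrote up its computation),
> > > with the pseudo-header and any odd trailing byte omitted for brevity:
> > >
> > >     #include <stddef.h>
> > >     #include <stdint.h>
> > >
> > >     /* Internet checksum: ones-complement sum of the data taken as
> > >      * 16-bit words, carries folded back in, then complemented. */
> > >     uint16_t internet_checksum(const uint16_t *words, size_t n_words)
> > >     {
> > >         uint32_t sum = 0;
> > >         for (size_t i = 0; i < n_words; i++)
> > >             sum += words[i];
> > >         while (sum >> 16)                 /* fold carries back in */
> > >             sum = (sum & 0xFFFF) + (sum >> 16);
> > >         return (uint16_t)~sum;
> > >     }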
> > >
> > > The actual details of the checksum computation were nailed down on
> > > January 27, 1979. That was the date of the first TCP Bakeoff, organized
> > > by Jon Postel. I think of it as possibly the first ever "Hackathon".
> > >
> > > The group of TCP implementers assembled on a weekend at USC-ISI and
> > > commandeered a bunch of offices with terminals that we could use to
> > > connect to our computers back home. At first, we could all talk to
> > > ourselves fine. However, no one could talk to any other implementation.
> > > Everybody was getting checksum errors.
> > >
> > > Since we could all hear each other, a discussion quickly reached a
> > > consensus. We turned off the checksum verification code in all of our
> > > implementations, so our TCPs would simply assume every incoming
> > > message/packet/datagram/segment (you pick your favorite term...) was
> > > error-free.
> > >
> > > It seems strange now, but computing in the 1970s was a lot different
> > > from today. In addition to the scarcity of CPU power and memory, there
> > > was little consensus about how bits were used inside of each computer,
> > > and how they were transferred onto wires by network interface hardware.
> > > Computers didn't agree on the number of bits in a byte, how bytes were
> > > ordered into computer words, how arithmetic calculations were performed,
> > > or how to take the bits in and out of your computer's memory and
> > > transfer them serially over an I/O interface. If you think the confusion
> > > of today's USB connectors is bad, it was much worse 50 years ago!
> > >
> > > Danny Cohen later published a great "plea for peace" that reveals some
> > > of the confusion - see https://www.rfc-editor.org/ien/ien137.txt
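> > >
> > > The byte-order half of that confusion is easy to demonstrate today; a
> > > tiny C program showing how one 32-bit value sits in memory on whatever
> > > machine runs it:
> > >
> > >     #include <stdint.h>
> > >     #include <stdio.h>
> > >     #include <string.h>
> > >
> > >     int main(void)
> > >     {
> > >         uint32_t value = 0x0A0B0C0D;
> > >         uint8_t bytes[4];
> > >         memcpy(bytes, &value, sizeof bytes);
> > >
> > >         /* A big-endian machine stores 0a 0b 0c 0d; a little-endian one
> > >          * stores 0d 0c 0b 0a. Two hosts dumping the same word straight
> > >          * onto the wire therefore disagree about what was sent. */
> > >         printf("%02x %02x %02x %02x\n",
> > >                bytes[0], bytes[1], bytes[2], bytes[3]);
> > >         return 0;
> > >     }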
> > >
> > > So it wasn't a surprise that each TCP implementer had somehow failed in
> > > translating the specification, simple as it was, into code.
> > >
> > > The disabling of checksums enabled us to debug all this and slowly (it
> > > took two days IIRC) get implementations to talk to other
> > > implementations. Then we re-enabled checksumming and tried all the
> > > tests again. TCP4 worked! Jon Postel took on the task of figuring out
> > > how the now-working checksums actually were doing the computations and
> > > revised the specifications accordingly. Rough consensus and running
> > > code had failed; instead we had running code and then rough consensus.
> > >
> > > My most memorable recollection of that weekend was late on Sunday. Jon
> > > had set up the Bakeoff with a "scoring scheme" which gave each
> > > participant a number of points for passing each test. His score rules
> > > are here:
> > >
> > > https://drive.google.com/file/d/1NNc9tJTEQsVq-knCCWLeJ3zVrL2Xd25g/view?usp=sharing
> > >
> > > We were all getting tired, and Bill Plummer (Tenex TCP) shouted down
> > > the hall to Dave Clark (Multics TCP) -- "Hey Dave, can you turn off
> > > your checksumming again?" Dave replied "OK, it's off". Bill hit a key
> > > on his terminal. Dave yelled "Hey, Multics just crashed!" Bill gloated
> > > "KO! Ten points for me!"
> > >
> > > Such was how checksumming made it into TCP/IP4.
> > >
> > > Jack Haverty
> > >
> > >
> > >
> > > On 4/21/24 12:27, John Day via Internet-history wrote:
> > > > So I wasn’t dreaming! ;-)
> > > >
> > > > CRCs also have problems in HDLC if there are a lot of 1s in the
> > > > data. (The bit stuffing is not included in the checksum calculation.)
> > > >
> > > >> On Apr 21, 2024, at 15:22, touch at strayalpha.com wrote:
> > > >>
> > > >> I think it was this one:
> > > >> http://ccr.sigcomm.org/archive/1995/conf/partridge.pdf
> > > >>
> > > >> Joe
> > > >>
> > > >> —
> > > >> Dr. Joe Touch, temporal epistemologist
> > > >> www.strayalpha.com <http://www.strayalpha.com>
> > > >>
> > > >>> On Apr 21, 2024, at 12:20 PM, Scott Bradner via
> > > Internet-history<internet-history at elists.isoc.org> wrote:
> > > >>>
> > > >>> maybe in conjunction with the Pac Bell NAP
> > > >>>
> > > >>> https://www.cnet.com/tech/mobile/pac-bell-adds-network-access/
> > > >>>
> > > >>>
> > > >>> https://mailman.nanog.org/pipermail/nanog/1998-March/127113.html
> > > >>>
> > > >>> Scott
> > > >>>
> > > >>>> On Apr 21, 2024, at 3:00 PM, John Day<jeanjour at comcast.net>
> > > wrote:
> > > >>>>
> > > >>>> I have a vague recollection of a paper (possibly by Craig
> > > Partridge) that talked about ATM dropping cells (and possibly
> > > other different forms of errors) and how IP and other protocols
> > > were not built to detect such losses.
> > > >>>>
> > > >>>> Am I dreaming?
> > > >>>>
> > > >>>> John
> > > >>>>
> > > >>>>> On Apr 21, 2024, at 09:10, Scott Bradner via
> > > Internet-history<internet-history at elists.isoc.org> wrote:
> > > >>>>>
> > > >>>>> yes but...
> > > >>>>>
> > > >>>>> the ATM Forum people felt that ATM should replace TCP and most of
> > > >>>>> IP, i.e. become the new IP, and that new applications should assume
> > > >>>>> they were running over ATM and directly make use of ATM features
> > > >>>>> (e.g., ABR)
> > > >>>>>
> > > >>>>> ATM as yet another wire was just fine (though a bit choppy)
> > > >>>>>
> > > >>>>> Scott
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>> On Apr 21, 2024, at 9:02 AM, Andrew G. Malis<agmalis at gmail.com> wrote:
> > > >>>>>>
> > > >>>>>> Scott,
> > > >>>>>>
> > > >>>>>> ATM could carry any protocol that you could carry over Ethernet,
> > > >>>>>> see RFCs 2225, 2492, and 2684.
> > > >>>>>>
> > > >>>>>> Cheers,
> > > >>>>>> Andy
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On Sat, Apr 20, 2024 at 8:15 PM Scott Bradner via
> > > Internet-history<internet-history at elists.isoc.org> wrote:
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>> On Apr 20, 2024, at 8:11 PM, John Gilmore via
> > > Internet-history<internet-history at elists.isoc.org> wrote:
> > > >>>>>>>
> > > >>>>>>> John Day via
> > > Internet-history<internet-history at elists.isoc.org> wrote:
> > > >>>>>>>> In the early 70s, people were trying to figure out how to
> > > >>>>>>>> interwork multiple networks of different technologies. What was
> > > >>>>>>>> the solution that was arrived at that led to the current
> > > >>>>>>>> Internet?
> > > >>>>>>>> I conjectured yesterday that the fundamental solution must have
> > > >>>>>>>> been in hand by the time Cerf and Kahn published their paper.
> > > >>>>>>>> Are you conjecturing that the solution was gateways? and hence
> > > >>>>>>>> protocol translation at the gateways?
> > > >>>>>>> Maybe it's too obvious in retrospect. But the "solution" that I
> > > >>>>>>> see was that everyone had to move to using a protocol that was
> > > >>>>>>> independent of their physical medium.
> > > >>>>>> and ATM was an example of the reverse - it was a protocol & a
> > > >>>>>> network - OK as long as you did not build applications that knew
> > > >>>>>> they were running over ATM (or if ATM had been the last networking
> > > >>>>>> protocol)
> > > >>>>>>
> > > >>>>>> Scott
> >
> --
> Internet-history mailing list
> Internet-history at elists.isoc.org
> https://elists.isoc.org/mailman/listinfo/internet-history
>