[ih] history of protocol bugs
Jack Haverty
jack at 3kitty.org
Sat Nov 11 11:10:35 PST 2023
There's quite a lot of discussion of protocol issues in the
Technical Reports on the Arpanet project through the 1980s. Early
versions of some algorithms and protocols were found to be inadequate
when the technology was actually used in real-world conditions.
Changes were proposed, documented, and tested as the Arpanet grew and
evolved over the years.
Much of the contemporary documentation is archived in DTIC, and can be
found by searching for "Arpanet" on the DTIC site, e.g.:
https://apps.dtic.mil/sti/citations/tr/ADA121350
There are many other
such reports. Searching for the "contract number" is also productive -
it was MDA903-78-C-0129 for the Arpanet main contract. The phrase
"Quarterly Technical Report" is another good search term to retrieve the
history of the operation and evolution of the network, describing many
of the "bugs" encountered and fixed.
One incident I recall happened sometime in the 1980s, as the Arpanet was
becoming the DDN (Defense Data Network) to serve as the worldwide
communications infrastructure for the US Defense Department. Sorry I
can't remember the references, but someone at a university had published
a paper which mathematically proved that the Arpanet (and hence the DDN)
would "lock up" and all communications would cease. That
understandably disturbed the government sponsors of the DDN, who
immediately tasked BBN to find the problem and fix it. Yesterday!
The math gurus at BBN analyzed the report and concluded (IIRC) that the
analysis was correct -- the DDN would in fact lock up and all
communications would fail. However, they also discovered that the
academic analysis had made certain assumptions, which greatly simplified
their analysis. In particular, the paper assumed that all the switches
(computers called IMPs) in the network ran at exactly the same speed and
were started at exactly the same time. So IF all the computers
comprising the network executed instructions in perfect synchronization,
eventually the entire system would crash. Big IF...
At BBN, we advised the government that although such an incident was
mathematically possible, we didn't have any idea how to actually make
multiple computers scattered around the world start at exactly the same
time and run instruction-by-instruction in lockstep. In theory, such an
incident could happen; in practice it was unlikely to ever happen for
thousands or even millions of years.
Marauding backhoes were by far a bigger threat to network operation.
But the Arpanet protocols and algorithms mitigated that threat pretty well.
As the Arpanet transmuted into the Internet, such mathematical analyses
became much more challenging. Networks interconnected by gateways were
much more complex in behavior than the circuits interconnecting switches
in the Arpanet. I don't recall seeing many formal mathematical analyses
of Internet protocols and algorithms, at least in the early years when I
was involved. We were aware of the issues in networks, largely from
experience with the Arpanet, but often didn't have practical solutions.
But there were lots of Ideas in the Internet research group, and The
Internet was the Experiment to see which Ideas worked in actual
practice. Ideas couldn't be analyzed, but they could be implemented,
deployed, and tested in real-world use.
Some of the Ideas of TCP4 were actually "placeholders" for the real
mechanisms to be developed later and tested in subsequent live operation
of the Internet Experiment. One might view them as "bugs" but they were
intentionally put into the design. One example I recall is "Source
Quench", a congestion control mechanism that I, at least, never
thought was viable. Another was the checksum algorithm, which was poor
at handling typical communication errors but didn't require a lot of
scarce computing power in the switches and host computers of the era.
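To illustrate the checksum tradeoff, here is a minimal sketch of a 16-bit
one's-complement checksum in the style described in RFC 1071. The function
name and the sample byte strings are made up for illustration. It shows one
well-known weakness (not necessarily the one meant above): because the sum
is commutative, exchanging two 16-bit words in the data goes undetected,
even though the arithmetic itself is very cheap.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of 16-bit words, then complemented."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # next big-endian 16-bit word
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return ~total & 0xFFFF


# Hypothetical sample data: three 16-bit words.
original = b"\x12\x34\x56\x78\x9a\xbc"
swapped = b"\x56\x78\x12\x34\x9a\xbc"   # first two words exchanged
flipped = b"\x12\x35\x56\x78\x9a\xbc"   # one bit changed in the first word

# Reordered words produce the same checksum, so the damage is missed.
same_after_swap = internet_checksum(original) == internet_checksum(swapped)
# A single flipped bit does change the checksum, so it is caught.
caught_bit_flip = internet_checksum(original) != internet_checksum(flipped)
```

The whole computation is additions and shifts, which fits the point about
scarce computing power; stronger checks such as CRCs cost more per byte.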
TTL (Time-To-Live) was considered important for the algorithms,
especially routing and handling of time-sensitive data flows such as
interactive speech. As implemented, though, it had nothing to do with
time; it was instead defined as a "hop count", since the computers of
the era didn't have the ability to measure time (until Dave Mills and
crew created NTP).
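The hop-count behavior can be sketched as follows. This is a simplified
model, not code from any actual gateway: the function name and the example
numbers are assumptions for illustration. Each gateway subtracts one from
the TTL no matter how long the packet sat in its queues, and the packet is
discarded once the value reaches zero.

```python
def survives_path(initial_ttl: int, gateways: int) -> bool:
    """Return True if a packet sent with initial_ttl passes every gateway."""
    ttl = initial_ttl
    for _ in range(gateways):
        ttl -= 1          # one decrement per hop, independent of elapsed time
        if ttl <= 0:
            return False  # gateway discards the packet
    return True


# Hypothetical examples: a TTL of 3 gets through 2 gateways but not 3.
through_two = survives_path(3, 2)
through_three = survives_path(3, 3)
```

Nothing in the loop looks at a clock, so a packet delayed for seconds at a
single hop is charged the same as one forwarded immediately.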
Hope this helps,
Jack Haverty
On 11/10/23 01:16, Gergely Buday via Internet-history wrote:
> Hi there,
>
> is there a written history of Internet protocol bugs?
>
> Somebody suggested to find the obsolete RFCs and figure out why they went
> obsolete.
>
> Other than that, what would you recommend to figure out this history?
>
> - Gergely