[ih] history of protocol bugs

Jack Haverty jack at 3kitty.org
Sat Nov 11 11:10:35 PST 2023


There's quite a lot of discussion of protocol issues in the Technical 
Reports from the Arpanet project over the 1980s. Early versions of some 
algorithms and protocols were found to be inadequate when the technology 
was actually used in real-world conditions.
Changes were proposed, documented, and tested as the Arpanet grew and 
evolved over the years.

Much of the contemporary documentation is archived in DTIC, and can be 
found by searching for "Arpanet" on the DTIC site - e.g., 
https://apps.dtic.mil/sti/citations/tr/ADA121350   There are many other 
such reports.   Searching for the "contract number" is also productive - 
it was MDA903-78-C-0129 for the Arpanet main contract.   The phrase 
"Quarterly Technical Report" is another good search term to retrieve the 
history of the operation and evolution of the network, describing many 
of the "bugs" encountered and fixed.

One incident I recall happened sometime in the 1980s, as the Arpanet was 
becoming the DDN (Defense Data Network) to serve as the worldwide 
communications infrastructure for the US Defense Department.   Sorry I 
can't remember the references, but someone at a university had published 
a paper which mathematically proved that the Arpanet (and hence the DDN) 
would "lock up" and all communications would cease.   That 
understandably disturbed the government sponsors of the DDN, who 
immediately tasked BBN to find the problem and fix it.   Yesterday!

The math gurus at BBN analyzed the report and concluded (IIRC) that the 
analysis was correct -- the DDN would in fact lock up and all 
communications would fail.   However, they also discovered that the 
academic analysis rested on certain simplifying assumptions.  In 
particular, the paper assumed that all the switches 
(computers called IMPs) in the network ran at exactly the same speed and 
were started at exactly the same time. So IF all the computers 
comprising the network executed instructions in perfect synchronization, 
eventually the entire system would crash.  Big IF...

At BBN, we advised the government that although such an incident was 
mathematically possible, we didn't have any idea how to actually make 
multiple computers scattered around the world start at exactly the same 
time and run instruction-by-instruction in lockstep.  In theory, such an 
incident could happen; in practice it was unlikely to occur even once in 
thousands or even millions of years.

Marauding backhoes were by far a bigger threat to network operation.   
But the Arpanet protocols and algorithms mitigated that threat pretty well.

As the Arpanet transmuted into the Internet, such mathematical analyses 
became much more challenging.   Networks interconnected by gateways were 
much more complex in behavior than the circuits interconnecting switches 
in the Arpanet.  I don't recall seeing many formal mathematical analyses 
of Internet protocols and algorithms, at least in the early years when I 
was involved.  We were aware of the issues in networks, largely from 
experience with the Arpanet, but often didn't have practical solutions.  
But there were lots of Ideas in the Internet research group, and The 
Internet was the Experiment to see which Ideas worked in actual 
practice.   Ideas couldn't readily be analyzed mathematically, but they 
could be implemented, 
deployed, and tested in real-world use.

Some of the Ideas in TCP4 were actually "placeholders" for the real 
mechanisms to be developed later and tested in subsequent live operation 
of the Internet Experiment.  One might view them as "bugs", but they 
were intentionally put into the design.  One example I recall is "Source 
Quench", which is a congestion control mechanism that I at least never 
thought was viable.  Another was the checksum algorithm, which was poor 
at catching typical communication errors but didn't require much of the 
scarce computing power in the switches and host computers of the era.   
TTL (Time-To-Live) was considered important for the algorithms, 
especially routing and the handling of time-sensitive data flows such as 
interactive speech.  But as implemented it had nothing to do with time; 
it was defined as a "hop count", since the computers of the era had no 
practical way to measure time accurately (until Dave Mills and crew 
created NTP).
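To illustrate that checksum tradeoff, here is a minimal sketch (in 
Python, purely for illustration; the originals were hand-coded for the 
machines of the era) of the ones'-complement checksum later documented 
in RFC 1071.  It is cheap to compute, but because it is just a 
commutative sum over 16-bit words, swapping whole words is one class of 
error it cannot detect:

```python
def internet_checksum(data: bytes) -> int:
    """Ones'-complement sum over 16-bit words (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add next 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # fold carry back in
    return ~total & 0xFFFF

# Addition is commutative, so reordering entire 16-bit words
# produces the same checksum -- an undetected error:
assert internet_checksum(b"\x12\x34\x56\x78") == \
       internet_checksum(b"\x56\x78\x12\x34")
```

The per-word cost is just an add and a carry fold, which is why it fit 
the limited processors of the day, at the price of a weaker error model 
than a CRC.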

Hope this helps,
Jack Haverty


On 11/10/23 01:16, Gergely Buday via Internet-history wrote:
> Hi there,
>
> is there a written history of Internet protocol bugs?
>
> Somebody suggested to find the obsolete RFCs and figure out why they went
> obsolete.
>
> Other than that, what would you recommend to figure out this history?
>
> - Gergely


