[ih] Failures of the early Internet

Mon Jan 22 14:18:35 PST 2024

Does anyone know of any mathematical analyses of the Internet 
architecture, i.e., involving TCP flow control and retransmissions 
assigned to the host computers, and switches passing datagrams through 
underlying highly variable networks instead of telephone circuits?

When we were implementing TCP and routers in the 1980s, I recall looking 
for but not finding any theoretical backup for the architectural 
decisions that were being made (such as using "hops" for routing instead 
of transit time, retransmission end-to-end rather than by each link, and 
"Source Quench" as a mechanism for congestion control).

I also recall discussions of ARPA's charter.   To 
plagiarize Star Trek - ARPA should strive to go where no one has gone 
before.   So some decisions about protocols or algorithms were made not 
because we knew that some new specific approach would work (in theory or 
from experience), but rather because we did not know that the approach 
would not work.

Just curious whether there have been any more formal theoretical analyses of the 
Internet architecture, which is quite different from the Arpanet.   TCP 
does a good job keeping data flowing, but by its nature it splits the 
Arpanet-style network mechanisms so that neither the end-user nor the 
network operator has a clear view of what's happening in the Internet.

For example, while operating a corporate "intranet" (in the 1990s) we 
observed a behavior where a TCP connection would be operating as 
expected, a momentary circuit outage would occur, a flurry of 
retransmissions would compensate for the glitch, and the TCP connection 
would just continue as it should, without the users even noticing.   But 
we observed that every datagram was being sent twice after the glitch, 
so total user throughput was halved and circuit efficiency was 
alarmingly poor for the duration of the connection. On an expensive 
trans-Pacific circuit, that mattered.   We only noticed this situation 
because we were both the user and the network operator at the time, and 
happened to "catch" the behavior. Serendipity.
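
As a rough illustration of why that mattered, here is a minimal Python sketch of the arithmetic. It assumes the circuit is the bottleneck and is kept full; the 64 kb/s rate and the function names are hypothetical, chosen only for the example. The one thing taken from the observation above is that each datagram crossed the circuit twice.

```python
def goodput_bps(link_rate_bps: float, copies_per_datagram: int) -> float:
    """Useful (non-duplicate) throughput on a saturated link.

    Assumes every datagram crosses the link `copies_per_datagram` times,
    so only 1/copies of the transmitted bits are new data.
    """
    if copies_per_datagram < 1:
        raise ValueError("each datagram must be sent at least once")
    return link_rate_bps / copies_per_datagram


def circuit_efficiency(copies_per_datagram: int) -> float:
    """Fraction of the circuit's capacity that carries new data."""
    if copies_per_datagram < 1:
        raise ValueError("each datagram must be sent at least once")
    return 1.0 / copies_per_datagram


LINK_RATE_BPS = 64_000  # hypothetical circuit rate, not from the incident

before = goodput_bps(LINK_RATE_BPS, copies_per_datagram=1)
after = goodput_bps(LINK_RATE_BPS, copies_per_datagram=2)
print(f"before glitch: {before:.0f} b/s useful")
print(f"after glitch:  {after:.0f} b/s useful")
print(f"efficiency after glitch: {circuit_efficiency(2):.0%}")
```

The sketch says nothing about why the duplicates persisted; it only shows that sending everything twice on a full circuit leaves half the capacity for new data.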

Was such observed behavior just a one-time anomaly?  Or is it somehow 
inherent in the Internet architecture?   How about other observations?  
Is the behavior today known as "bufferbloat" avoidable?  Or is it 
inherent in the design?

What does theory say about such characteristics of today's Internet?

Jack Haverty

On 1/22/24 13:25, Leonard Kleinrock wrote:
>
> I have some data that can add to the discussion regarding early 
> ARPANET lockups, etc. A number of them are detailed and documented in 
> Chapter 6 of my /Queueing Systems, Volume 2: Computer Applications/ 
> (1975).  Among those included are:
>
> -Reassembly lockup
> -Direct store-and-forward lockup
> -Indirect store-and-forward lockup
> -Christmas lockup
> -Piggyback lockup
> and discussions of the early ARPANET flow control protocols.
>
> Assuming that this readership finds this of interest, I have scanned 
> and attached a link for two sections of my book (Section 6.3. FLOW 
> CONTROL and Section 6.4. LOCKUPS, DEGRADATIONS AND TRAPS) in which 
> these matters are detailed.  The link is 
> https://www.icloud.com/iclouddrive/0ca-SfiMKqzY4N-egPfYRTloA#Deadlocks_etc_pp_438-451_Volume_2
>
> Len Kleinrock
> UCLA Computer Science Department
> <Deadlocks etc pp 438-451 Volume 2.pdf>
>
>> On Jan 19, 2024, at 6:20 PM, Jack Haverty via Internet-history 
>> <internet-history at elists.isoc.org> wrote:
>>
>> On 1/19/24 16:00, Karl Auerbach via Internet-history wrote:
>>> (I've never felt that I have an adequate understanding of the early 
>>> routing failures and their effects.)
>>
>> OK, I'll jump in.....
>>
>> It was painfully easy for routing problems to occur.   All one had to 
>> do was advertise to a neighboring router that you were the best route 
>> to everywhere.   A simple bug could do the job. Word would quickly 
>> spread, and all traffic would head your way, which sometimes made it 
>> impossible to connect to the offending router to try to fix the 
>> problem.  IIRC, something like that was what the Fuzzballs 
>> occasionally did.
>>
>> Another incident I recall was also a routing issue.  I don't remember 
>> exactly where it happened, but two sites, universities IIRC, were 
>> collaborating on some research project and had a need to send data 
>> back and forth.  Their pathway to each other through the Internet was 
>> somewhat long and often congested.   So they decided to fix the 
>> problem by installing a circuit directly between their two campuses' 
>> routers.
>>
>> Money was of course an issue, but they found the funds to pay for a 
>> 9.6 kb/s line.  They were surprised to observe that the added line 
>> only made things worse.  File transfers took even longer than 
>> before.  Of course their change to the topology of the Internet had 
>> unexpectedly made their 9.6 line the best route for all sorts of 
>> Internet traffic unrelated to their project.
>>
>> Many of the incidents I remember were caused by the routing 
>> algorithms which were based on "hops" rather than on time (as had 
>> been the case in the Arpanet for a decade or more).   This was a 
>> well-known problem which I think was part of the motivation for Dave 
>> Mills to create the NTP machinery.  In addition to routing, there 
>> were other Internet mechanisms that depended on time, but had 
>> necessarily been implemented "temporarily" until good time mechanisms 
>> were available.  For example, the TTL (Time To Live) and TOS (Type Of 
>> Service) values in IP were supposed to provide the routers with 
>> information to route IP datagrams over the most appropriate route, or 
>> quickly discard them if there was no expectation they could possibly 
>> get to their destination in time to still be useful.
>>
>> Dave worked hard to get Time as an inherent element of The Internet, 
>> and our expectation was that TCP and IP software throughout the 
>> Internet would be changed to make decisions based on Time rather than 
>> Hops.   I'm not sure if that ever happened.   The Internet now knows 
>> what time it is, but does networking software today ever look at its 
>> watch?
>>
>> Another incident I recall was not an Internet failure, but rather a 
>> situation where the Internet terrorized the Arpanet.
>>
>> The Arpanet was touted as a "packet network", but in reality it was a 
>> virtual circuit network, using packets internally.   There were lots 
>> of mechanisms inside the Arpanet IMPs to make all user traffic travel 
>> to its destination intact and in the same order it was sent.   The 
>> network was designed to match the typical usage patterns of the era - 
>> people connected to some computer somewhere on the Arpanet, did their 
>> work, and disconnected minutes or even hours later. Inside the 
>> Arpanet, the mechanisms to set up virtual circuits consumed resources 
>> and took time, but with sessions lasting minutes or hours the impact 
>> was tolerable.
>>
>> One day the Arpanet was having problems and response times were 
>> noticeably slower than usual.  Investigation revealed that the 
>> Arpanet was flailing, constantly setting up and tearing down virtual 
>> circuits, each of which was only lasting for a second or two.   The 
>> Arpanet NOC (down the hall from my office) was in crisis.
>>
>> Eventually the problem was traced down to a new release of OS 
>> software (BSD, IIRC) that had just been posted on the Arpanet, and 
>> was being installed in the large numbers of workstations (Sun, IIRC) 
>> that had started appearing on the Internet.  The new OS release 
>> included a new tool to advise its users of the current status of the 
>> Internet.  It accomplished that by "pinging" every router every few 
>> minutes to see if that router was up and responsive.
>>
>> Pinging involved sending a single datagram, and receiving a single 
>> datagram in response.  But each such datagram required the Arpanet to 
>> set up a virtual circuit to carry that traffic. With lots of OSes and 
>> lots of routers now scattered around the Arpanet, it was trying to do 
>> something it was never designed to do.   As more workstations loaded 
>> the new OS release, the problem only got worse.
>>
>> Although this wasn't an "Internet failure", it was a system failure, 
>> caused by the Internet. Administrative action suppressed the problem 
>> and as the Arpanet was decommissioned the problem disappeared.  Or 
>> perhaps moved somewhere else?
>>
>> Anybody else have recollections of early failures...?
>>
>> Jack Haverty

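The 9.6 kb/s line incident quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the topology, the node names (X, A, B, Y, R1..R3), the 56 kb/s rate on the existing links, and the 8000-bit packet are all hypothetical. Only the 9.6 kb/s direct line comes from the story. The sketch runs the same shortest-path search twice, once counting hops and once counting per-link transmission time, and ignores queueing and congestion entirely.

```python
import heapq


def build_graph(links):
    """Turn (a, b, rate_bps) tuples into an undirected adjacency map."""
    graph = {}
    for a, b, rate in links:
        graph.setdefault(a, {})[b] = rate
        graph.setdefault(b, {})[a] = rate
    return graph


def shortest_path(graph, src, dst, link_cost):
    """Dijkstra's algorithm; `link_cost(rate_bps)` gives the cost of one link."""
    best = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == dst:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, rate in graph[node].items():
            candidate = dist + link_cost(rate)
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if dst not in best:
        raise ValueError(f"no route from {src} to {dst}")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


PACKET_BITS = 8_000   # hypothetical 1000-byte datagram
FAST = 56_000         # hypothetical rate of the existing links
SLOW = 9_600          # the new direct line from the story

# A and B are the two campuses; X and Y are unrelated sites behind them.
LINKS = [
    ("X", "A", FAST), ("A", "R1", FAST), ("R1", "R2", FAST),
    ("R2", "R3", FAST), ("R3", "B", FAST), ("B", "Y", FAST),
    ("A", "B", SLOW),
]
GRAPH = build_graph(LINKS)


def hop_cost(rate_bps):
    """Every link costs one hop, regardless of speed."""
    return 1.0


def delay_cost(rate_bps):
    """Seconds to clock one packet onto the link (no queueing modeled)."""
    return PACKET_BITS / rate_bps


for src, dst in [("A", "B"), ("X", "Y")]:
    print(src, "->", dst, "by hops: ", shortest_path(GRAPH, src, dst, hop_cost))
    print(src, "->", dst, "by delay:", shortest_path(GRAPH, src, dst, delay_cost))
```

In this toy topology the hop metric sends both the campuses' own traffic and the unrelated X-to-Y traffic across the slow line, while the time-based metric keeps both on the longer, faster path. A real time-based metric would also have to account for queueing, which this sketch leaves out.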