[ih] Somebody probably asked before - Trying to remember early net routing collapse
Jack Haverty
jack at 3kitty.org
Mon Mar 20 17:58:07 PDT 2023
I don't remember the details, but in the early days there were frequent
battles between the "core" gateways and the "research" gateways. We
(BBN) were supposed to keep the core running 24x7. But lots of people
wanted to build gateways and try out ideas, and the gateway protocols
(basically at the time just exchange of routing tables) were sometimes
corrupted with nonsensical information.
Often incidents were simply caused by bugs in someone's experimental
code. But not always. One case I sort of remember was when someone
decided to put a new circuit between 2 university sites because they
wanted faster service for their frequent file transfers. It seemed (to
them) like an obvious thing to do. So they ordered a circuit from the
phone company and plugged it into the routers at each site. But because
of the topology of the overall network, that new circuit became the
"shortest path", simply because it was the fewest number of hops, for a
lot of network traffic. That could very easily have routed a lot of
traffic through a Fuzzball, and resulted in those two sites actually
experiencing much slower service.
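A toy sketch of that effect (made-up topology and names; the actual
gateways didn't run Python, of course): under fewest-hop routing, one
new edge link can become the preferred path for backbone transit.

    from collections import deque

    def fewest_hops(graph, src, dst):
        """Breadth-first search: returns a fewest-hop path."""
        queue, seen = deque([[src]]), {src}
        while queue:
            path = queue.popleft()
            if path[-1] == dst:
                return path
            for nxt in graph[path[-1]]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])

    # Core sites A-B-C-D-E in a line; site U1 hangs off A, U2 off E.
    graph = {"A": ["B", "U1"], "B": ["A", "C"], "C": ["B", "D"],
             "D": ["C", "E"], "E": ["D", "U2"], "U1": ["A"], "U2": ["E"]}
    print(fewest_hops(graph, "A", "E"))   # ['A', 'B', 'C', 'D', 'E']

    # The two university sites order their own circuit:
    graph["U1"].append("U2")
    graph["U2"].append("U1")
    print(fewest_hops(graph, "A", "E"))   # ['A', 'U1', 'U2', 'E']

Transit traffic between A and E now flows through the two edge
machines, which is exactly what those two sites didn't want.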
Network behavior is often counter-intuitive....
Such "incidents" were the motivation, circa 1982, for creating EGP and
the notion of Autonomous Systems (see RFC 827). EGP provided a means
for putting a sort of "firewall" between different parts of the Internet
-- assuming you could figure out exactly how to filter routing
information as it entered "your" Autonomous System to create your own
protective firewall. We needed such a mechanism in order to keep
the "core" running while all sorts of experiments occurred in other
parts of the Internet.
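A minimal sketch of that filtering idea (hypothetical AS names,
prefixes, and policy; real EGP per RFC 827 carried more state than
this): accept from a neighbor only the routes it is authorized to
announce, and drop everything else at the boundary.

    # Per-neighbor whitelist: which networks we will believe from whom.
    ACCEPT_FROM = {
        "AS-research-1": {"10.2.0.0/16"},
        "AS-research-2": {"10.3.0.0/16"},
    }

    def filter_update(neighbor, advertised, routing_table):
        """Install only routes this neighbor may legitimately announce."""
        allowed = ACCEPT_FROM.get(neighbor, set())
        for prefix, metric in advertised:
            if prefix in allowed:
                routing_table[prefix] = (neighbor, metric)
            # Anything else -- including a bogus claim to reach
            # everything -- dies at the AS boundary.

    table = {}
    filter_update("AS-research-1",
                  [("10.2.0.0/16", 3), ("0.0.0.0/0", 1)],  # bogus default
                  table)
    print(table)   # {'10.2.0.0/16': ('AS-research-1', 3)}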
With EGP in place, the various research efforts were then expected to
experiment and develop some kind of next-generation routing scheme
involving more appropriate metrics than just "hops" -- at least using
transit time as a metric as the ARPANET had been doing, and perhaps
introducing other metrics such as available bandwidth, or constraints
based on policies such as only carrying certain kinds of data through
particular networks. AFAIK, that never happened and hops are still the
basis for routing.
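For concreteness, a sketch of what a delay metric changes (the link
delays are invented): a shortest-path computation over transit times
can prefer a three-hop path of fast links to a two-hop path of slow
ones, which pure hop counting never will.

    import heapq

    def lowest_delay(graph, src, dst):
        """Dijkstra over weighted links: returns (total_delay, path)."""
        heap, seen = [(0, src, [src])], set()
        while heap:
            cost, node, path = heapq.heappop(heap)
            if node == dst:
                return cost, path
            if node in seen:
                continue
            seen.add(node)
            for nxt, w in graph[node].items():
                if nxt not in seen:
                    heapq.heappush(heap, (cost + w, nxt, path + [nxt]))

    # Delays in ms: A-X-D is two slow hops, A-B-C-D is three fast ones.
    delays = {"A": {"X": 400, "B": 20}, "X": {"A": 400, "D": 400},
              "B": {"A": 20, "C": 20}, "C": {"B": 20, "D": 20},
              "D": {"X": 400, "C": 20}}
    print(lowest_delay(delays, "A", "D"))   # (60, ['A', 'B', 'C', 'D'])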
I'm pretty sure I wrote about this at some point years ago in the
internet-history discussions. Perhaps ChatGPT or one of its friends can
find it.....
Jack Haverty
On 3/20/23 17:11, Karl Auerbach via Internet-history wrote:
> I am sure this has been discussed, but I can't seem to find it...
>
> I vaguely remember a story involving some of Dave Mills' machines and
> a memory error in IMPs or some other switching device that caused all
> of the net's traffic to be forwarded through one struggling Fuzzy* or
> PDP-11/03.
>
> Could someone give me a pointer?
>
> I once did something similar - back when we were using flood-and-prune
> routing for IP multicast, I was working at a site where our inbound
> link was a T-1. Our internal net had several Cisco routers [2500
> series] all chatting away with DVMRP [the flood-and-prune multicast
> routing protocol of that era]. Anyway, while I was setting up one of
> our internal 25xx routers I had not yet finished setting up the IP
> unicast routing. But that didn't stop my partially configured router
> from chatting away with IGMP and DVMRP; it merely meant that that
> router could not send the "prune, please stop sending me traffic!"
> message.
>
> So that router eventually ended up at the end of every IP multicast
> "flood" that was active on the MBone but without a way of saying
> "stop, please stop!". Our poor T-1 saturated. I learned to not
> enable IP multicast via DVMRP until my unicast routing was stable.
> (We eventually moved onto PIM for multicast routing.)
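>
> A minimal sketch of that failure mode (hypothetical router names, not
> our actual config): under flood-and-prune, a router stays on the
> distribution tree until it sends a prune, so a router that *can't*
> prune keeps receiving every group.
>
>     def still_flooded(routers, wants_group, can_send_prune):
>         """Who keeps getting traffic once prunes are processed?"""
>         stuck = set()
>         for r in routers:
>             if wants_group.get(r, False):
>                 stuck.add(r)        # has interested members: stays
>             elif not can_send_prune.get(r, True):
>                 stuck.add(r)        # can't say "prune": stays anyway
>         return stuck
>
>     routers = ["rtr-a", "rtr-b", "rtr-broken"]
>     wants = {"rtr-a": True, "rtr-b": False, "rtr-broken": False}
>     can_prune = {"rtr-a": True, "rtr-b": True, "rtr-broken": False}
>     print(still_flooded(routers, wants, can_prune))
>     # {'rtr-a', 'rtr-broken'} -- the half-configured box drinks the flood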
>
> --karl--
>
>