[ih] Global congestion collapse

Mon Dec 13 20:37:30 PST 2004

Perry,

Not so fast. Steve Wolff of NSF and I had a nasty little secret we did 
not tell the NSFnet maintenance crew who could never keep a secret. I 
built in priority queueing and preemption in the fuzzball routers. The 
former wiretapped the telnet port and made it just below NTP on the 
priority scale. We put mail on the bottom just below ftp. A lot of 
telnet users stopped complaining because they thought we "fixed" the 
network.
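
For readers who want to see the ordering concretely, here is a minimal sketch of strict priority queueing keyed on the well-known port. It is not the fuzzball code. Only the relative order (NTP, then telnet, with ftp and mail at the bottom) comes from the message above; the numeric priority values, the default class for all other traffic, and the names `PRIORITY_BY_PORT` and `PriorityOutputQueue` are assumptions made for illustration, and preemption is not modeled.

```python
import heapq
from itertools import count

# Lower number = served first. Only the relative order of NTP, telnet,
# ftp and mail is taken from the message; the values themselves and the
# default class for everything else are assumptions.
PRIORITY_BY_PORT = {
    123: 0,  # NTP
    23: 1,   # telnet, just below NTP
    20: 3,   # ftp data
    21: 3,   # ftp control
    25: 4,   # mail (SMTP), at the bottom
}
DEFAULT_PRIORITY = 2  # assumption: other traffic sits between telnet and ftp


class PriorityOutputQueue:
    """Strict priority queue; FIFO among packets of equal priority."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker that preserves arrival order

    def enqueue(self, port, packet):
        prio = PRIORITY_BY_PORT.get(port, DEFAULT_PRIORITY)
        heapq.heappush(self._heap, (prio, next(self._seq), packet))

    def dequeue(self):
        """Return the next packet to transmit, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


q = PriorityOutputQueue()
for port, name in [(25, "mail"), (21, "ftp"), (8080, "other"),
                   (23, "telnet"), (123, "ntp")]:
    q.enqueue(port, name)
order = [q.dequeue() for _ in range(5)]
print(order)
```

With this ordering a telnet keystroke queued behind a pile of mail still goes out ahead of it, which is the effect the telnet users noticed.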

The other thing was to shoot the elephants. When a new packet arrived 
and no buffer space was available, the output queues were scanned 
looking for the biggest elephant (total byte count on all queues from 
the same IP address) and killed its biggest packet. Gunshots continued 
until either the arriving packet got shot or there was enough room to 
save it. It all worked gangbusters and the poor ftpers never found out.
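
The drop policy just described can be sketched roughly as follows. This is an illustration under stated assumptions, not the original implementation: the buffer is modeled as a single byte budget shared by all output queues, the arriving packet is counted toward its source's total, a tie between the arriving packet and a queued one goes against the arriving packet, and the names `ElephantBuffer` and `arrive` are hypothetical.

```python
from collections import defaultdict


class ElephantBuffer:
    """Shared buffer pool that drops from the heaviest source when full."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.queues = defaultdict(list)  # queue id -> list of (src, size)

    def _biggest_elephant(self, arriving):
        # Total bytes per source address across all output queues,
        # counting the arriving packet too (an assumption).
        totals = defaultdict(int)
        for q in self.queues.values():
            for src, size in q:
                totals[src] += size
        totals[arriving[0]] += arriving[1]
        return max(totals, key=totals.get)

    def arrive(self, queue_id, src, size):
        """Return True if the packet was queued, False if it was shot."""
        arriving = (src, size)
        while self.used + size > self.capacity:
            victim = self._biggest_elephant(arriving)
            best = None  # (size, queue id, index) of victim's biggest packet
            for qid, q in self.queues.items():
                for i, (s, sz) in enumerate(q):
                    if s == victim and (best is None or sz > best[0]):
                        best = (sz, qid, i)
            if src == victim and (best is None or size >= best[0]):
                return False  # the arriving packet is the one that gets shot
            sz, qid, i = best
            del self.queues[qid][i]
            self.used -= sz
        self.queues[queue_id].append(arriving)
        self.used += size
        return True


buf = ElephantBuffer(capacity_bytes=1000)
buf.arrive("q1", "mouse", 100)
buf.arrive("q1", "elephant", 500)
buf.arrive("q2", "elephant", 400)             # pool is now full
saved = buf.arrive("q1", "mouse", 50)         # shoots the 500-byte packet
shot = buf.arrive("q2", "elephant", 600)      # the arrival itself is shot
print(saved, shot, buf.used)
```

Light users keep their packets while the heaviest sender absorbs the losses, so an interactive session rarely sees a drop.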

Dave

Perry E. Metzger wrote:

>"David L. Mills" <mills at udel.edu> writes:
>  
>
>>Well, if your incident was during 1986-1988 and involved transit of
>>the NSFnet Phase-I backbone, I'm the perp. The NSFnet routers ran my
>>code, which was horribly overrun by supercomputer traffic. I found the
>>best way to deal with the problem was to find the supercomputer
>>elephants and shoot them. More is in a 1988 SIGCOMM Symposium
>>paper. More recently the USNO and NIST time servers are being overrun
>>with NTP traffic. See my recent PTTI paper at
>>www.eecis.udel.edu/~mills/papers.html.
>>
>>The NSFnet meltdown occurred primarily because the fuzzball routers
>>used smart interfaces that retransmitted when either an error occurred
>>or the receiver ran dry of buffers. The entire network locked up for a
>>time because all the buffers in all six machines filled up with
>>retransmit traffic and nothing could get in or out. As I recall, the
>>ARPAnet also had a similar problem with reassembly buffers.
>>    
>>
>
>Interesting. Bellcore switched from a 56k link to the IMP at Columbia
>to NSFnet towards the end (latter half?) of that time, but I can't
>remember if the horrible congestion was before or after our switch.
>
>Either way, though, it was pretty shortly thereafter that I remember
>getting my first replacement .o files with yummy new TCP congestion
>control algorithms in them.
>
>Perry
>  
>