From perry at piermont.com  Mon Dec 13 13:01:47 2004
From: perry at piermont.com (Perry E. Metzger)
Date: Mon, 13 Dec 2004 16:01:47 -0500
Subject: [ih] Global congestion collapse
In-Reply-To: <1096878461.4794.76.camel@lap10-c703.uibk.ac.at> (Michael
	Welzl's message of "04 Oct 2004 10:27:41 +0200")
References: <1096878461.4794.76.camel@lap10-c703.uibk.ac.at>
Message-ID: <87mzwhap50.fsf@snark.piermont.com>

Sorry for not replying for a long time...

Michael Welzl writes:
> Does anybody here have stories about the Internet's congestion
> collapse(s) during the 80's? Some details would be great!
[...]
> So, I wonder, what was it like? What are your experiences?
> When did folks first notice it?

I strongly remember a point in '88 or so (perhaps it was '87 -- it
probably wasn't '89) when it became impossible to move data back and
forth between Bellcore and NYNEX's research lab in White Plains over
the net because of congestion-related problems. I was working on some
collaboration with them and suddenly found myself forced to make use
of mag tapes as the only practical way to move even fairly small
files back and forth. A mailing list I ran off of one of my machines
also started having trouble moving bits through efficiently.

As I recall, the arrival of kernel patches implementing congestion
control rapidly began to reverse the situation.

The first time I saw such patches was when Phil Karn handed them to
me one day, and I swiftly added them to the kernels of my lab's
Sun-3s. The world was somewhat different back then... :)

Perry

From mills at udel.edu  Mon Dec 13 17:05:50 2004
From: mills at udel.edu (David L. Mills)
Date: Tue, 14 Dec 2004 01:05:50 +0000
Subject: [ih] Global congestion collapse
In-Reply-To: <87mzwhap50.fsf@snark.piermont.com>
References: <1096878461.4794.76.camel@lap10-c703.uibk.ac.at>
	<87mzwhap50.fsf@snark.piermont.com>
Message-ID: <41BE3C6E.2090507@udel.edu>

Perry,

Well, if your incident was during 1986-1988 and involved transit of
the NSFnet Phase-I backbone, I'm the perp. The NSFnet routers ran my
code, which was horribly overrun by supercomputer traffic. I found
the best way to deal with the problem was to find the supercomputer
elephants and shoot them. More is in a 1988 SIGCOMM Symposium paper.
More recently, the USNO and NIST time servers are being overrun with
NTP traffic. See my recent PTTI paper at
www.eecis.udel.edu/~mills/papers.html.

The NSFnet meltdown occurred primarily because the fuzzball routers
used smart interfaces that retransmitted when either an error
occurred or the receiver ran dry of buffers. The entire network
locked up for a time because all the buffers in all six machines
filled up with retransmit traffic and nothing could get in or out. As
I recall, the ARPAnet also had a similar problem with reassembly
buffers.

Dave

Perry E. Metzger wrote:

> I strongly remember a point in '88 or so (perhaps it was '87 -- it
> probably wasn't '89) when it became impossible to move data back and
> forth between Bellcore and NYNEX's research lab in White Plains over
> the net because of congestion-related problems.
[...]
From perry at piermont.com  Mon Dec 13 18:32:02 2004
From: perry at piermont.com (Perry E. Metzger)
Date: Mon, 13 Dec 2004 21:32:02 -0500
Subject: [ih] Global congestion collapse
In-Reply-To: <41BE3C6E.2090507@udel.edu> (David L. Mills's message of
	"Tue, 14 Dec 2004 01:05:50 +0000")
References: <1096878461.4794.76.camel@lap10-c703.uibk.ac.at>
	<87mzwhap50.fsf@snark.piermont.com> <41BE3C6E.2090507@udel.edu>
Message-ID: <87r7lt7gpp.fsf@snark.piermont.com>

"David L. Mills" writes:
> Well, if your incident was during 1986-1988 and involved transit of
> the NSFnet Phase-I backbone, I'm the perp. The NSFnet routers ran my
> code, which was horribly overrun by supercomputer traffic.
[...]
> The NSFnet meltdown occurred primarily because the fuzzball routers
> used smart interfaces that retransmitted when either an error
> occurred or the receiver ran dry of buffers.

Interesting. Bellcore switched from a 56k link to the IMP at Columbia
to NSFnet towards the end (latter half?) of that time, but I can't
remember if the horrible congestion was before or after our switch.

Either way, though, it was pretty shortly thereafter that I remember
getting my first replacement .o files with yummy new TCP congestion
control algorithms in them.

Perry

From mills at udel.edu  Mon Dec 13 20:37:30 2004
From: mills at udel.edu (David L. Mills)
Date: Tue, 14 Dec 2004 04:37:30 +0000
Subject: [ih] Global congestion collapse
In-Reply-To: <87r7lt7gpp.fsf@snark.piermont.com>
References: <1096878461.4794.76.camel@lap10-c703.uibk.ac.at>
	<87mzwhap50.fsf@snark.piermont.com> <41BE3C6E.2090507@udel.edu>
	<87r7lt7gpp.fsf@snark.piermont.com>
Message-ID: <41BE6E0A.9030807@udel.edu>

Perry,

Not so fast. Steve Wolff of NSF and I had a nasty little secret we
did not tell the NSFnet maintenance crew, who could never keep a
secret. I built priority queueing and preemption into the fuzzball
routers. The former wiretapped the telnet port and made it just below
NTP on the priority scale. We put mail on the bottom, just below ftp.
A lot of telnet users stopped complaining because they thought we
"fixed" the network.

The other thing was to shoot the elephants. When a new packet arrived
and no buffer space was available, the output queues were scanned for
the biggest elephant (largest total byte count on all queues from the
same IP address) and its biggest packet was killed. Gunshots
continued until either the arriving packet got shot or there was
enough room to save it. It all worked gangbusters and the poor ftpers
never found out.

Dave
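Mills' preemption rule is concrete enough to sketch. The following is
a minimal Python illustration of the policy as he describes it above:
tally bytes per source, shoot the biggest packet of the biggest
elephant, and repeat until the arrival either fits or is itself shot.
All names (Packet, BUFFER_LIMIT, enqueue_with_preemption) are
hypothetical; the fuzzballs were certainly not written in Python.

    from collections import defaultdict

    BUFFER_LIMIT = 64 * 1024  # illustrative shared buffer pool, in bytes

    class Packet:
        def __init__(self, src_ip, size):
            self.src_ip = src_ip  # source IP address
            self.size = size      # byte count of this packet

    def buffered_bytes(queues):
        return sum(p.size for q in queues for p in q)

    def enqueue_with_preemption(queues, out_queue, arriving):
        """Admit 'arriving', preempting buffered packets as needed.
        Returns True if the arrival was saved, False if it got shot."""
        while buffered_bytes(queues) + arriving.size > BUFFER_LIMIT:
            # Find the elephant: the source with the largest total byte
            # count across all output queues, counting the arrival too.
            totals = defaultdict(int)
            for q in queues:
                for p in q:
                    totals[p.src_ip] += p.size
            totals[arriving.src_ip] += arriving.size
            elephant = max(totals, key=totals.get)
            # The elephant's biggest packet is the victim; if that is
            # the arriving packet itself, the arrival gets shot.
            queued = [p for q in queues for p in q if p.src_ip == elephant]
            if elephant == arriving.src_ip and (
                    not queued
                    or arriving.size >= max(p.size for p in queued)):
                return False
            victim = max(queued, key=lambda p: p.size)
            for q in queues:
                if victim in q:
                    q.remove(victim)
                    break
        out_queue.append(arriving)
        return True

The sketch covers only the drop rule; the companion priority queueing
(NTP above telnet, ftp above mail) would sit in the dequeue path and
is not shown.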
From michael.welzl at uibk.ac.at  Mon Dec 13 22:47:01 2004
From: michael.welzl at uibk.ac.at (Michael Welzl)
Date: 14 Dec 2004 07:47:01 +0100
Subject: [ih] Global congestion collapse
In-Reply-To: <41BE6E0A.9030807@udel.edu>
References: <1096878461.4794.76.camel@lap10-c703.uibk.ac.at>
	<87mzwhap50.fsf@snark.piermont.com> <41BE3C6E.2090507@udel.edu>
	<87r7lt7gpp.fsf@snark.piermont.com> <41BE6E0A.9030807@udel.edu>
Message-ID: <1103006821.4796.3.camel@lap10-c703.uibk.ac.at>

Folks,

Thanks a lot for answering my original question; this discussion is
getting more and more exciting :)

Cheers,
Michael

On Tue, 2004-12-14 at 05:37, David L. Mills wrote:
> Not so fast. Steve Wolff of NSF and I had a nasty little secret we
> did not tell the NSFnet maintenance crew, who could never keep a
> secret. I built priority queueing and preemption into the fuzzball
> routers.
[...]
From sbrim at cisco.com  Tue Dec 14 04:31:09 2004
From: sbrim at cisco.com (Scott W Brim)
Date: Tue, 14 Dec 2004 07:31:09 -0500
Subject: [ih] Global congestion collapse
In-Reply-To: <41BE6E0A.9030807@udel.edu>
References: <1096878461.4794.76.camel@lap10-c703.uibk.ac.at>
	<87mzwhap50.fsf@snark.piermont.com> <41BE3C6E.2090507@udel.edu>
	<87r7lt7gpp.fsf@snark.piermont.com> <41BE6E0A.9030807@udel.edu>
Message-ID: <20041214123108.GC1492@sbrim-w2k02>

On Tue, Dec 14, 2004 04:37:30AM +0000, David L. Mills allegedly wrote:
> A lot of telnet users stopped complaining because they thought we
> "fixed" the network.

The news leaked out pretty quickly iirc :-)

Another thing I noticed was that people adjusted their behavior. The
congestion spread in time when it couldn't spread any other way, and
filled most of the night.

From craig at aland.bbn.com  Tue Dec 14 06:09:36 2004
From: craig at aland.bbn.com (Craig Partridge)
Date: Tue, 14 Dec 2004 09:09:36 -0500
Subject: [ih] Global congestion collapse
In-Reply-To: Your message of "Mon, 13 Dec 2004 21:32:02 EST."
	<87r7lt7gpp.fsf@snark.piermont.com>
Message-ID: <20041214140936.582251AD@aland.bbn.com>

In message <87r7lt7gpp.fsf at snark.piermont.com>, "Perry E. Metzger"
writes:

>Interesting. Bellcore switched from a 56k link to the IMP at Columbia
>to NSFnet towards the end (latter half?) of that time, but I can't
>remember if the horrible congestion was before or after our switch.

ARPANET had trouble too. I remember much tuning.

>Either way, though, it was pretty shortly thereafter that I remember
>getting my first replacement .o files with yummy new TCP congestion
>control algorithms in them.

That would have been Van's TCP mods (described in the SIGCOMM '88
paper). It was astonishing how big a difference they made.

Craig
From perry at piermont.com  Tue Dec 14 08:18:59 2004
From: perry at piermont.com (Perry E. Metzger)
Date: Tue, 14 Dec 2004 11:18:59 -0500
Subject: [ih] Global congestion collapse
In-Reply-To: <20041214140936.582251AD@aland.bbn.com> (Craig
	Partridge's message of "Tue, 14 Dec 2004 09:09:36 -0500")
References: <20041214140936.582251AD@aland.bbn.com>
Message-ID: <87d5xcn98s.fsf@snark.piermont.com>

Craig Partridge writes:
> That would have been Van's TCP mods (described in the SIGCOMM '88
> paper).

Of course. :)

> It was astonishing how big a difference they made.

Yes, though apparently (according to David Mills in the last few
notes to this list) more was going on than I was aware of at the
time. (That's not surprising -- my research work around then was
debuggers for highly parallel systems, and I was not paying much
attention to the network except as a way of getting my work done...)

Perry

From touch at ISI.EDU  Wed Dec 15 07:09:03 2004
From: touch at ISI.EDU (Joe Touch)
Date: Wed, 15 Dec 2004 07:09:03 -0800
Subject: [ih] Global congestion collapse
In-Reply-To: <41BE6E0A.9030807@udel.edu>
References: <1096878461.4794.76.camel@lap10-c703.uibk.ac.at>
	<87mzwhap50.fsf@snark.piermont.com> <41BE3C6E.2090507@udel.edu>
	<87r7lt7gpp.fsf@snark.piermont.com> <41BE6E0A.9030807@udel.edu>
Message-ID: <41C0538F.9080108@isi.edu>

David L. Mills wrote:
> The other thing was to shoot the elephants. When a new packet
> arrived and no buffer space was available, the output queues were
> scanned for the biggest elephant (largest total byte count on all
> queues from the same IP address) and its biggest packet was killed.
[...]

RED would benefit from two variants - per-packet (when per-packet ops
are the bottleneck) and per-byte weighting, though it doesn't seem to
be described that way much. This sounds a lot like per-byte (the more
common case now anyway), except that RED is statistical (everyone
gets slammed, proportional to their load) and this hits each in
series (largest user first, then next-largest when the largest backs
off, etc.). Was there ever any backlash (software oscillation or
people complaining) from that?

Joe
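For contrast with the serial, largest-first preemption sketched
earlier, here is a minimal rendering of the statistical drop Joe
refers to, loosely following the RED of Floyd and Jacobson (1993):
the drop probability rises with a moving average of the queue size,
and the per-byte weighting makes a large packet proportionally more
likely to be hit. The thresholds and gains are illustrative
assumptions, not values from any deployed router, and real RED's
"count since last drop" spreading is omitted.

    import random

    MIN_TH, MAX_TH = 5000, 15000  # avg-queue thresholds, in bytes
    MAX_P = 0.1                   # drop probability as avg nears MAX_TH
    WEIGHT = 0.002                # EWMA gain for the average queue
    MEAN_PKT = 1000               # nominal packet size for byte mode

    avg_queue = 0.0

    def red_drop(queue_bytes, pkt_size):
        """Return True if the arriving packet should be dropped.
        Statistical: every source faces the same lottery, so losses
        land on each in proportion to the bytes it is sending."""
        global avg_queue
        avg_queue = (1 - WEIGHT) * avg_queue + WEIGHT * queue_bytes
        if avg_queue < MIN_TH:
            return False          # uncongested: accept everything
        if avg_queue >= MAX_TH:
            return True           # persistently full: drop everything
        p = MAX_P * (avg_queue - MIN_TH) / (MAX_TH - MIN_TH)
        p *= pkt_size / MEAN_PKT  # per-byte weighting variant
        return random.random() < min(p, 1.0)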
From touch at ISI.EDU  Wed Dec 15 07:11:26 2004
From: touch at ISI.EDU (Joe Touch)
Date: Wed, 15 Dec 2004 07:11:26 -0800
Subject: [ih] Global congestion collapse
In-Reply-To: <20041214140936.582251AD@aland.bbn.com>
References: <20041214140936.582251AD@aland.bbn.com>
Message-ID: <41C0541E.701@isi.edu>

Craig Partridge wrote:
> That would have been Van's TCP mods (described in the SIGCOMM '88
> paper). It was astonishing how big a difference they made.

Not to downplay the utility of Van's variant, but it seems like _any_
congestion control would have (or may have - e.g. Dave's mods) made
an astonishing impact.

Joe

From mills at udel.edu  Wed Dec 15 08:48:36 2004
From: mills at udel.edu (David L. Mills)
Date: Wed, 15 Dec 2004 16:48:36 +0000
Subject: [ih] Global congestion collapse
In-Reply-To: <41C0538F.9080108@isi.edu>
References: <1096878461.4794.76.camel@lap10-c703.uibk.ac.at>
	<87mzwhap50.fsf@snark.piermont.com> <41BE3C6E.2090507@udel.edu>
	<87r7lt7gpp.fsf@snark.piermont.com> <41BE6E0A.9030807@udel.edu>
	<41C0538F.9080108@isi.edu>
Message-ID: <41C06AE4.6080903@udel.edu>

Joe,

RED has always been a problem with me. That's like shooting a load of
buckshot at the herd of elephants and tigers and hoping you hit an
elephant. My agenda was to find the elephants first and then target
them.

Dave

From mills at udel.edu  Wed Dec 15 08:51:35 2004
From: mills at udel.edu (David L. Mills)
Date: Wed, 15 Dec 2004 16:51:35 +0000
Subject: [ih] Global congestion collapse
In-Reply-To: <41C0541E.701@isi.edu>
References: <20041214140936.582251AD@aland.bbn.com>
	<41C0541E.701@isi.edu>
Message-ID: <41C06B97.1030208@udel.edu>

Joe,

That's my point. The elephants are a small percentage of the
population, but generate the vast majority of the congestion. My
recent PTTI paper (www.eecis.udel.edu/~mills/papers.html) shows that
78 percent of the congestion seen at busy NTP servers is due to 18
percent of the population.

Dave
From faber at ISI.EDU  Wed Dec 15 08:57:32 2004
From: faber at ISI.EDU (Ted Faber)
Date: Wed, 15 Dec 2004 08:57:32 -0800
Subject: [ih] Global congestion collapse
In-Reply-To: <41C0541E.701@isi.edu>
References: <20041214140936.582251AD@aland.bbn.com>
	<41C0541E.701@isi.edu>
Message-ID: <20041215165732.GA35624@pun.isi.edu>

On Wed, Dec 15, 2004 at 07:11:26AM -0800, Joe Touch wrote:
> Not to downplay the utility of Van's variant, but it seems like
> _any_ congestion control would have (or may have - e.g. Dave's mods)
> made an astonishing impact.

There's a fundamental difference between an e2e control like Van's
and a queueing system like Dave's. One reduces load and one
reallocates scarce resources to the more deserving. While
sophisticated queueing is undeniably helpful, the end-to-end control
is a necessity.

-- 
Ted Faber
http://www.isi.edu/~faber  PGP: http://www.isi.edu/~faber/pubkeys.asc
Unexpected attachment on this mail? See http://www.isi.edu/~faber/FAQ.html#SIG

From touch at ISI.EDU  Wed Dec 15 09:54:52 2004
From: touch at ISI.EDU (Joe Touch)
Date: Wed, 15 Dec 2004 09:54:52 -0800
Subject: [ih] Global congestion collapse
In-Reply-To: <20041215165732.GA35624@pun.isi.edu>
References: <20041214140936.582251AD@aland.bbn.com>
	<41C0541E.701@isi.edu> <20041215165732.GA35624@pun.isi.edu>
Message-ID: <41C07A6C.8050002@isi.edu>

Ted Faber wrote:
| There's a fundamental difference between an e2e control like Van's
| and a queueing system like Dave's. One reduces load and one
| reallocates scarce resources to the more deserving. While
| sophisticated queueing is undeniably helpful, the end-to-end control
| is a necessity.

Why? Granted it's useful, granted that it avoids needing to deploy
Dave's stuff throughout (which is otherwise required) - but if that
were done, why is E2E control a "necessity"?

Joe
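Ted's distinction can be made concrete with a toy discrete-time model
(the numbers are entirely hypothetical). With fixed-rate senders, no
drop or scheduling policy can shrink the overload; the queue
discipline only chooses whose packets die. Only when the senders
themselves back off end to end does the offered load fall toward what
the link can carry.

    LINK = 100  # bottleneck capacity per tick, arbitrary units

    def run(rates, responsive, ticks=50):
        """Simulate senders sharing one bottleneck. Responsive senders
        react AIMD-style: halve on overload, add one unit otherwise.
        Unresponsive senders never change. Returns the final offered
        load."""
        for _ in range(ticks):
            overloaded = sum(rates) > LINK
            if responsive:
                rates = [r / 2 if overloaded else r + 1 for r in rates]
        return sum(rates)

    fixed = [20] * 10  # ten senders offering 200 against capacity 100
    print(run(fixed, responsive=False))  # 200: the overload persists;
                                         # queueing only picks victims
    print(run(fixed, responsive=True))   # sawtooths around the link
                                         # capacity: the load itself fell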
From michael.welzl at uibk.ac.at  Sun Dec 26 12:17:34 2004
From: michael.welzl at uibk.ac.at (Michael Welzl)
Date: Sun, 26 Dec 2004 21:17:34 +0100
Subject: [ih] Re: Global congestion collapse
References: <200412152000.iBFK02Q19053@boreas.isi.edu>
Message-ID: <000a01c4eb87$f15db8a0$0200a8c0@fun>

Dear all,

Some of you mentioned a TCP patch by Jacobson in this thread - e.g.:

> Craig Partridge wrote:
> >>Either way, though, it was pretty shortly thereafter that I
> >>remember getting my first replacement .o files with yummy new TCP
> >>congestion control algorithms in them.

I'm interested in the history of Internet congestion control; so, I
wonder:

* were admins aware that this patch would reduce your own rate and
  might make things worse for you if you're the only one who installs
  it? e.g., think of 1000 * unresponsive UDP vs. 1 * TCP - across a
  single bottleneck - in this scenario, a single unresponsive flow
  would be better off than a single TCP flow.

* Van Jacobson's paper came out in August 1988. I think that the
  first RFC which says "you MUST implement congestion control" is
  RFC 1122 - which came out October 1989. What happened in between?
  Was it just a patch flying around and word of mouth ("c'mon,
  install it, we'll all be better off")?

It all looks a bit like an Internet community type of thing to me
that couldn't work like this nowadays. Am I right?

Cheers,
Michael

From craig at aland.bbn.com  Sun Dec 26 13:10:41 2004
From: craig at aland.bbn.com (Craig Partridge)
Date: Sun, 26 Dec 2004 16:10:41 -0500
Subject: [ih] Re: Global congestion collapse
In-Reply-To: Your message of "Sun, 26 Dec 2004 21:17:34 +0100."
	<000a01c4eb87$f15db8a0$0200a8c0@fun>
Message-ID: <20041226211041.9057F1A8@aland.bbn.com>

In message <000a01c4eb87$f15db8a0$0200a8c0 at fun>, "Michael Welzl"
writes:

>* were admins aware that this patch would reduce your own rate and
> might make things worse for you if you're the only one who installs
> it?
[...]

Actually the great thing about Van's patch was that the existing TCPs
were so bad that being the only one running Van's patch meant you got
*better* performance. Only later did people figure out how to create
unresponsive TCPs that were well-behaved enough that they'd win in
this fight.

[Reaching deep into my brain, my recollection is that Van's worked
better because it (a) did RTT estimation right and (b) slow start
allowed it to correctly probe available bandwidth, whereas existing
implementations just hammered at not enough bandwidth. Happy to be
corrected, this was long ago.]

>* Van Jacobson's paper came out in August 1988. I think that the
> first RFC which says "you MUST implement congestion control" is
> RFC 1122 - which came out October 1989. What happened in between?
[...]

The patch came out well before August of 1988. And yes, it was word
of mouth -- or perhaps, better said, notes on the TCP-IP list.
There's a note from Van on 11 Feb 88 discussing the work and a note
from Dan Lynch soon thereafter inviting people to a tutorial at
Interop about it.
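The two mechanisms Craig recalls are easy to sketch. The estimator
below follows the mean-plus-deviation retransmit timer from
Jacobson's SIGCOMM '88 paper (with the gains and the 4x deviation
factor as later standardized in RFC 6298); the window logic shows
slow start and congestion avoidance in outline. This is an
illustration of the ideas, not the BSD code that actually shipped in
the patches.

    class RttEstimator:
        """Jacobson/Karels retransmit timer: track a smoothed RTT and
        a smoothed mean deviation, and derive the timeout from both,
        so a loaded path raises the RTO instead of triggering the
        spurious retransmissions that "hammered" congested links."""

        def __init__(self, first_sample):
            self.srtt = first_sample
            self.rttvar = first_sample / 2

        def update(self, sample):
            err = sample - self.srtt
            self.srtt += err / 8                         # gain 1/8 on mean
            self.rttvar += (abs(err) - self.rttvar) / 4  # gain 1/4 on dev
            return self.rto()

        def rto(self):
            return self.srtt + 4 * self.rttvar

    # Slow start / congestion avoidance, in units of one segment (MSS):
    MSS = 1
    cwnd, ssthresh = 1 * MSS, 64 * MSS

    def on_ack():
        """Grow the window: exponentially below ssthresh (slow start
        probes for the available bandwidth), ~1 MSS per RTT above."""
        global cwnd
        if cwnd < ssthresh:
            cwnd += MSS
        else:
            cwnd += MSS * MSS / cwnd

    def on_timeout():
        """On loss, remember half the window and restart slowly."""
        global cwnd, ssthresh
        ssthresh = max(cwnd / 2, 2 * MSS)
        cwnd = 1 * MSS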
There's a Jan 87 note from Van saying he and Mike Karels are
experimenting with the mods. I tried to find the actual software
release, but all I could find was the official release on 6 Dec 88
(whereas the patch had been around for a while by then).

If you don't have the TCP-IP archives, they are well worth reading (I
grabbed what I could when I realized they might be endangered, and
appear to have much of the list from 82 to 91).

Craig

From craig at aland.bbn.com  Thu Dec 30 09:58:45 2004
From: craig at aland.bbn.com (Craig Partridge)
Date: Thu, 30 Dec 2004 12:58:45 -0500
Subject: [ih] Re: Global congestion collapse
Message-ID: <20041230175845.540991AB@aland.bbn.com>

Following up our discussions on this topic, I had cause today to
re-read the proceedings of the 6th IETF (April 1987), which are
on-line at www.ietf.org. They include minutes (p. 9) in which Van
Jacobson describes early thinking about slow start (and why it works
better than what was in the Internet at the time). Also included are
Van's slides!

The minutes also include a report from the ARPANET team on how they
dealt with an episode of congestion collapse (apparently around
January of 1987) with upgrades to IMP/PSN software -- and traffic
distribution matrices showing where the heavy congestion was
observed -- fun stuff (and it brings back wonderful memories...).

Craig