[ih] internet-history Digest, Vol 84, Issue 4

Wed May 21 12:08:17 PDT 2014

PS:

    > From: jnc at mercury.lcs.mit.edu (Noel Chiappa)

    >> Clarity on how/when it began to become evident that the naive
    >> algorithms documented in the TCP RFCs and used in early testing would
    >> themselves become the source of trouble.

    > Well, not really (to my eyes); they mostly simply were not _always
    > effective_ at controlling congestion (although they did generate some
    > useless, duplicate packets). 
    > ...
    > So, as with many things, what is crystal clear in hindsight was rather
    > obscured without the mental frameworks, etc that we have now (e.g.
    > F=ma).

For an illuminating take on this all, try (re-)reading RFC-793.

It only contains the word 'congestion' twice - both in sections of generic
text (e.g. about how a packet could be lost).

It does spend a while on RTT estimation algorithms - but _only_ to find out
when data needs to be re-transmitted. (IOW, we were so focused on 'getting
the data through ASAP' that that's all we saw the RTT as being for - so that
as soon as the ACK was seen to be missing, we could retransmit the packet
and get the transfer going again).
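
For reference, the example RTT procedure in RFC-793 can be sketched roughly as below. The function names are made up for illustration, and the constants are picked from the ranges and example bounds the RFC suggests (ALPHA 0.8 to 0.9, BETA 1.3 to 2.0, bounds of 1 second and 1 minute); treat it as a sketch, not as what any particular implementation did.

```python
# Sketch of the example retransmission-timeout calculation in RFC-793.
# Constant values are assumptions chosen from the RFC's suggested ranges.
ALPHA = 0.875   # smoothing factor (RFC-793 suggests 0.8 to 0.9)
BETA = 2.0      # delay variance factor (RFC-793 suggests 1.3 to 2.0)
LBOUND = 1.0    # lower bound on the timeout, in seconds
UBOUND = 60.0   # upper bound on the timeout, in seconds


def update_srtt(srtt, rtt_sample):
    """Fold one measured round-trip time into the smoothed estimate."""
    return ALPHA * srtt + (1.0 - ALPHA) * rtt_sample


def rto(srtt):
    """Retransmission timeout: BETA * SRTT, clamped to [LBOUND, UBOUND]."""
    return min(UBOUND, max(LBOUND, BETA * srtt))
```

Note that the only consumer of the result is the retransmission timer; nothing in the calculation feeds back into how fast the sender transmits.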

And as to what to do when a timeout happened (which usually, although not
always, indicates that a packet has been dropped due to congestion), it says:

  if the retransmission timeout expires on a segment in the retransmission
  queue, send the segment at the front of the retransmission queue again
  [and] reinitialize the retransmission timer

That's it! Again, note the focus on 'hey, we gotta get the user's data there
as fast as we can'.

Absolutely no thought given to 'hey, maybe that packet was lost through
congestion, maybe I ought to consider that, and if so, how to respond'. The
later 'if you have a congestive loss, back off exponentially to prevent
congestive overload' stuff is completely missing (and would be for some time,
probably until Van, I would guess).
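
To make the contrast concrete, here is a small sketch comparing the two timer behaviours. It assumes that 'reinitialize the retransmission timer' means restarting it at the same value, and it uses simple doubling with a cap for the backoff case; the function names and the `max_rto` parameter are hypothetical, and this is not meant as a rendering of Van's actual algorithm.

```python
def retransmit_times_fixed(rto, attempts):
    """Times (seconds after the first send) of each retransmission when
    the timer is restarted at the same value after every timeout."""
    return [rto * (i + 1) for i in range(attempts)]


def retransmit_times_backoff(rto, attempts, max_rto=64.0):
    """Same, but the timeout doubles after every expiry, up to max_rto
    (an assumed cap), so a congested path sees progressively less load."""
    times = []
    now = 0.0
    timeout = rto
    for _ in range(attempts):
        now += timeout
        times.append(now)
        timeout = min(timeout * 2, max_rto)
    return times


print(retransmit_times_fixed(1.0, 4))    # [1.0, 2.0, 3.0, 4.0]
print(retransmit_times_backoff(1.0, 4))  # [1.0, 3.0, 7.0, 15.0]
```

With a fixed timer the sender keeps offering the same load to a path that is already dropping packets; with doubling, the retransmissions thin out quickly.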

I fairly vividly remember being at the IETF where Van gave his first talk about
his congestion work, and when he started talking about how a lost packet was
a congestion signal, I think we all went 'wow, that's so obvious - how come
we never thought of that'!

The TCP RFC does, however, spend a great deal of time talking about the window
(which is purely a destination host buffering thing at that point).

Looking at RFC-792 (ICMP) there's a fair chunk in the Source Quench
section about congestion, but little in the way of a solid algorithmic
suggestion:

  the source host should cut back the rate at which it is sending traffic to
  the specified destination until it no longer receives source quench
  messages from the gateway

(And I still think SQ has gotten a bad rap about being ineffective and/or
making things worse; I would love to see some rigorous investigation of
SQ. But I digress...)

The 'Dave Clark 5' RFCs are similarly thin on congestion-related content:
RFC-813, "Window and Acknowledgement Strategy in TCP" (which one would assume
would be the place to find it) doesn't even contain the word 'congestion'!
It's all about host buffering, etc. (Indeed, it suggests delayed ACKs,
which interrupt the flow of ACKs, which are an important signal in VJCC.)
And one also finds this gem:

  the disadvantage of burstiness is that it may cause buffers to overflow,
  either in the eventual recipient .. or in an intermediate gateway, a
  problem ignored in this paper.

It's interesting to see what _is_ covered in the DC5 set, and similar
writings: Dave goes into Silly Window Syndrome at some length, but there's
nothing about congestive losses.

Lixia's "Why TCP Timers Don't Work Well" paper is, I think, a valuable
snap-shot of thinking about related topics pre-Van; it too doesn't have much
about congestive losses, mentioning them only briefly. The general sense one
gets from reading it is that 'the increased measured RTT caused by congestive
losses will cause people to back off enough to get rid of the congestion'
(which wasn't true, of course).

I haven't read Nagle's thing, but that would also be interesting to look at,
to see how much we understood at that point.

So I think congestion control was so lacking, in part, because we just hadn't
run into it as a serious problem. Yes, we knew that _in theory_ congestion
was possible, and we'd added some stuff for it (SQ), but we just hadn't seen
it a lot - and we probably hadn't seen how _bad_ it could get back then.

(Although experience at MIT with Sorcerer's Apprentice had shown us how bad
congestive collapse _could_ get - and I seem to recall hearing that PARC had
seen a similar thing. But I suspect the particular circumstances of SAS, with
the exponential increases in volume, even though it was - in theory! - a
'single packet outstanding' protocol, might have led us to believe that it
was a pathological case, one that didn't have any larger lessons.)

We were off fixing other alligators (SWS, etc) that actually had bitten us
at that point...

So I suspect it was only when congestive collapse hit on the ARPANET section
of the Internet (shortly before Van's work) that it really got a focus.

	Noel