[ih] Why was hop by hop flow control eventually abandoned?

Jack Haverty jack at 3kitty.org
Wed Jul 17 22:07:18 PDT 2013


Well, I'll break the silence here and toss in my two cents (or pfennig or
whatever)...

There was a small group of people involved in creating the first
implementations of TCP/IP, back in the 1977/78/79 timeframe.  This was
when TCP/IP evolved from TCP 2.x to TCP 4, with a lot of intermediate
stages over a very short time.  TCP 4 remains pretty much the same today.

I was working at BBN and my assignment at the time was to be the
implementer of the Unix TCP/IP.  So I was in the email discussions,
meetings, late-night napkin sessions at the hotel bars, etc. when a lot of
such "decisions" were "made" - I'll get to that in a bit...

This was the timeframe when some relatively major refinements were made to
TCP, e.g., the separation of the TCP and IP headers.  There was a lot of
discussion about other mechanisms as well -- addressing, the creation of
the "options" mechanism, the shift from "record-oriented" to "byte-stream"
behavior (removal of the "rubber EOL" mechanism), and many others,
including flow control.

At the time, there were two schools of thought.  The traditional school of
network design incorporated rather strict control mechanisms within the
network itself.  For example, the ARPANET (almost ten years old then) was a
"datagram network" at its periphery where the hosts interfaced, but if you
looked at the actual internal mechanisms you'd see lots of resource
management - flow control (RFNMs), memory management (ALLOCs), etc.  That's
the world where techniques like hop-by-hop flow control fit in naturally.
In other words, it was called a "datagram network", but just inside the
shell you'd find virtual-circuit machinery.
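
For concreteness, RFNM-style control amounts to stop-and-wait per
destination.  A toy sketch in modern Python (nothing like real IMP code;
the class and names are mine):

    class RfnmLink:
        """Toy stop-and-wait in the spirit of an ARPANET RFNM: at most one
        message in flight per destination until the network returns a
        Ready For Next Message.  Not the real IMP protocol."""

        def __init__(self):
            self.awaiting_rfnm = set()   # destinations with a message in flight

        def send(self, dest, message):
            if dest in self.awaiting_rfnm:
                raise BlockingIOError("must wait for RFNM from %r" % (dest,))
            self.awaiting_rfnm.add(dest)
            # ...hand the message to the network here...

        def on_rfnm(self, dest):
            # The network says the last message arrived; the next may go.
            self.awaiting_rfnm.discard(dest)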

The other school of thought, relatively new, was the "datagram" crowd, who
said that a new way to design the internals of a network would be to take a
hands-off approach and just send individual, independent, chunks of data
into the system, and trust that enough would somehow survive and come out
the other end to be useful.

Of course, internally you could do hop-by-hop flow control in either a
strict or loose fashion.  You could do anything you wanted, as long as you
delivered some packets out the other end.  Good environment for
experimentation.

But there was another issue, namely the desire (especially from the guys
paying the bills) to transmit data for which getting most of the data there
quickly was more important than getting all of the data there eventually.
This was motivated by the desire to send real-time material, i.e., voice,
video, etc., where you could tolerate losing a few chunks of data and still
get intelligible audio/video, but you couldn't tolerate getting no packets
at all because some internal control mechanism was holding them up to make
sure they were all delivered intact, possibly after a long delay.

That desire meant that the internal mechanisms had to provide a way to send
data in a "get as much of this data as you can from here to there, but get
it there quickly" fashion.  This argued against mechanisms such as
hop-by-hop flow control, memory/buffer management schemes, and other
typical "virtual circuit" mechanisms, since they would tend to delay data
in order to make sure it all got delivered intact.

That led to the split of the TCP and IP headers, which in turn enabled the
definition of UDP and provided a basis for experimenting with voice and
video.
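
That "get it there fast, lose a little" contract survives in UDP today.  A
minimal modern sketch (the address, port, and payload here are made up):

    import socket

    # Fire-and-forget: no connection, no retransmission, no ordering.
    # A lost datagram is simply lost -- tolerable for voice or video,
    # where a late packet is often worse than a missing one.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(b"audio frame 42", ("127.0.0.1", 5005))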

Of course there were some of us (guilty!) who said "Wait, we don't know how
to do that.  We do know how to do it with mechanisms like the ARPANET has
been using since 1970 or so.  Let's do it that way."  The ARPANET did have
a "datagram service" that hosts could use, called (IIRC) "Uncontrolled
Packets", which would be sent through the ARPANET bypassing all of the
internal virtual circuit machinery.  However, to utilize that service, a
Host had to get special permission, and the ARPANET NOC had to enable it in
the specific IMPs involved, for the duration of the experiment.   BBN was
pretty reluctant to do that, since it wasn't clear what effect injecting
such packets would have on the rest of the network traffic at the time.  It
could conceivably bring the whole network down.  No one really knew.  So it
was rarely made available.  The ARPANET had become an operational service.
It was no longer an Experiment.

The other somewhat unusual input to the decisions was from the nature of
the people paying for the whole thing -- ARPA.  ARPA's charter was to do
leading-edge research (that's what the first A is for -- Advanced Research
Projects Agency), and it was actually somehow politically embarrassing to
have too many successes.  If you always win, it means you're not taking
enough risks.

So there was an implicit philosophy to do things in a new untried way,
rather than in an old proven way.  We pretty much knew we could do TCP/IP
with a "virtual-circuit" approach (and proven mechanisms such as hop-by-hop
flow control).  We very much did not know if it was possible to build a
workable, reliable system using the untried "datagram" philosophy of
Anarchic Networking (you heard it here first!).

Since the untried approach was riskier, it fit the ARPA charter best.
There was also a strong desire to support voice and video, and in
particular interactive multimedia, which seemed difficult under the
controlled philosophy because of the delay issue.  We of course didn't know
that it would ever work in the "Anarchic Networking" schemes either, but
that also fit nicely into the "risk taker" category.

Thinking back on it, I think those effects were the primary drivers of the
decisions, including the abandonment of hop-by-hop flow control.

Also, I don't recall *anything* that you might call a rigorous mathematical
analysis and comparison of different techniques, or simulations, etc.
Those decisions were not made by anything remotely resembling a rigorous
scientific process.

Remember also that the world at the time operated very much in conformance
with the mantra of Jon Postel -- "rough consensus and running code".  Rough
consensus.  Not decisions.

So, I don't think there ever was an explicit "decision", or at least none
that I remember.  We just kept writing code and trying out different ideas.
(Dave Mills was especially prolific and creative in trying things that had
never been tried before, and wreaking havoc with the gateways that my group
at BBN was responsible for -- the so-called "core gateways".)

Out of all that came TCP 4.

Another interesting situation at the time... The neonatal Internet looked
very much like a "fuzzy peach" model, where the peach was the ARPANET (and
maybe SATNET too), and the "fuzz" were various peripheral networks and
hosts.   All of the people involved in implementing TCP were "host" people,
i.e., they were writing code that ran in some kind of box that attached to
the ARPANET.  Even at BBN, there were a handful of us doing TCP
implementations, none of whom were "ARPANET guys", i.e., working also on
IMP code.

So, when a bunch of "host guys" try to "decide" how to implement something,
guess where they're most likely to put the mechanisms that need to be built
-- in the Hosts that they can write the code for, of course!

There was a lot of experimentation and new ideas tried and tweaked until
things more or less worked.  My group at BBN had responsibility for the
"operational" part of the initial Internet (Bob Hinden and others), so we
probably experienced the brunt of the experimental fallout.

A good example -- we knew we needed some kind of flow control mechanism, so
the "Source Quench" mechanism was invented.  The idea was that when a
gateway got into trouble and had to discard a packet, it would send a
"Source Quench" back to the source of that packet, telling it to "slow
down".  But no one could say what that meant exactly.  If your TCP sends
its first packet and receives a Source Quench back, exactly what should it
do to "slow down"?

Of course, as a Host implementer of a TCP, you were supposed to somehow
"slow down" in whatever way you could.  But another perfectly accurate way
to interpret that Source Quench was as a message from the gateway informing
you that it had definitely thrown away your precious packet.  Slow down?
Not on your life!  Send that packet again!  Immediately!  The Packets Must
Go Through.  (Dave Mills, are you listening?  I haven't forgotten....)
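
To make the ambiguity concrete, here's a toy sketch, in modern Python
rather than anything from the period, of the two readings (the class and
policy names are mine):

    class Sender:
        """Two plausible reactions to an ICMP Source Quench (type 4),
        illustrating why "slow down" was underspecified."""

        def __init__(self, policy):
            self.policy = policy         # "polite" or "packets-must-go-through"
            self.send_interval = 0.1     # seconds between packets
            self.unacked = {}            # seq -> packet awaiting an ack

        def on_source_quench(self, seq):
            if self.policy == "polite":
                # Reading 1: the gateway is congested -- back off.
                # But back off by how much?  The spec never said.
                self.send_interval *= 2
            else:
                # Reading 2: the gateway just confirmed it discarded my
                # precious packet -- send it again, immediately!
                packet = self.unacked.get(seq)
                if packet is not None:
                    self.retransmit(packet)

        def retransmit(self, packet):
            print("resending", packet)   # stand-in for the real transport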

Such was the experimentation in datagram flow control....

So, in retrospect, I doubt there was ever an explicit decision to choose to
abandon hop-by-hop flow control.  It's more that there was a rough
consensus to start with absolutely minimal internal machinery (even so,
IIRC the first gateways had room, after all the code, for only a single
packet buffer!).   So the minimalist code got written and tried out, and
whatever didn't work triggered more discussion and consensus about
something else to add.  (Van Jacobson's seminal work on TCP behavior seems
to fit into that category.)

I must admit, I was one of the skeptics who didn't think a pure datagram
approach was going to work, especially as you increased traffic to the
point where resources got tight.  It might work for a while, but it
"wouldn't scale".

Well, it's 2013, and I've heard there's something like 2 billion people
using this beast.  I was wrong, like many others.   How much "scale" do you
want?

Hope this helps,
/Jack Haverty



On Tue, Jul 16, 2013 at 8:55 AM, Detlef Bosau <detlef.bosau at web.de> wrote:

> I'm still trying to understand this decision.  The more I think about it,
> the more I fear that the decision to abandon hop-by-hop flow control
> (which is still in Vint Cerf's Catenet paper but is abandoned in RFC 791)
> may on the one hand be the most seminal generator of PhD theses in the
> history of science, while on the other I'm not quite sure whether it was
> one of the largest fallacies ever.
>
> Actually, the VJCC kludge has turned out to be quite successful up to
> now, although VJ himself admits some basic limitations in his well-known
> talk "A New Way to Look at Networking".
>
> Does anyone happen to know whether this decision was made for a concrete
> reason, and what the rationale behind it was?  Or did it "simply happen"?
>
> --
> ------------------------------------------------------------------
> Detlef Bosau
> Galileistraße 30
> 70565 Stuttgart                            Tel.:   +49 711 5208031
>                                            mobile: +49 172 6819937
>                                            skype:     detlef.bosau
>                                            ICQ:          566129673
> detlef.bosau at web.de                     http://www.detlef-bosau.de
>
>