[ih] lack of service guarantees in the internet meaning that it cannot ever "fail"

Jack Haverty jack at 3kitty.org
Tue Oct 27 23:39:56 PDT 2009


On Tue, 2009-10-27 at 20:04 -0700, Ted Faber wrote:
> It seems clear that the protocol was designed to allow developers
> access to the TCP timeout and retransmission information in this
> limited way when it was helpful to them in implementing their
> application.  The base protocol has few extraneous knobs, so I assume
> this was carefully thought out.

Well, I guess I have some legitimate claim to being one of the
designers, since I implemented the first PDP-11 Unix TCP and was
involved in the ongoing TCP and Internet WG meetings and ICCB/IAB way
back when.

Ted's analysis is right on target.  There was considerable thought and
discussion given to timeouts and timing in general.  The specific
assumptions we made about timing (e.g., the maximum lifetime of a packet
as it wandered around the net) had direct impact on design decisions
such as the size of the sequence space, and therefore the number of bits
in the various fields in the TCP and IP headers.  It also drove the
requirement for use of a random number generator to pick an initial
value for the TCP sequence number - if a machine rebooted before all of
its "old" packets were flushed from the net, much confusion could
otherwise result.
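
To make the sequence-space point concrete, here is a minimal sketch in
modern C (my illustration, not code from any of our implementations):
with only 32 bits, "earlier" and "later" only make sense modulo 2^32,
and the initial value has to be chosen so that segments from a
pre-reboot incarnation of a connection can't be mistaken for current
ones.

    #include <stdint.h>
    #include <stdlib.h>

    /* With a 32-bit sequence space, "a comes before b" is only
       meaningful modulo 2^32 - which is why the assumed maximum
       packet lifetime fed directly into the choice of field sizes. */
    static int seq_before(uint32_t a, uint32_t b)
    {
        return (int32_t)(a - b) < 0;
    }

    /* Illustrative initial-sequence-number choice.  Real stacks have
       used clock-driven and randomized generators; the point is only
       that a rebooted host must not pick values that could collide
       with its own still-wandering "old" packets.                    */
    static uint32_t pick_isn(void)
    {
        return (uint32_t)random();
    }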

TCP/IP was designed to function in a military environment, where nasty
things could happen to the net - partitioning, destruction of nodes,
etc.  The design criterion was that the data should get through if at all
possible, no matter how long it took.  Networks might go away and be
"reconstituted" as new hardware was deployed or moved into range.

The implication of this on the TCP/IP machinery was that it should never
give up - things might get better.  Connections would be broken only if
the "other end" responded and performed a close or abort sequence (RST,
FIN, et al).  If the other end didn't respond, the TCP/IP was to
continue trying, forever, with an increasing length of time between
retransmissions to avoid flooding the net.
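
A rough sketch of that retransmission policy, again in modern C and
purely illustrative (the function names are hypothetical and the
intervals are made up): back off, cap the interval so the net isn't
flooded, and never abandon the connection on TCP's own authority.

    #include <unistd.h>

    /* Hypothetical retransmit loop: keep trying with an exponentially
       growing, capped interval, and never give up on our own - only
       the application, or a close/reset from the other end, ends it. */
    static void retransmit_until_acked(int (*send_segment)(void),
                                       int (*acked)(void))
    {
        unsigned interval = 1;           /* seconds; made-up numbers  */
        const unsigned ceiling = 64;

        while (!acked()) {
            send_segment();
            sleep(interval);
            if (interval < ceiling)
                interval *= 2;
        }
    }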

The design principle was that only the higher-level application could
legitimately decide that it was no longer worth trying to communicate,
possibly because whatever it was trying to do was no longer relevant, or
because it had additional knowledge that further attempts were futile,
or because it had found an alternative way to accomplish its tasks.  

In order to help the application make the decision whether to keep
trying or abort, the TCP/IP implementation was supposed to make
information about the connection behavior available to the application -
e.g., tell the application that it hadn't heard anything from the other
side for a while (the "timeout"), or that the other side was responsive,
but was not willing to take any more data (the TCP window was closed).
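
Today's Berkeley-sockets descendants expose some of that information
through socket options.  As a rough modern analogue - Linux-specific,
and obviously not the interface any of us had then - an application can
ask the kernel how the connection is doing and make its own decision:

    #include <stdio.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /* Linux's TCP_INFO reports how a connection is behaving - current
       retransmission count, retransmission timeout, round-trip time -
       and leaves any keep-trying-or-abort decision to the caller.     */
    static void report_connection(int fd)
    {
        struct tcp_info info;
        socklen_t len = sizeof(info);

        if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) == 0)
            printf("retransmits: %u  rto: %u us  rtt: %u us\n",
                   (unsigned)info.tcpi_retransmits,
                   (unsigned)info.tcpi_rto,
                   (unsigned)info.tcpi_rtt);
    }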

The TCP/IP software was never supposed to close a connection because of
a timeout - it would only close a connection on instructions from the
application that opened the connection, or on word from the other end to
close/reset.
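
And on those same modern stacks, telling TCP to give up after a stall
is something the application has to request explicitly, which is at
least in the spirit of that division of labor.  A hedged sketch using
Linux's TCP_USER_TIMEOUT option (a much later addition, along the lines
of RFC 5482, and certainly not part of the original design):

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /* The application opts in to giving up: if transmitted data stays
       unacknowledged for this many milliseconds, the kernel may abort
       the connection.  Left unset, the stack falls back to its own
       (much longer) retry defaults.                                   */
    static int give_up_after(int fd, unsigned int millis)
    {
        return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                          &millis, sizeof(millis));
    }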

These kinds of timing considerations also caused the protocol itself to
change as we got some field experience.  Early TCP protocol specs have
several fewer states in the state machine than the final TCPv4 because
of that.

The protocol used "across the wire" between conversing TCP/IP
implementations was well specified.  However, if I remember correctly,
the interface ("API") between a TCP/IP and its user application was
only specified as an example.  There were simply too many different
kinds of operating systems and computer environments around at that time
to define a standard API.  IBM 360s didn't look like Unix, which didn't
look like Multics, which didn't look like Tenex, which really didn't
look at all like the Packet Radio OS (I forget its name...).  DOS,
Windows, and Macs weren't even around yet.

It was left to each implementation designer to provide an appropriate
API which best fit into their particular machine environment.
In my case, the PDP-11/40 Unix environment was so limited (32K of
memory - that's K, not M or G) that there wasn't a lot of room for
anything fancy in my API.

Unfortunately, I think that lack of a specific standard API resulted in
some TCP/IP implementations that did not provide the intended kind of
information and control through the API, or that decided on their own to
abort a connection that "timed out".

The TCP/IP "service specification" was something like - "Keep trying
until hell freezes over."  So, getting back to the original question -
yes, the Internet couldn't "fail" as long as it kept trying.

HTH,
/Jack Haverty
Point Arena, CA




