[ih] history of protocol bugs

Fri Nov 10 18:44:07 PST 2023

Yes, correct. Looking for something that works.

The spec ended up with 2 seconds, which may not be enough, but one can use different port-ids to get around it and leave used port-ids fallow even longer.

Part of the solution was TTL in IP to ensure that the packets were no longer in the network.

The ultimate solution came in 1978 when Richard Watson proved that the necessary and sufficient condition for synchronization for reliable data transfer was to impose an upper bound on three time values:
Maximum Packet Lifetime, MPL;
Maximum Time to Wait before sending an Ack, A;
Maximum Time to Exhaust Retries, R.

Watson calls the sum MPL+A+R "delta-t". After no traffic for 2 or 3 delta-t, there are no packets in the network relating to the state. Any initial sequence number can be used. A bit, called the Data Run Flag, is set in the header of the protocol to indicate this is a new state regime.
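
As a rough illustration of the rule above, here is a minimal Python sketch that computes delta-t from the three bounds and checks whether a connection has been quiet long enough for its state to be dropped. The class and function names, the numeric values, and the default multiple of 3 are all assumptions chosen for illustration; they are not taken from Watson's paper or from any implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeltaTBounds:
    """The three upper bounds, in seconds (hypothetical container)."""
    mpl: float  # Maximum Packet Lifetime
    a: float    # Maximum time to wait before sending an Ack
    r: float    # Maximum time to exhaust retries

    @property
    def delta_t(self) -> float:
        # delta-t is simply the sum of the three bounds.
        return self.mpl + self.a + self.r


def state_can_be_discarded(bounds: DeltaTBounds, last_traffic: float,
                           now: float, multiple: float = 3.0) -> bool:
    """True once there has been no traffic for `multiple` * delta-t seconds."""
    return (now - last_traffic) >= multiple * bounds.delta_t


# Assumed, illustrative numbers only.
bounds = DeltaTBounds(mpl=30.0, a=0.5, r=10.0)
print(bounds.delta_t)
print(state_can_be_discarded(bounds, last_traffic=0.0, now=100.0))
print(state_can_be_discarded(bounds, last_traffic=0.0, now=121.5))
```

The point of the sketch is that nothing is exchanged to set up or tear down state: the decision depends only on a local timer and the three bounds.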

TCP/IP imposes the first, but the other two are implicit in the assumptions about performance. Waiting 2 seconds to re-use the port-ids after a connection has closed seems to be sufficient to avoid problems, and probably these days is much more than enough. (From what I hear, some implementations are running through the port-id space in less than 2 seconds and nothing bad has been reported, or at least I haven’t heard of it.)
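
The idea of leaving used port-ids fallow can be sketched as a small allocator that refuses to hand a port-id back out until a hold time has passed since it was closed. This is only a sketch under stated assumptions: `PortPool`, its methods, and the 2-second default are hypothetical names and values for illustration, and real stacks manage this differently.

```python
from collections import deque


class PortPool:
    """Hands out port-ids, keeping each one fallow for `hold` seconds after close."""

    def __init__(self, ports, hold=2.0):
        self.hold = hold
        self.free = deque(ports)  # port-ids available right now
        self.fallow = deque()     # (release_time, port), oldest close first

    def allocate(self, now):
        # Return port-ids whose fallow period has expired to the free queue.
        while self.fallow and self.fallow[0][0] <= now:
            self.free.append(self.fallow.popleft()[1])
        if not self.free:
            return None  # every port-id is in use or still fallow
        return self.free.popleft()

    def close(self, port, now):
        # Assumes `now` never goes backwards, so the queue stays ordered.
        self.fallow.append((now + self.hold, port))


pool = PortPool([1024, 1025], hold=2.0)
first = pool.allocate(now=0.0)
second = pool.allocate(now=0.0)
pool.close(first, now=1.0)
too_soon = pool.allocate(now=2.0)  # only 1 second after the close
reused = pool.allocate(now=3.0)    # 2 seconds after the close
```

With a larger pool the allocator cycles through the other port-ids first, so each closed one naturally stays fallow longer than the minimum.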

I haven’t looked closely enough at the TCP Timestamp Option (how sequence number rollover is protected) to see if it obeys this. It probably does for normal operation. The only one I would be concerned about is Selective Ack. I do know that simulations showed that a protocol that implemented Watson’s bounds was found to be more robust under bad conditions than TCP.
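
For a sense of scale on sequence number rollover, the arithmetic is simple enough to sketch: a 32-bit sequence space that counts bytes is used up after 2^32 bytes, so the time to wrap is that many bytes divided by the sending rate. The function name and the sample link rates below are assumptions for illustration. The comparison that matters is between the wrap time and the Maximum Packet Lifetime.

```python
SEQ_SPACE_BYTES = 2 ** 32  # TCP sequence numbers are 32 bits and count bytes


def wrap_time_seconds(bits_per_second: float) -> float:
    """Seconds needed to consume the whole sequence space at a sustained rate."""
    return SEQ_SPACE_BYTES * 8 / bits_per_second


# Illustrative link rates only.
for label, rate in [("1.5 Mbit/s", 1.5e6), ("1 Gbit/s", 1e9), ("10 Gbit/s", 1e10)]:
    print(f"{label:>11}: {wrap_time_seconds(rate):10.1f} s")
```

At 1 Gbit/s the space wraps in roughly 34 seconds, so if packets can live longer than that, an old segment could carry a sequence number that looks valid again.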

Take care,
John

> On Nov 10, 2023, at 18:18, Jack Haverty via Internet-history <internet-history at elists.isoc.org> wrote:
> 
> Part of the requirements spec of the protocol was that it be able to handle whatever the network did to datagrams in transit.  That included delaying packets.   A functioning TCP connection might receive an old datagram, with address and sequence number that happened to be valid for the current connection, but with quite different data.   You can never be absolutely sure that "the last use has been Acked."
> 
> That could occur when a datagram from an "old" connection was still bouncing around somewhere in the 'net and finally came to its destination.  In the early days, when TCP implementations tended to start all connections with ISN == 0, it was pretty common to receive "old" datagrams that caused errors in the output.
> 
> IIRC, we struggled with how to best handle such possibilities.   One approach was simply to assume that there was a maximum latency of the Internet, and require TCP implementations to simply sleep on startup until any old datagrams were assumed to be gone.   One value that passed the consensus test was 3 seconds, so TCP implementations at that point would just wait a while when first started.   We weren't thinking about an interplanetary Internet at the time, but we did have satellite networks (SATNET, MATNET, WBNET) with associated delays.
> 
> The 3-second delay didn't solve the computer-crashed-and-restarted case, so the protocol was changed again to require a random ISN.  Of course that isn't perfect either, since there's still some chance that a new connection will overlap with an old one.   But we decided that such an event would be probabilistically small.  Besides, the Internet was an Experiment, and a better mechanism could be put in the next version of TCP.   Similarly, the checksumming algorithm was purposely selected to be friendly to the overworked computers running TCPs, rather than to be robust as an error handling scheme, and a better algorithm could be introduced later.
> 
> There were also a lot of distinctions between Required and Recommended parts of the protocol.  Very few things were Required. That allowed for a lot of experimentation with retransmission algorithms, packetization approaches, et al.  IIRC, the "window" was defined to be advisory - so it wasn't a protocol violation to send data "outside the window".  The receiver might just discard it, but perhaps by the time the datagram arrived the window would have moved and the datagram accepted.   That was an experimental technique to perhaps achieve improved throughput.
> 
> All of the above happened over a very short period of time - somewhere between a weekend and a few months.  So if it was captured anywhere in print, it would have been in emails or meeting notes, not likely in RFCs or IENs.
> 
> Jack Haverty
> 
> 
> On 11/10/23 13:16, John Day via Internet-history wrote:
>> I agree with you. The same Sequence Number can’t be assigned unless the last use has been Acked.
>> 
>> But to do that, the implementation would have also had to have been ignoring the Credit and not to send beyond the Right Window Edge. The implementation wasn’t obeying the protocol.
>> 
>> The fact that the sequence number space rolled over too soon and the sender couldn’t keep the pipe full was not a bug in the protocol. That it was not what was desired was a different problem. That led to an ‘enhancement.’ Of course, it was a problem that was easily predictable.
>> 
>> (It didn’t need interplanetary communication to run into the problem. I remember doing this calculation for a satellite connection early on (last half of 70s) and realizing that it would be a problem with byte sequencing.)
>> 
>> Take care,
>> John
>> 
>>> On Nov 10, 2023, at 13:43, Steve Crocker via Internet-history<internet-history at elists.isoc.org>  wrote:
>>> 
>>> I agree with Craig and Dave that #1 was an implementation bug and #2 was a
>>> specification bug.
>>> 
>>> #3 is a "bug" of a different order.  Craig says it's a growth issue.  Dave
>>> says it's an enhancement.  It's clear the sequence space was too small to
>>> support the kinds of delays that would be encountered in interplanetary
>>> communication.  Thus, it would be fair to say this wasn't a bug, but simply
>>> a limitation on the environments in which TCP would work.
>>> 
>>> Essentially all tools have limitations on how they're used.  That said,
>>> I've always thought it was a weakness in the specification and
>>> documentation of protocols that the quantitative aspects are usually not
>>> addressed.  The tuning of timeouts, limitations on capacity, etc. are
>>> usually left to the implementers and operators to figure out later.
>>> 
>>> Thus, if there's a bug in #3, I'd say it was in not making the limitations
>>> explicit in the design and documentation.  Thus it was not a bug that the
>>> TCP sequence space didn't support interplanetary communication.  If there
>>> was a bug, it was, at worst, merely that anyone had in mind to use it for
>>> that purpose.
>>> 
>>> Steve
>>> 
>>> On Fri, Nov 10, 2023 at 4:05 AM Dave Crocker via Internet-history <
>>> internet-history at elists.isoc.org> wrote:
>>> 
>>>> On 11/10/2023 4:50 AM, Craig Partridge via Internet-history wrote:
>>>>> Which of these bugs (or kinds of bugs) do you want to track?
>>>> RFC Errata are required to be deviations in the specification, from what
>>>> was intended by the authors.
>>>> 
>>>> This draws a distinction from things that might be called
>>>> 'enhancements'.  A bug is a behavior that was not originally intended.
>>>> An enhancement is a change in intention.
>>>> 
>>>> So...
>>>> 
>>>>> 1. 20 years ago, a software vendor shipped code that computed the wrong
>>>>> checksum on a FIN-ACK if the FIN-ACK had to be retransmitted.
>>>> bug
>>>> 
>>>> 
>>>>> 2. In 1974, Ray Tomlinson
>>>> ...
>>>>>   He realized that
>>>>> TCP needed a way to select initial sequence numbers that prevented old
>>>>> segments from being confused with new segments.
>>>> bug.
>>>> 
>>>> 
>>>>> 3. Around 1990, people realized that the TCP sequence number space was too
>>>>> small for gigabit links and a TCP option was developed to expand the
>>>>> sequence space.
>>>> enhancement.
>>>> 
>>>> 
>>>> d/
>>>> 
>>>> --
>>>> Dave Crocker
>>>> Brandenburg InternetWorking
>>>> bbiw.net
>>>> mast:@dcrocker at mastodon.social
>>>> --
>>>> Internet-history mailing list
>>>> Internet-history at elists.isoc.org
>>>> https://elists.isoc.org/mailman/listinfo/internet-history
>>>> 