[ih] UDP Length Field?

Jack Haverty jack at 3kitty.org
Wed Dec 2 11:49:14 PST 2020


Agreed, great summary!

I'd add a couple of observations:

- Although TCP, today, is a byte-stream service, its earlier incarnation
did have record-oriented functionality.   TCP had mechanisms which
delineated different parts of the byte stream, making the traffic flow
essentially a procession of datagram-style chunks called "letters".  
This was apparently helpful to implementors on some kinds of
machines/OSes, making it possible to minimize the number of copy
operations involved in getting incoming data to the right process'
address space.

Bill Plummer (BBN-Tenex et al) was a primary proponent of "letters", and
argued strongly for the inclusion of "end of letter" or EOL flags in the
TCP protocol.  This required a mechanism for manipulating the TCP
sequence space, affectionately called "rubber EOL".   I recall there was
a meeting, shortly after Bill left the TCP project to do other stuff,
where we all decided unanimously to remove the "letter" concept from TCP
entirely and do away with "rubber EOL".   With Bill's absence, no one
was left to object.

Some historian with an interest in sociology might explore how the
technology development was influenced by the comings and goings of the
people involved....my gut feeling is that such events were sometimes
quite impactful.

- Although UDP and "UDP2" are great ideas, IMHO it's important to note
that such protocols are only one part of a complete implementation.  For
example, UDP defined a "placeholder" datagram service, as Dave noted.
But for UDP traffic to actually achieve low-latency, non-guaranteed
behavior, the TOS (Type Of Service) field was also placed in the IP
header.

TOS was intended to convey information about how a particular IP
datagram should be handled, and in conjunction with other information
(such as TTL values) would enable computers handling IP traffic
(routers, hosts, whatever) to tailor queuing, buffering, and
prioritization to be most appropriate for the specified TOS.   A
rudimentary "congestion control" mechanism was define as "Source
Quench", but I don't recall anyone thinking that such a simple mechanism
would work.   Experimentation was needed.
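
For readers following along today: the TOS byte is still there in the
IP header, nowadays reinterpreted as DSCP/ECN.  A minimal sketch of how
an application can at least ask for low-delay handling on a UDP socket,
assuming a Linux-style stack and Python's socket module (the address and
port below are placeholders, not anything we used), might look like
this.  Whether anything along the path honors the marking is another
matter entirely.

    import socket

    # A sketch, not production code: ask the local stack for "low delay"
    # handling on a UDP socket by setting the (historical) TOS byte.
    # Assumes a Linux-style stack where Python exposes socket.IP_TOS;
    # routers along the path are free to ignore the marking.
    IPTOS_LOWDELAY = 0x10    # RFC 791 "low delay" bit; read as DSCP today

    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, IPTOS_LOWDELAY)
    s.sendto(b"probe", ("192.0.2.1", 9999))   # placeholder address/port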

I recall that, back in the late 70s when my group at BBN was responsible
for the "core gateways", there was no difference in handling of IP
traffic driven by TOS.  There were 2 major reasons for this: 1) the
gateway (router) hardware didn't have enough memory to do much more than
the basics, and 2) we didn't know what changes to packet processing
algorithms would be appropriate for different TOS settings.   New
hardware would solve problem 1, and much experimentation, especially
with packet voice, was expected to address problem 2.

From what I can anecdotally see today, 40 years later, low-latency
datagram service on the Internet is not on anyone's radar.  I helped a
friend investigate his attempts to use a gaming-type app over the
Internet last year, and our experiments found that the packet loss rate
was, surprisingly (to me), measured at 0%, while latency averaged in the
hundreds of milliseconds but had "tails" of data points out to 30
seconds.  The Internet "IP datagram service" today seems to be very
connection-oriented, delivering every packet but with noticeably very
long delays.   I suspect this may be a cause of the anomalies we often
see today in TV interviews conducted using the Internet.
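
For anyone curious about reproducing that sort of measurement, a rough
sketch of one way to do it, in Python, with numbered UDP probes against
an echo server you control (the address, port, and counts below are
placeholders, not what we actually used), might look like this:

    import socket, struct, time

    # A sketch, not what we actually ran: send numbered UDP probes to an
    # echo server you control and record per-packet round-trip time and
    # loss.  SERVER, COUNT, and TIMEOUT are placeholders.  The timeout is
    # deliberately long so multi-second "tail" latencies show up as slow
    # replies rather than as losses.  (Naive about reordering: a probe
    # whose echo carries the wrong sequence number is simply ignored.)
    SERVER = ("198.51.100.7", 7)    # hypothetical UDP echo service
    COUNT = 1000
    TIMEOUT = 35.0

    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(TIMEOUT)
    rtts = []
    lost = 0
    for seq in range(COUNT):
        s.sendto(struct.pack("!I", seq), SERVER)
        sent = time.time()
        try:
            data, _ = s.recvfrom(512)
            if len(data) >= 4 and struct.unpack("!I", data[:4])[0] == seq:
                rtts.append(time.time() - sent)
        except socket.timeout:
            lost += 1

    if rtts:
        print("loss %d/%d  mean RTT %.3fs  max RTT %.3fs"
              % (lost, COUNT, sum(rtts) / len(rtts), max(rtts)))

The long timeout is the whole point: with a typical one- or two-second
timeout, a datagram that finally arrives 30 seconds later gets booked as
a loss instead of as the very long delay it actually was.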

Bottom line - UDP2 is a great idea, but to be effective many other parts
of the overall Internet system would have to evolve as well.   IMHO.

/Jack Haverty


On 12/2/20 9:23 AM, Vint Cerf via Internet-history wrote:
> great summary David.
> v
>
>
> On Tue, Dec 1, 2020 at 9:02 PM David P. Reed <dpreed at deepplum.com> wrote:
>
>> Hi all -
>>
>>
>>
>> I'm glad to be able to try to help. The actual process of designing UDP's
>> packet format was very brief, and was done in the context of sketching out
>> how to split TCP into IP, TCP, and UDP right after that decision.
>>
>>
>>
>> The group doing the design of TCP prior to its split into IP, TCP/IP and
>> UDP/IP (ICMP/IP wasn't contemplated yet) was a combined group, and the
>> efforts were combined. This decision was a quick decision to placate those
>> of us who strongly urged a "first-class" datagram option be part of the
>> Internetworking Experiment, rather than a pure virtual circuit focus as had
>> been pursued up to that point.
>>
>>
>>
>> So UDP was not the focus of the group - in some sense it was a
>> "placeholder" for a more refined datagram design. While the efforts on the
>> IP protocol (including fragmentation and routing-related functions that
>> belonged in the packet forwarding layer) continued, and the efforts on the
>> TCP virtual circuit functionality continued, UDP was sketched, and kind of
>> orphaned with no caretakers polishing it up. There was no "UDP group"
>> formed, as did happen for the IP packet and protocol, separate from the TCP
>> design efforts.
>>
>>
>>
>> This was in the late 1970's.
>>
>>
>>
>> So UDP remained as sketched, until it was finally implemented in various
>> systems as an end-to-end protocol, where ports were used to demultiplex the
>> datagrams. Things got a bit weird when OS's started creating system call
>> interfaces for UDP, because OS's seemed to be stuck in the telephony
>> mindset where circuits were "set up" and "torn down" during a "connection",
>> but UDP defined no connection. That was *intentional* in its concept.
>> Demultiplexing was supposed to be separate from any concept of "connection"
>> to a foreign entity - the whole idea was that a process could own a socket
>> that any foreign system could send to - so A could send a request to B,
>> which would send a message to C, which would send a message to D, and D
>> could confirm receipt and provide a response to A directly, though A might
>> not even know that D existed. Such an idea didn't fit with pairwise
>> connections, but was very important to those of us developing
>> multi-computer decentralized systems (PARC and my group at MIT were
>> examples of places where we were developing fault tolerant
>> process-to-process coordination systems where the fault tolerance was not
>> "in the OS" because we believed OS's would evolve to span loosely federated
>> sets of machines, not mainframes, minis or superminis).
>>
>>
>>
>> That color commentary helps, I think, in understanding the answer to your
>> specific question, which I will suggest shortly. It's important to
>> understand that UDP's implementation was distorted by folks like the
>> Berkeley Unix team who invented "sockets", and it was also frozen in
>> concrete by the OS folks who implemented the sketch. The result was "ok",
>> but not polished. Also, because of the premature solidification, many of
>> the desired goals of UDP were unworkable because "interoperation" had to
>> cope with semantics imposed by relatively clueless OS network stack
>> implementors, like Bill Joy's team.
>>
>> TCP was the star, UDP was a poor relation, though it was not supposed to
>> be - those of us who pushed for it thought it was crucial. (Since we
>> accepted that congestion control would be something that had to be
>> addressed by high-level etiquettes that would stanch packet flows based on
>> feedback about conditions of the network observed during packet forwarding,
>> it was disappointing that the congestion control put into TCP was not
>> properly split between IP and TCP, in my opinion.) Further making UDP a poor
>> relation - with no coherent theory of congestion management, protocols that
>> used UDP (like RTP) were left to struggle with how to be compatible with
>> Internet-level congestion control.
>>
>>
>>
>> So, with all that context (and there's more), to answer the question:
>>
>>
>>
>> 1. UDP's design was minimal and frozen without enough attention being
>> paid. IMO, there is room for an improved UDP, not based on adding features
>> to the packets, but by creating a new protocol number for a "UDP2", which
>> would not, for example, need a redundant length field.
>>
>>
>>
>> 2. The general view back in the day was that it was not the job of the
>> protocol designer to focus on likely bugs in protocol stacks. The length
>> field was redundant w.r.t. the IP header's "Total Length", yes, but
>> redundancy can be checked. There's nothing "wrong" with having redundancy,
>> in other words. It's the OS's responsibility to check. (The same reasoning
>> applies to the redundancy between the IP header's total length and the
>> length of the underlying Ethernet frame or ATM frame or carrier pigeon
>> envelope. Yes, if the network on which IP was overlaid had a length field,
>> too, that provided some redundancy, but redundancy is to be checked, and if
>> there's no match, it is an error, not a security hole.)
>>
>>
>>
>> 3. It was thought that across all TCP implementations, headers would be
>> multiples of 32 bit "words" (4 octets). The reasoning was that this was
>> optimal for all kinds of computers that we could conceive.
>>
>> Octets were not "word boundaries" and aligning fields based on octets
>> would have made things quite awkward on machines that did not have byte
>> addressed memory systems. 32 bits fit into the DEC systems (PDP-6/10/20),
>> which many research labs affiliated with ARPA used, and the GE645 and
>> Honeywell 6180 and 68/80 systems had 36-bit word sizes, and so forth. (The
>> idea of 8-bit byte-addressable memories didn't become popular until 8-bit
>> microprocessors, like the 8080, became important.) So clearly the UDP
>> header would be two or more 32-bit words, even though there were only 3
>> 16-bit quantities really needed.
>>
>>
>>
>> 4. The length field is likely to be needed at the endpoints as part of the
>> datagram as delivered by the network stack. TCP didn't have a "length" - it
>> didn't even have packet boundaries at this point. Messages within the
>> infinite stream of bytes that were a virtual circuit in TCP would have to
>> have length delimiters to separate messages, but they were not part of the
>> function of the IP header Total Length field.
>>
>> That is to say, the TCP receiving port never needed to see (and would not
>> see) any information derived from the IP header "Total Length". In fact,
>> when TCP retransmitted a range of bytes, the packet boundaries might be
>> quite different - that was crucially part of the design of TCP's semantics
>> as a stream of octets in each direction.  Whereas UDP sent "user datagrams"
>> that had lengths and a checksum that the *user* was supposed to check (not
>> the operating system, though the Unix sockets guys screwed that up too!)
>>
>>
>>
>> That's the story, such as it is. As one of the advocates of the datagram
>> internet as a primary goal, I think it's a bit sad that the benefit of
>> datagrams as a mode of richer communications has been poorly developed.
>> It's OK, but not great. It's also sad that multi-datagram protocols haven't
>> been developed more maturely - DNS shouldn't require TCP to send longer
>> one-off messages. That's like using a private jet plane instead of a car
>> for a family trip because cars only have 2 seats. You should be able to use
>> 2 cars.
>>
>>
>>
>> Now that we see some of the benefits (in the QUIC concept to replace the
>> heavyweight HTTP/TCP mess) it would be nice to be able to go back and
>> change history. But one cannot.
>>
>>
>>
>>
>>
>> On Sunday, November 29, 2020 7:33am, "Vint Cerf" <vint at google.com> said:
>>
>> the primary proponents of splitting off IP from TCP were Jon Postel, Danny
>> Cohen  and David Reed, I believe. Sadly, Jon and Danny are no longer with
>> us. My recollection is primarily that UDP was to allow for real-time,
>> non-retransmitted, non-sequenced delivery for voice, video, radar in which
>> low latency was more important than sequenced and assured delivery. As to
>> the length field, it may merely have been habit to include, even if the
>> value could have been computed. Sometimes <length> was used to distinguish
>> real data from padding to achieve preferred word boundaries.
>> v
>>
>> On Sat, Nov 28, 2020 at 8:21 PM Brian E Carpenter via Internet-history <
>> internet-history at elists.isoc.org> wrote:
>>
>>> Reverse designing it (a bit like reverse engineering), it seems useful
>>> to be able to check that the intended payload length fits inside the
>>> actual packet length. If it doesn't, you are exposed to what you might
>>> call buffer underrun issues. Conversely, if you don't like covert
>>> channels,
>>> you might want to detect any spare bits after the payload.
>>>
>>> Regards
>>>    Brian Carpenter
>>>
>>> On 29-Nov-20 12:42, Timothy J. Salo via Internet-history wrote:
>>>> Hi,
>>>>
>>>> Can anyone provide some [historical] insight into why the UDP header
>>>> contains a length field?  TCP manages to ascertain the length of data in
>>>> a packet just fine without a length field, so why couldn't UDP?
>>>>
>>>> Several people have noted that the UDP length field is redundant,
>>>> including for example, the current Internet Draft "Transport Options for
>>>> UDP",
>>>> <https://www.ietf.org/archive/id/draft-ietf-tsvwg-udp-options-09.txt>.
>>>>
>>>> There are some other opinions, some of which sound to me like
>>>> after-the-fact reasoning:
>>>>
>>>> - So that UDP can run over network protocols other than IP (although
>>>>    presumably TCP could do this just fine without a length field).  But,
>>>>    the UDP spec says that an IP-like pseudo header needs to be created,
>>>>    in any case.
>>>>
>>>> - Layering and encapsulation reasons, (although, again, TCP seems like
>>>>    a counter example).
>>>>
>>>> - Word alignment, (there were 16-bits left over, so why not use it for
>>>>    the length?).  Personally, this sounds the most likely to me.
>>>>
>>>> Thanks,
>>>>
>>>> -tjs
>>>>
>>> --
>>> Internet-history mailing list
>>> Internet-history at elists.isoc.org
>>> https://elists.isoc.org/mailman/listinfo/internet-history
>>
>> --
>> Please send any postal/overnight deliveries to:
>> Vint Cerf
>> 1435 Woodhurst Blvd
>> McLean, VA 22102
>> 703-448-0965
>> until further notice
>>
>
>



