[ih] IPv4 address size debate

Jack Haverty jack at 3kitty.org
Fri Nov 13 09:56:29 PST 2009


Since I was one of the people arguing for fixed-length addresses back in
those days, I guess maybe it's time to explain why...

It all started with the network hardware, i.e., the I/O box that sat
between your computer and the network wire (Ethernet, IMP, whatever).
Your computer, whatever it was, was very slow and very expensive.  Also,
as I recall, none of the early TCP/IP implementors were what you would
call "network people".  We were all operating-system people, and the
network was yet another I/O device to be attached to the computer.

That last observation is important because it dictated design
philosophy, both of code and protocol.  The goal was to get the network
attached as an I/O device while absolutely minimizing the load on the
computer.  Running network code (interrupt handlers, drivers, TCP, etc.)
was overhead, not viewed as useful work.  In some computers that charged
their users for cycles-used, it was "waste" that generated no revenue.
In other contexts it consumed cycles that would otherwise be used for
real work - like playing Zork.   Network I/O had to be tolerated, but
kept to a bare minimum.

In that context, TCP/IP was designed and implemented.  

Starting right at the hardware interface box, the low-level device
driver code in the O/S could typically instruct the hardware to read (or
write) bytes between the wire and memory.  Of course you had to tell the
hardware where to put/get the data, i.e., the memory address.

So, when reading from the net, the resultant data in memory would
contain the physical net header, then the IP header, then the TCP
header, then the "user data".
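
In modern C, and with made-up field names, the resulting layout looked
roughly like the sketch below.  Note that the struct describes reality only
when every header really is its minimum, fixed size - which is exactly what
the rest of this story is about.

#include <stddef.h>
#include <stdint.h>

/* Hypothetical receive-buffer layout: headers and user data arrive
 * back to back, so the offset of the user data is fixed only if all
 * of the headers in front of it are fixed-size. */
struct rx_buffer {
    uint8_t link_hdr[14];   /* e.g. Ethernet: dst, src, type        */
    uint8_t ip_hdr[20];     /* IP header *if* it carries no options  */
    uint8_t tcp_hdr[20];    /* TCP header *if* it carries no options */
    uint8_t user_data[];    /* what the application actually wants   */
};

/* With everything fixed-size, the user data always starts at byte 54,
 * so the hardware can be aimed once and left alone. */
_Static_assert(offsetof(struct rx_buffer, user_data) == 54,
               "user data lands at a known, fixed offset");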

There was a great desire to avoid moving large amounts of data.
Machines were slooowwww - on the order of 1MHz instruction rate.  With
maybe 20 users on the machine, each had a 50KHz computer -- quite a bit
slower than the 3 GHz machine I'm typing on right now.   Memory was slow
too, and there wasn't much of it.  Every time you had to move parts of a
packet, it was a big load on the processor.  It was very very desirable
to have the hardware put the data into the "right place" for subsequent
processing. 

Depending on the computer architecture, sometimes memory could be
"moved" by simply mapping it into a different address.  But there were
lots of constraints imposed by the hardware, e.g., "block boundaries".

The "holy grail" was to be able to read a packet from the net and have
the "user data" transferred directly into memory that could be simply
handed over to the user program.  If you could do that, no in-memory
copying of large data buffers would be needed, and the O/S overhead was
kept small.

Any variable-length fields in the headers made it very difficult to aim
the incoming data at the right place in memory.   Of course, you could
avoid in-memory copying by reading data in pieces - i.e., read the next
piece of header, look at it, decide how much to read next, etc.  That,
however, meant multiple interrupts per packet, which was about as bad as,
or worse than, copying in memory.

With fixed-size Ethernet (or IMP) headers, you could point incoming data
at memory in such a way that the Ethernet header fell into one memory
block, and the rest of the data fell in the adjacent memory block.  That
is of course only if your particular computer and network interface
allowed it.  Then it was easy to simply give only that second block to
the next level.  This was how Ethernet headers were typically "stripped
off".

With variable length IP headers (options) you couldn't play this trick
at the next level.  So "user data" (e.g., Telnet or FTP data) had to be
copied in-memory to pass it up the line.   More complex/sophisticated
TCP/IP implementations might play the odds, and target the incoming data
into the right place hoping that the IP header turned out to be
fixed-size (no options).  If all worked well, the "user data" would end
up on a block boundary, easily passed to the user code.  In the
hopefully rare cases that it wasn't fixed-size, the data would have to
be moved.  IP packets with no options would be fast, those with options
would suffer.
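
A sketch of "playing the odds" (hypothetical names; the "wanted" address is
wherever the DMA was aimed on the assumption of minimum-size headers):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MIN_IP_HDR  20u
#define MIN_TCP_HDR 20u

/* ip_pkt points at the start of the IP header, ip_len is the total
 * datagram length.  Returns a pointer to the user data, moving it
 * only on the slow path. */
static uint8_t *locate_user_data(uint8_t *ip_pkt, size_t ip_len)
{
    size_t ip_hl    = (ip_pkt[0] & 0x0fu) * 4;           /* IHL, in bytes   */
    size_t tcp_off  = (ip_pkt[ip_hl + 12] >> 4) * 4;     /* TCP data offset */
    uint8_t *data   = ip_pkt + ip_hl + tcp_off;          /* where it landed */
    uint8_t *wanted = ip_pkt + MIN_IP_HDR + MIN_TCP_HDR; /* where we aimed  */

    if (data == wanted)
        return data;    /* fast path: no options, nothing to move */

    /* Slow path: options pushed the user data past the boundary we
     * aimed at, so it has to be moved back into place. */
    memmove(wanted, data, ip_len - ip_hl - tcp_off);
    return wanted;
}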

Variable-length addresses would have changed the odds significantly, so
that such a trick would probably rarely work, and packet data would
always have to be moved.  Hence the pressure for fixed-length header
fields, at least for the always-present fields - like the addresses.

The pressure to avoid copying was so strong that it motivated some
unique ideas.  There was at least one implementation of Ethernet (MIT?
IIRC) that put the Ethernet header at the *back end* of the packet on
the wire - i.e., it became a trailer rather than a header.  This
violated the Ethernet spec, but it worked fine as long as both players
understood what was going on.  It allowed incoming user data to be
placed directly in physical memory that could be transferred (not
copied) to the upper protocol software.  I remember at BBN discovering
this trick one day when one of these implementations started sending
such packets to one of our gateways, which of course got continuous
checksum failures trying to interpret the user data as packet headers.

Inside a gateway/router environment, we were always looking for ways to
make things faster and avoid copying and buffering.  I remember toying
in the 80s with the idea of short-circuiting IP packet processing, so
that a packet could be sent out to its next destination *while* it was
still being read in from the previous hop.  The router would read enough
of the header to decide where to send the packet, and then "splice" the
two wires together (in hardware/firmware) to send the rest of the packet
on its way.  This would have required, among other things, redefining
the IP header so that the checksum was kept in a trailer - since you
couldn't finish computing the checksum until all of the data had come
in. 

This would have violated the IP spec, but between consenting routers
anything was possible (and facilitated by the EGP/IGP structure - IGPs
could do their own thing inside their worlds).  Essentially you would
end up with a circuit-switched network that had an IP datagram
interface, so it would have low-delay no matter how many hops were
involved.  In this architecture, IP is an interface spec at the edges
only, and what goes on inside is hidden and could be anything.  I wonder
if today's routers play such tricks....
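
A sketch of what such a consenting router's inner loop might have looked
like (the link primitives here are entirely made up; link_read is assumed to
return 0 at the end of the packet):

#include <stddef.h>
#include <stdint.h>

/* Hypothetical link-level primitives a router's firmware might offer. */
size_t link_read(int port, uint8_t *buf, size_t len);   /* 0 = end of packet */
void   link_write(int port, const uint8_t *buf, size_t len);
int    route_lookup(const uint8_t *ip_hdr);              /* -> output port   */

void cut_through_forward(int in_port)
{
    uint8_t  hdr[20];            /* fixed part of the (redefined) header */
    uint32_t sum = 0;

    /* Read just enough to pick the exit, and start sending immediately. */
    link_read(in_port, hdr, sizeof hdr);
    int out_port = route_lookup(hdr);
    link_write(out_port, hdr, sizeof hdr);
    for (size_t i = 0; i < sizeof hdr; i++)
        sum += hdr[i];

    /* "Splice the wires": relay the rest of the packet as it arrives,
     * accumulating the checksum on the fly. */
    uint8_t chunk[64];
    size_t  n;
    while ((n = link_read(in_port, chunk, sizeof chunk)) > 0) {
        for (size_t i = 0; i < n; i++)
            sum += chunk[i];
        link_write(out_port, chunk, n);
    }

    /* The checksum can only go out last, as a trailer, once every byte
     * has already been forwarded. */
    uint8_t trailer[4] = {
        (uint8_t)(sum >> 24), (uint8_t)(sum >> 16),
        (uint8_t)(sum >> 8),  (uint8_t)sum
    };
    link_write(out_port, trailer, sizeof trailer);
}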

Of course, now CPUs and memory are dirt cheap, and most of this kind of
data juggling happens anyway inside a chip on a $20 NIC, so it's not a
big deal.  Times have changed.

HTH,
/Jack Haverty
Point Arena, CA

On Fri, 2009-11-13 at 06:13 -0500, Vint Cerf wrote:
> that's a red herring. by the time IP and TCP dealt with the headers,  
> the ethernet portion was stripped away.
> v
> 
> On Nov 12, 2009, at 2:04 PM, Richard Bennett wrote:
> 
> > I remember when handling packets at wirespeed was a challenge, but  
> > that was solved by hardware. The 48-bit EtherMac address was a much  
> > bigger issue than IP addresses, and the size of the Ethernet header  
> > (112 bits) guaranteed that the IP header wasn't going to be 32-bit  
> > aligned anyhow.
> >
> > John Day wrote:
> >> You missed the point of my comment.  I am well aware of the coding  
> >> issues.  Although, Oran and others have always argued that variable  
> >> was not a big deal in hardware.
> >>
> >> The point was that if you think in terms of a relative  
> >> architecture, rather than the traditional fixed flat architecture,  
> >> fixed is variable, or was that variable is fixed?  ;-)
> >>
> >> I was implying that fixed was really all that was necessary, if you  
> >> really understood the inherent structure.  But then you knew that,  
> >> didn't you?
> >>
> >> Take care,
> >> John
> >>
> >> At 1:46 -0500 2009/11/12, Craig Partridge wrote:
> >>> > Once one understands the bigger picture, one realizes that  
> >>> question
> >>>> of variable vs fixed is a non sequitur. But one does have to get
> >>>> free
> >>>> of the constraints of a Ptolemaic approach to architecture.
> >>>
> >>> Hi John:
> >>>
> >>> I'm afraid I disagree (at the risk of being lumped in the  
> >>> distinguished
> >>> company of Ptolemy).
> >>>
> >>> I agree that in much of the networking and distributed systems  
> >>> world, variable
> >>> vs. fixed is not a big deal and has all the utility of the binary  
> >>> vs. ASCII
> >>> representations debate (i.e. not much).
> >>>
> >>> But, in routers and encrypters and similar boxes that handle large  
> >>> volumes
> >>> of data, fixed vs. variable is still a challenge.  The fundamental  
> >>> issue is
> >>> that while links work in terms of bits and bytes, processors and  
> >>> memories
> >>> actually work in terms of blocks/chunks.  That's because of the
> >>> parallelism they use to go fast (and one reason they use parallelism is
> >>> physics -- prop
> >>> times across chip boundaries, etc).
> >>>
> >>> So when writing code for routers that has to go fast, you are  
> >>> constantly
> >>> thinking about those blocks and trying to avoid crossing block  
> >>> boundaries
> >>> (both in instructions and data accesses) and trying to keep your  
> >>> software
> >>> using the minimum number of blocks, as touching an additional  
> >>> block is
> >>> a serious performance hit.   Knowing exactly how your data is laid  
> >>> out
> >>> is a huge boon here -- it removes the uncertainty of how many  
> >>> blocks you'll
> >>> have to touch (and how many instructions you have to execute).
> >>>
> >>> And sizing for the max (assuming the variable address is always  
> >>> max length)
> >>> doesn't help either -- because there are two addresses in a  
> >>> header, if the
> >>> first one is short then all your plans for the second address are  
> >>> undone.
> >>>
> >>> Upleveling my point -- we have a computing abstraction (bytes)  
> >>> which doesn't
> >>> match how computers, when stressed for performance, actually work  
> >>> and that
> >>> has implications for packet headers.
> >>>
> >>> Thanks!
> >>>
> >>> Craig
> >>
> >
> > -- 
> > Richard Bennett
> > Research Fellow
> > Information Technology and Innovation Foundation
> > Washington, DC
> >
> 
> 



