[ih] fragmentation (Re: Could it have been different? [was Re: vm vs. memory])
Paul Vixie
paul at redbarn.org
Wed Oct 25 18:42:19 PDT 2017
Brian E Carpenter wrote:
...
> Now that IPv4 is truly hitting its limits, the main operational complaint
> against IPv6 is that it's too different from IPv4. But the incentives are
> finally shifting in IPv6's favour, like it or not.
i don't even know what i don't like anymore. but for the history books
that may be written about our era, if indeed we have a future at all:
tony li said that ipv6 was too little, too soon. this was a play on
words, because the usual complaint is "too little, too late". tony was
right, even more so than i realized at the time. we specified a lot of
things that didn't work and had to be revised or thrown out -- because
we did not know what we needed and we thought we were in a hurry. we had
time, as history will show, to spend another ten years thinking about ipng.
where we are today is that fragmentation is completely hosed. pmtud does
not work in practice, and also cannot work in theory due to scale
(forget speed-- it's scale that kills.) the only reliable way to
communicate with ipv6 is to use a low enough MTU that it never exceeds
any link MTU. in practice that means an ipv6 payload, plus its headers,
has to fit in an ethernet packet, so, 1500 octets. you can special-case
the on-link LAN scenario so that if you have 9000 octets available you
can use them -- but that's the only time you can use more than about
1200 octets of payload (the 1280-octet ipv6 minimum MTU, less headers).
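a quick python sketch of that arithmetic, assuming the fixed 40-octet ipv6 header and the standard UDP/TCP header sizes, with no extension headers or options (which would shave off more):

```python
# the only guaranteed-safe size without working PMTUD is the ipv6
# minimum link MTU of 1280 octets (RFC 8200); 1500 is the classic
# ethernet MTU you can count on for the on-link case.
IPV6_MIN_MTU = 1280   # octets every ipv6 link must carry
ETHERNET_MTU = 1500   # octets on a classic ethernet link
IPV6_HEADER  = 40     # fixed ipv6 header, no extension headers
UDP_HEADER   = 8
TCP_HEADER   = 20     # no TCP options

def max_payload(mtu, l4_header):
    """largest transport payload that fits in one unfragmented packet."""
    return mtu - IPV6_HEADER - l4_header

print(max_payload(IPV6_MIN_MTU, UDP_HEADER))   # 1232
print(max_payload(ETHERNET_MTU, TCP_HEADER))   # 1440
```

that 1232 is where the "about 1200 octets" figure comes from.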
this means one of ipv6's major claimed-up-front advantages, which is
that only endpoints will fragment rather than gateways doing so as in
ipv4, never came about. in fact, ipv6 is far worse than ipv4 in this
way, as we learned by using ip fragmentation on UDP/53 (my idea: bad!)
this also means that we're chained to the MTU of the second-generation
10-megabit ethernet, which was carefully sized to fit a bunch of radio
spectrum and cable length parameters which have never applied since
then. but the IEEE 802 people know they're stuck with 1500 forever,
since no next generation of ethernet can succeed without being able to
transparently bridge onto the previous generation.
history is hard, let's do math.
> ; (155*10^6) / (53*8)
> ~365566.03773584905660377358
> ; (10*10^6) / (1500*8)
> ~833.33333333333333333333
> ; (100*10^6) / (1500*8)
> ~8333.33333333333333333333
> ; (1000*10^6) / (1500*8)
> ~83333.33333333333333333333
> ; (10000*10^6) / (1500*8)
> ~833333.33333333333333333333
> ; (40000*10^6) / (1500*8)
> ~3333333.33333333333333333333
> ; (100000*10^6) / (1500*8)
> ~8333333.33333333333333333333
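the same sums in python, for anyone without bc handy: header-processing events per second at line rate is just link speed in bits per second divided by packet size in bits. the link speeds are the nominal rates from the text; OC3C gets ATM's 53-octet cells, the rest get a 1500-octet ethernet MTU.

```python
# packets (or cells) per second a router must handle at line rate.
links = [
    ("OC3C / ATM cells",   155e6,    53),
    ("10M ethernet",        10e6,  1500),
    ("100M ethernet",      100e6,  1500),
    ("1G ethernet",       1000e6,  1500),
    ("10G ethernet",     10000e6,  1500),
    ("40G ethernet",     40000e6,  1500),
    ("100G ethernet",   100000e6,  1500),
]
for name, bits_per_sec, packet_octets in links:
    pps = bits_per_sec / (packet_octets * 8)
    print(f"{name:18s} {pps:>12,.0f} per second")
```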
right, so ATM failed in the market for a lot of reasons (scale is what
kills, not speed, like i said) but one of those reasons was that an OC3C
at line rate carried too many cells per second for then-current, or even
soon-projected, electronics to handle all of the cell headers.
we were wrong, and ATM has been used at OC12C, OC48C, and i've even seen
OC192C and OC768C, truly a testament to human cussedness fit for a
bumper sticker or perhaps a t-shirt.
looks to me like even half of a 10GBE at a 1500-octet MTU is just as
bad, and that at 40GBE and 100GBE it's well beyond absurdity. thankful
as we are for moore's law, i regret like anything the inability to send
large enough packets in the WAN, so that we don't all need 100-kilowatt
routers just to handle the headers.
ipv6's mindless and unnecessary early adoption of an unworkable
fragmentation regime has chained my applications and those of my
children and their children to the maximum size of a packet in a closed
10-megabit radio network. yay us.
--
P Vixie