[ih] why did CC happen at all?

Louis Mamakos louie at transsys.com
Tue Sep 2 09:23:10 PDT 2014


On Sep 2, 2014, at 11:45 AM, Detlef Bosau <detlef.bosau at web.de> wrote:

> (These days, I had to think about this funny "UDP light" with checksums
> covering only the headers, not the payloads. Dear readers, there
> is NOT EVEN ONE wireless PACKET SWITCHING network without link
> layer checksums; hence, a packet with bit errors is discarded at L2, not
> at L4. So when one introduces UDP light, the sad truth is: the packet
> died long before it ever reached the receiver's access point
> for "UDP".)

Having been personally screwed by exactly this assumption more than once,
I beg to disagree.

End-to-end checksums are used to protect against software and random hardware
errors anywhere along the path, not just corruption in transit over communication
links.
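A minimal sketch of the point, assuming a NIC that checks and strips a CRC-32 frame check sequence and then hands the payload to a (hypothetical) flaky DMA step that flips one bit in memory. Adler-32 stands in for the end-to-end check here purely for illustration; it is not what UDP uses. The names `nic_receive` and `flaky_dma` are made up for this example.

```python
import zlib

def nic_receive(frame: bytes) -> bytes:
    """Model a NIC: verify, then strip, a 4-byte CRC-32 frame check sequence."""
    payload, fcs = frame[:-4], frame[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != fcs:
        raise ValueError("link CRC failed; frame dropped at L2")
    return payload  # from here on, the link layer protects nothing

def flaky_dma(buf: bytes, offset: int) -> bytes:
    """Hypothetical DMA glitch: flip one bit after the link CRC was checked."""
    corrupted = bytearray(buf)
    corrupted[offset] ^= 0x01
    return bytes(corrupted)

# Sender: an end-to-end check (Adler-32 as a stand-in) rides inside the payload.
data = b"block 7 of an NFS write"
payload = data + zlib.adler32(data).to_bytes(4, "big")
frame = payload + zlib.crc32(payload).to_bytes(4, "big")

# Receiver: the link CRC passes (no exception), then the copy into memory goes wrong.
in_memory = flaky_dma(nic_receive(frame), offset=3)
body, e2e = in_memory[:-4], in_memory[-4:]
e2e_ok = zlib.adler32(body).to_bytes(4, "big") == e2e

print(e2e_ok)  # False: only the end-to-end check notices
```

The link CRC did its job on the wire; it simply has no say over what happens to the bytes after it has been checked and thrown away.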

1. Early DEC DEUNA UNIBUS ethernet controllers would occasionally screw up
the DMA transfers into memory, and corrupt a received packet when one of
the transfers went wrong.  Fortunately, it didn't randomly spew the
payload elsewhere in memory, but the packet was broken.  I'm sure fancy
new ethernet interfaces with multiple queues would NEVER have this sort of
problem.  This notion of checksum offload on NICs still gives me the willies.

2. Anyone remember HSSI and ATM DXI interfaces on your routers?  This was
early in the deployment of ATM, and you could get this external device 
that was a combination of a DS3 CSU/DSU and an ATM SAR.  There was this
DXI protocol a router could speak to send and receive frames on ATM virtual
circuits.  The external ATM frame shredder would do all of the SAR functions
because native ATM interfaces for routers didn't exist yet.

Ever wonder why IS-IS LSDB entries contain checksums?  Well, this is why.

IS-IS runs directly on the link layer, assuming that the link's CRC will
protect the payload.  Since IS-IS PDUs cross only one link, directly between
neighbors, this seems like a reasonable assumption.

In this case, the HSSI interface had a (I think) 32-bit CRC protecting
the frames between the router and external equipment.  The IS-IS PDU
gets sent across the HSSI link and is verified by the CRC.  The
packet is then chopped up into 53-byte cells, each cell with some CRC.
The cells are spewed across the ATM network, and then received by the
far device.  Each of the cells has the associated CRC validated.  The
cells are reassembled into packets, and then sent over a HSSI link using
DXI encapsulation with a 32-bit CRC.

Imagine you have a network of IP routers, using virtual circuits over
an L2 ATM network for trunking.  They are somewhat richly connected with
virtual circuits betwixt all of the routers.  You are running IS-IS as
your IGP.

Consider what happens when the cell reassembly process goes wrong.  The
packet being reassembled is sitting there in the memory of this device,
unprotected by any end-to-end checksum/CRC.  Just one cell reassembly goes
wrong and you have a corrupted link state update.  Turns out, the particular
product in question would start corrupting reassembled frames every so often
if you pushed it too hard, say more than about 34 Mb/s.

Corrupting a link state routing protocol database update is *really* bad
news, because the updated (corrupted) LSDB entry is then flooded to all
of the neighbors.  Hilarity and disbelief then ensue.  An interesting
failure mode to ponder, at least if it's not happening to your network.

And so now there's a checksum option on the LSDB, carried along end-to-end.
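Here is a rough model of the HSSI/SAR path described above, under several loudly stated assumptions: every link (near HSSI hop, each cell, far HSSI hop) appends a CRC-32, which simplifies real ATM (the cell HEC covers only the 5-byte header, and payload protection depends on the adaptation layer); the reassembly bug is a hypothetical swap of two cells; and the end-to-end check is a plain Fletcher-16. As far as I know, the IS-IS LSP checksum is a Fletcher-style checksum, but this sketch does not reproduce its exact encoding. All function names are made up for illustration.

```python
import zlib

CELL_PAYLOAD = 48  # an ATM cell is 53 bytes: 5-byte header + 48-byte payload

def fletcher16(data: bytes) -> int:
    """Plain Fletcher-16: position-dependent, unlike a simple byte sum."""
    s1 = s2 = 0
    for byte in data:
        s1 = (s1 + byte) % 255
        s2 = (s2 + s1) % 255
    return (s2 << 8) | s1

def add_crc(data: bytes) -> bytes:
    """Append a CRC-32 trailer, as every link does in this simplified model."""
    return data + zlib.crc32(data).to_bytes(4, "big")

def check_crc(frame: bytes) -> bytes:
    """Verify and strip a CRC-32 trailer; raise if the link saw an error."""
    data, crc = frame[:-4], frame[-4:]
    if zlib.crc32(data).to_bytes(4, "big") != crc:
        raise ValueError("link CRC mismatch")
    return data

def segment(pdu: bytes) -> list[bytes]:
    """Chop a frame into zero-padded 48-byte cell payloads, each with its own CRC."""
    pdu += bytes((-len(pdu)) % CELL_PAYLOAD)
    return [add_crc(pdu[i:i + CELL_PAYLOAD]) for i in range(0, len(pdu), CELL_PAYLOAD)]

def buggy_reassemble(cells: list[bytes], length: int) -> bytes:
    """Every cell CRC verifies, but a hypothetical bug swaps two cells in memory."""
    payloads = [check_crc(c) for c in cells]
    payloads[0], payloads[1] = payloads[1], payloads[0]
    return b"".join(payloads)[:length]

# Originating router: a stand-in PDU plus an end-to-end checksum.
lsp = bytes(range(200))
e2e = fletcher16(lsp)

# Near HSSI hop -> SAR -> cells -> far SAR -> far HSSI hop.
at_sar = check_crc(add_crc(lsp))                               # near HSSI CRC verifies
reassembled = buggy_reassemble(segment(at_sar), len(at_sar))   # per-cell CRCs verify
delivered = check_crc(add_crc(reassembled))                    # fresh far-side CRC verifies too

print(delivered == lsp)              # False: the PDU was corrupted
print(fletcher16(delivered) == e2e)  # False: only the end-to-end checksum notices
```

Note that the far side computes a brand-new CRC over already-corrupted data, so every link check passes.  Fletcher is used here because its second running sum weights each byte by position, so a pure reordering changes the result (for example, `fletcher16(b"ABCD")` is 38411 while `fletcher16(b"CDAB")` is 40459), which a simple byte sum would miss.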

So that weak 16-bit UDP checksum can help catch this stuff.  I wonder
how many corrupted files were created with NFS and broken ethernet
hardware?
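For reference, a minimal sketch of the 16-bit ones' complement Internet checksum (RFC 1071) that UDP uses. Real UDP also covers a pseudo-header drawn from the IP header, which this sketch leaves out, and the sample bytes are arbitrary. It shows both why the checksum helps and why it is called weak: a single flipped bit is caught, but swapping two aligned 16-bit words is not, because ones' complement addition does not care about order.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 16-bit ones' complement checksum (UDP pseudo-header omitted)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF

good = bytes.fromhex("0001f203f4f5f6f7")
flipped = bytes.fromhex("0001f203f4f5f6f6")  # one bit flipped in the last byte
swapped = bytes.fromhex("f2030001f4f5f6f7")  # first two 16-bit words swapped

print(hex(internet_checksum(good)))     # 0x220d
print(hex(internet_checksum(flipped)))  # 0x220e: the bit flip is caught
print(hex(internet_checksum(swapped)))  # 0x220d: the swap is not
```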

Louis Mamakos



More information about the Internet-history mailing list