[ih] Interop as part of Internet History (was Re: Fwd: Fwd: List archives (Was: Exterior Gateway Protocol))

Sat Sep 12 15:27:38 PDT 2020

For some reason my post never made it onto the internet history list 
itself.  Odd.  I hope it shows up eventually.

Seventeen minutes for things to settle down after a route flap? You made 
a good prediction!  By today's standards that's a long time.  But it's a 
whole lot better than the hours it often took in earlier days.

(BTW, our internet architecture is lacking a layer, often called 
"association" or "session", that would lie on top of the transport and 
would allow fast application-level healing despite failures of transport 
connections and establishment of replacement connections (such as would 
be common in mobile situations).  It turns out that such a layer would 
be extremely lightweight and not add a meaningful amount of overhead.  
But because that was an ISO/OSI idea (though done badly) the Internet 
engineering community has not picked up on it.)

I worked at Wells Fargo for several years helping them to move into the 
"modern" age of computers and networks (circa 1982). And, like your 
experience, down time meant serious money.  It was amazing how much 
money sloshes through a large bank, especially around 3am when they have 
to meet their reserve requirements - it was a mad time of buying and 
selling huge blocks of money to cover the reserve requirements for 
the next 24 hours.  If they missed, or if they couldn't get things to 
reconcile - then the regulations would shut the whole thing - the whole 
bank - down almost immediately.

When I got into networking - around 1972 - I was working first on a 
project that closely resembled the movie "War Games" (but with real 
missile launches) and then I moved on to do secure network research for 
the US Joint Chiefs and later, for an unmentionable three-letter-agency 
just south of Baltimore.  So we were not only concerned with failure, we 
anticipated it in the most nuclear-blast-like of forms.  Our reliability 
and response time requirements were based, literally, on whether the US 
could make and deploy a timely "launch/not-launch" decision.  It was 
scary stuff.

The Interop net was unique in many ways.  The time pressure to install 
was immense, the pressure to keep it alive was heavy, and something 
nobody ever mentions - we had to tear it down and get it onto trucks 
headed back home within a few hours.  We had to invent a lot of things - and 
Dan let us have the leeway to experiment, sometimes destructively or not 
strictly within the limits of "the rules", to get the net up.  (It also 
helped that we put our often large bar tab on Dan's hotel room bill.)

Sometimes we got downright brutal.  For instance, just before opening of 
the first show in San Jose we had an utterly critical box that needed to 
be cabled up - but the access hole was too small.  So ten minutes before 
the doors opened Alex Latzko and I pulled out hammers and proceeded to 
pound the beejeebers out of the existing box's hole.  We metal fatigued 
the steel and got the cables in just as the doors opened.  It was ugly, 
and we destroyed the box, but it worked.

We initially had a lot of trouble with the house electricians. At first 
they didn't mind us hanging our yellow hose Ethernet, but they felt that 
if it got in their way they could simply cut it and splice it back 
together.  We later had troubles with telco people cutting and splicing 
our carefully balanced long DSL runs across the main rooms of convention 
centers.  It came to a head when union people insisted that we could not 
touch our own fiber optic plant.  I can't remember the details of how we 
resolved that in the short term - I had suggested arguing that fiber 
optics are light "pipes" and that the proper union would be the plumbers 
not the electrical workers.  In the long term we trained a lot of them 
about the right way to do things.  In New York we had to supplement that 
with thick wads of $20 and $50 bills.

I am very concerned that today's networks are very difficult to diagnose 
and repair.   My grandfather was a radio repair guy and my father had a 
business repairing TVs that nobody else could fix.  I kind of inherited 
those genes.   One could tell a good TV guy just by looking at his 
toolkit.  For example, fixing one of those vacuum tube TVs required a 
lot of turning of coils and capacitors via screws on the back and 
looking at how the picture changed.  A good TV guy had a little mirror that 
could be propped up in front of the TV so that the picture could be 
viewed while turning the screws.  The bad TV guy kept running between 
the front and the back.

Our tools to detect problems and to make tests are primitive. Back in 
the mid-1990s I built a tool called "Dr. Watson, The Network 
Detective's Assistant" - it was the first internet butt-set designed to 
get a repair person up and running within a few seconds.  It worked well 
but I was not a good manager of my company and it died (parts of the 
tool were picked up by Fluke instruments).  That tool was intended 
ultimately to be part of a much larger pathology-analysis system based 
on some of our medical systems.

At Cisco I worked on a DARPA-backed project to do "smart networks" - I 
was beginning to instrument routers so that they could detect when they 
began to wobble beyond limits established by models.  That detection 
would feed back into the models - often revealing incorrect 
configuration, a degradation, a failure, or, interestingly, a security 
penetration.  That work got short-circuited when I was elected to the 
ICANN board and that absorbed all of my time.  That work has never 
continued but it needs to be resurrected.

I am extremely interested in adopting methods from biology into 
networking.  Living systems are amazingly robust.  The key is that they 
have, over time and evolution, acquired layers of responses to 
situations.  By comparison our computers and networking systems are 
extremely brittle and generally incorporate only one response to any 
situation.   I really want to change that but to do so will require a 
massive change in our mental approach to networking as well as breaking 
some honored traditions, such as strict separation between layers of 
abstraction/protocols.

With my lawyer hat on I keep wondering when the ax of liability is going 
to fall on network operators.  I gave a talk at NANOG last year about 
issues of carrier liability and how we ought to change our approach to 
engineering of the internet to make it more robust (based, somewhat, on 
lessons from biology. ;-)

Here's the video/transcript of that talk:

https://blog.iwl.com/blog/keynote-at-nanog-77-by-cto-karl-auerbach

         --karl--