[ih] Monitoring, Operating, Controlling ...

Thu Apr 7 12:45:27 PDT 2022

My grandfather repaired radios; my father repaired TVs.  I grew up 
fixing things; I had my hands inside open, running TVs when I was five 
years old.  (And yes, very early on I learned what it feels like to 
become the best path between a high voltage CRT and ground, and I know 
too well the vibrating feel of a 60 Hz current running through my body.)

When the Internet management notions came along circa 1990 there was no 
discussion (at least none that I heard) of a difference between 
"management" of a running network and "troubleshooting" of a network 
that was failing to deliver even basic services.

In my TV repair world "management" is the "fine tuning" knob used to 
make small adjustments to receiver frequency to remove the fuzz from a 
channel.

And "troubleshooting" is when there's no picture at all and I pull the 
back off the TV and start looking for burned out tubes or resistors. (Or 
just as often, using my nose to pick up the scent of something burnt.)  
Or, as was sometimes the case, the cotton thread that loops between the 
fine tuning knob and the variable capacitor had slipped off or broken.

In my TV repair guy world there was a big difference between 
"management" and "troubleshooting" and "repair".  But that distinction 
seemed quite lost when we got to "Internet Management".

Even as far back as the late 1980s it was obvious to me that if I could 
not sustain a TCP session between two points on the net, the tool kit 
that I would need was not "management" but, rather, "diagnosis and 
repair".

So I was dumbfounded at the assertion in the SNMP world (of which I was 
an early part) that it had to be "UDP all the way" because one could not 
trust that TCP would be usable while, by some magic, a sequence of UDP 
packets would be.

Hence the horror of SNMP's "get next" and lexicographic ordering, and 
then the subsequent horrors of SNMP security models, and now the horror 
of DTLS (datagram TLS) over UDP.

Our company has had an SNMP test suite product for a couple of decades.  
And the most common failure of SNMP implementations is a failure to 
properly perform "get next".  I've coded in that realm and it is quite 
difficult to get right.  And from the point of view of a management 
client, there is no guarantee that the data obtained through a get-next 
sweep is consistent, because things may have changed between successive 
queries in the sweep.
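
A minimal sketch of why a get-next sweep can return an inconsistent 
picture.  Nothing here is real SNMP: ToyAgent, sweep, the traffic hook, 
and the OID subtree are all made up for illustration.  The toy device 
keeps column 2 equal to column 1 for every row, and it changes row 1 
partway through the sweep.

```python
import bisect

def oid(text):
    """Parse a dotted OID into a tuple of ints so it sorts lexicographically."""
    return tuple(int(part) for part in text.split("."))

class ToyAgent:
    """Hypothetical in-memory agent; not a real SNMP implementation."""
    def __init__(self, values):
        self.values = dict(values)

    def get_next(self, after):
        """Return the first (oid, value) strictly greater than `after`, or None."""
        keys = sorted(self.values)
        i = bisect.bisect_right(keys, after)
        if i == len(keys):
            return None
        return keys[i], self.values[keys[i]]

def sweep(agent, prefix, on_step=None):
    """Walk every OID under `prefix`, one get-next request per object."""
    results = []
    current = prefix
    while True:
        hit = agent.get_next(current)
        if hit is None or hit[0][:len(prefix)] != prefix:
            break  # ran off the end of the subtree
        results.append(hit)
        current = hit[0]
        if on_step:
            on_step(agent, len(results))  # lets the device change mid-sweep
    return results

TABLE = oid("1.3.6.1.4.1.99999.1")  # arbitrary subtree, for illustration only

agent = ToyAgent({
    TABLE + (1, 1): 100,  # column 1, row 1
    TABLE + (1, 2): 200,  # column 1, row 2
    TABLE + (2, 1): 100,  # column 2, row 1 (equals column 1 on this toy device)
    TABLE + (2, 2): 200,  # column 2, row 2
})

def traffic(agent, step):
    # After the second reply, row 1 changes in both columns at once.
    if step == 2:
        agent.values[TABLE + (1, 1)] += 50
        agent.values[TABLE + (2, 1)] += 50

snapshot = dict(sweep(agent, TABLE, on_step=traffic))
print(snapshot[TABLE + (1, 1)], snapshot[TABLE + (2, 1)])
```

The device never held 100 in one column and 150 in the other for row 1, 
yet that is what the sweep reports, because the two values were fetched 
on opposite sides of the change.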

That failure mode is still common today.

I've watched SNMP activity.  It can be a buzz saw of packets racing back 
and forth on a net.  It can sometimes be the dominant flow of traffic on 
a link.

And I've compared it to alternatives based on TCP, such as my rewrite of 
SNMP (https://www.iwl.com/idocs/knmp-overview) and observed 
packet-count reductions of two or more orders of magnitude and even 
greater reductions in the wall-clock time required to make a sweep.  
(And I won't mention greatly improved consistency between data points.)

In other words, as a "management tool" SNMP is a classic camel - a 
racehorse designed by a committee - except that a camel does at least 
some things extremely well.

For doing some troubleshooting jobs SNMP over UDP is useful - such as 
watching whether the packet count of an interface is increasing at an 
explainable rate.
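
A small sketch of that kind of check, assuming the counter is a 32-bit 
value that wraps to zero (as SNMP's Counter32 does) and that two samples 
have already been fetched by some means.  The function names and the 
bounds are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32  # a Counter32 wraps back to zero at 2^32

def packets_per_second(prev_count, prev_time, curr_count, curr_time,
                       modulus=COUNTER32_MODULUS):
    """Rate between two counter samples, tolerating one wrap in between."""
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    # Modular subtraction gives the right delta across a single wrap.
    delta = (curr_count - prev_count) % modulus
    return delta / elapsed

def is_explainable(rate, low, high):
    """True when the rate falls inside the band the operator expects."""
    return low <= rate <= high

steady = packets_per_second(1000, 0.0, 4000, 10.0)
wrapped = packets_per_second(2 ** 32 - 100, 0.0, 400, 10.0)
print(steady, wrapped, is_explainable(steady, 100, 1000))
```

This sketch cannot tell a counter wrap from a counter reset after a 
reboot, and it assumes at most one wrap between samples; a real check 
would need more context than two numbers.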

But for doing the "control" part of management, SNMP was and is a dud.  
During meetings at the IETF we would discuss how to assure the sequencing 
of control operations.  Of course we could attempt a "set" and then do a 
readback.  But that's inefficient and slow, and subject to race 
conditions.  Our example was using SNMP to control a bomber.  We'd want 
to make sure that the command "open the bomb bay doors" was received and 
completed before we issued the command "release bombs".  (Sometimes we 
used the example "Is aircraft flying?" and "raise landing gear".)
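
The set-then-readback approach can be sketched as follows.  ToyAircraft 
and set_and_confirm are invented for illustration; the toy device 
advances its door state each time it is polled, which stands in for the 
passage of real time.

```python
class ToyAircraft:
    """Hypothetical device whose doors need a few polls to finish opening."""
    def __init__(self, polls_to_open=3):
        self.door_state = "closed"
        self._remaining = 0
        self._polls_to_open = polls_to_open
        self.log = []

    def set(self, name, value):
        if name == "doors" and value == "open":
            self.door_state = "opening"
            self._remaining = self._polls_to_open
        elif name == "release":
            # Record what the doors were doing at the moment of release.
            self.log.append(("release", self.door_state))

    def get(self, name):
        if name != "doors":
            raise KeyError(name)
        if self.door_state == "opening":
            self._remaining -= 1
            if self._remaining <= 0:
                self.door_state = "open"
        return self.door_state

def set_and_confirm(device, name, value, max_polls=10):
    """Issue a set, then read back until the value is confirmed or we give up."""
    device.set(name, value)
    for _ in range(max_polls):
        if device.get(name) == value:
            return True
    return False

# Careless ordering: the second command goes out before the first completes.
careless = ToyAircraft()
careless.set("doors", "open")
careless.set("release", True)

# Set, read back, and only then issue the dependent command.
careful = ToyAircraft()
if set_and_confirm(careful, "doors", "open"):
    careful.set("release", True)

print(careless.log, careful.log)
```

Each readback is an extra round trip, and nothing stops the state from 
changing again between the last readback and the next command, which is 
the inefficiency and the race described above.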

I hope Netconf turns out better for management and control.  It's been a 
long time since I reviewed its present status, but it began life as a 
kind of remote XML editor that could have had a hard time squeezing into 
IoT devices, particularly ones even smaller than the ESP32 based stuff I 
often work with.  I don't know how much of that fat has been shed.

But on our Internet, monitoring, diagnostics, and repair are getting the 
very short end of the stick.  And that stick is getting shorter as we 
impose ever stronger security barriers to block access to fetch data or 
to control a device.

David Isenberg wrote "Rise of the Stupid Network" back in 1997. 
https://www.isen.com/stupid.html

And we tend to differentiate between a control plane and a switching 
plane.  I submit that we need another kind of plane, a monitoring plane 
- sort of a set of smart automated eyeballs that observe the behavior of 
our networks and look for patterns of ill activity or flows outside of 
predicted constraints.  It is difficult to define what would be done 
when these things are observed, and a lot of limits and damping (and 
human gatekeeping) would be needed, or else we could end up creating an 
efficient kill-switch for the Internet.
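
One way to picture the damping and the human gate, as a rough sketch 
only.  DampedWatch, propose, the window size, and the thresholds are all 
assumptions made up for illustration; this is not a proposal for how a 
monitoring plane should actually decide anything.

```python
from collections import deque

class DampedWatch:
    """Raise an alarm only when most recent samples fall outside the bounds."""
    def __init__(self, low, high, window=5, threshold=4):
        self.low, self.high = low, high
        self.recent = deque(maxlen=window)  # True means "out of bounds"
        self.threshold = threshold

    def observe(self, value):
        """Record one sample; return True once the damped alarm condition holds."""
        self.recent.append(not (self.low <= value <= self.high))
        return sum(self.recent) >= self.threshold

def propose(alarm, approved_by_human):
    """The watcher only proposes; a person decides whether anything happens."""
    if not alarm:
        return "no action"
    return "act" if approved_by_human else "hold for review"

watch = DampedWatch(low=0, high=100)
samples = [50, 500, 60, 500, 500, 500, 500]
alarms = [watch.observe(v) for v in samples]
print(alarms, propose(alarms[-1], approved_by_human=False))
```

A single spike does not trip the alarm here, and even a sustained one 
produces only a held proposal until a person approves it.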

But as the Internet increasingly becomes a utility upon which we depend 
for our health, safety, and well-being, we need to get far more serious 
about management, monitoring, control, diagnosis, and repair.

         --karl--