[ih] Monitoring, Operating, Controlling ...
Karl Auerbach
karl at cavebear.com
Thu Apr 7 12:45:27 PDT 2022
My grandfather repaired radios; my father repaired TVs. I grew up
fixing things; I had my hands inside open, running TVs when I was five
years old. (And yes, very early on I learned what it feels like to
become the best path between a high voltage CRT and ground, and I know
too well the vibrating feel of a 60 Hz current running through my body.)
When the Internet management notions came along circa 1990 there was no
discussion (at least none that I heard) of a difference between
"management" of a running network and "troubleshooting" of a network
that was failing to deliver even basic services.
In my TV repair world "management" is the "fine tuning" knob used to
make small adjustments to receiver frequency to remove the fuzz from a
channel.
And "troubleshooting" is when there's no picture at all and I pull the
back off the TV and start looking for burned out tubes or resistors. (Or
just as often, using my nose to pick up the scent of something burnt.)
Or, as was sometimes the case, the cotton thread that loops between the
fine tuning knob and the variable capacitor had slipped off or broken.
In my TV repair guy world there was a big difference between
"management" and "troubleshooting" and "repair". But that distinction
seemed quite lost when we got to "Internet Management".
Even as far back as the late 1980s it was obvious to me that if I could
not sustain a TCP session between two points on the net the tool kit
that I would need was not "management" but, rather, "diagnosis and repair".
So I was dumbfounded at the assertion in the SNMP world (of which I was
an early part) that it had to be "UDP all the way" because one could not
trust that TCP would be usable while by some magic a sequence of UDP
packets would be.
Hence the horror of SNMP's "get next" and lexi-ordering, and then the
subsequent horrors of SNMP security models, and now the horror of DTLS
(datagram TLS) over UDP.
Our company has had an SNMP test suite product for a couple of decades.
And the most common failure of SNMP implementations is a failure to
properly perform "get next". I've coded in that realm and it is quite
difficult to get right. And from the point of view of a management
client, there is no guarantee that the data obtained through a get-next
sweep is consistent, there being a possibility that things had changed
between successive queries in a get-next sweep.
That failure mode is still common today.
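To make the get-next difficulty concrete, here is a minimal sketch in Python. It is not taken from any real agent or from the test suite mentioned above; ToyAgent, parse_oid, and walk are hypothetical names invented for illustration. It shows the lexicographic ordering that get-next depends on (comparing sub-identifiers as integers, not as text) and marks the point in a sweep where the agent's data can change between requests.

```python
from bisect import bisect_right


def parse_oid(text):
    """Turn a dotted OID string into a tuple of integer sub-identifiers."""
    return tuple(int(part) for part in text.split("."))


class ToyAgent:
    """Hypothetical in-memory agent; an illustration, not a real SNMP stack."""

    def __init__(self, values):
        self.table = {parse_oid(k): v for k, v in values.items()}

    def get_next(self, oid):
        """Return the first (oid, value) strictly after `oid`, or None."""
        # Tuples of ints compare sub-identifier by sub-identifier, which is
        # the ordering get-next needs. Comparing the dotted strings instead
        # would wrongly put "...10" before "...9".
        keys = sorted(self.table)
        i = bisect_right(keys, oid)
        if i == len(keys):
            return None  # walked off the end of the agent's data
        return keys[i], self.table[keys[i]]


def walk(agent, root):
    """Sweep the subtree under `root` with repeated get-next requests."""
    oid, rows = root, []
    while True:
        hit = agent.get_next(oid)
        if hit is None or hit[0][:len(root)] != root:
            break  # left the subtree, or nothing further to fetch
        # Each pass through this loop stands for a separate request and
        # response. A live agent may change its data between passes, so
        # the rows collected need not describe a single moment in time.
        oid, value = hit
        rows.append((oid, value))
    return rows
```

Walking a subtree whose instances are numbered 2, 9, and 10 returns them in that order; sorting the dotted strings as text would return 10 before 2 and 9.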
I've watched SNMP activity. It can be a buzz saw of packets racing back
and forth on a net. It can sometimes be the dominant flow of traffic on
a link.
And I've compared it to alternatives based on TCP, such as my rewrite of
SNMP (https://www.iwl.com/idocs/knmp-overview) and observed
packet-count reductions of two (or more) decimal orders of magnitude and
even greater reductions in the wall-clock time required
to make a sweep. (And I won't mention greatly improved consistency
between data points.)
In other words, as a "management tool" SNMP is a classic camel - a
racehorse designed by a committee - except that a camel does at least
some things extremely well.
For doing some troubleshooting jobs SNMP over UDP is useful - such as
watching whether the packet count of an interface is increasing at an
explainable rate.
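As a rough illustration of that kind of check, the Python sketch below turns two samples of a 32-bit packet counter into a rate and compares it with an expected band. The function names and the band limits are hypothetical. It assumes the counter wrapped at most once between samples and was not reset; a real tool would also have to notice agent restarts.

```python
COUNTER32_MODULUS = 2 ** 32  # a 32-bit counter wraps back to zero here


def counter_delta(earlier, later, modulus=COUNTER32_MODULUS):
    """Packets counted between two samples, allowing for one wrap."""
    return (later - earlier) % modulus


def rate_per_second(earlier, later, seconds, modulus=COUNTER32_MODULUS):
    """Average packets per second over the sampling interval."""
    if seconds <= 0:
        raise ValueError("sampling interval must be positive")
    return counter_delta(earlier, later, modulus) / seconds


def is_explainable(rate, low, high):
    """True when the rate falls inside the band the operator expects."""
    return low <= rate <= high
```

For example, samples of 1000 and 4000 taken ten seconds apart give 300 packets per second, and a sample that wrapped from 4294967290 to 10 still yields a delta of 16.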
But for doing the "control" part of management, SNMP was and is a dud.
During meetings at the IETF we would discuss how to assure sequencing
of control operations. Of course we could attempt a "set" and then do a
readback. But that's inefficient and slow, and subject to race
conditions. Our example was using SNMP to control a bomber. We'd want
to make sure that the command "open the bomb bay doors" was received and
completed before we issued the command "release bombs". (Sometimes we
used the example "Is aircraft flying?" and "raise landing gear".)
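The set-then-readback pattern can be sketched in a few lines of Python. ToyDevice, its variable names, and the polling limit are hypothetical, invented only to show the shape of the problem; this is not how any particular agent behaves. The toy device accepts "open the doors" immediately but needs several polls before the doors are actually open, and it refuses a release that arrives too early.

```python
class ToyDevice:
    """Hypothetical device whose doors take a few polls to finish opening."""

    def __init__(self, polls_until_open=3):
        self.doors = "closed"
        self.released = False
        self._remaining = polls_until_open

    def set(self, name, value):
        """Accept a command; door motion only starts here, it does not finish."""
        if name == "doors" and value == "open" and self.doors == "closed":
            self.doors = "opening"
        elif name == "release" and value:
            if self.doors != "open":
                raise RuntimeError("release attempted before doors were open")
            self.released = True

    def get(self, name):
        """Read a variable. Each poll also stands in for the passage of time."""
        if name != "doors":
            raise KeyError(name)
        if self.doors == "opening":
            self._remaining -= 1
            if self._remaining <= 0:
                self.doors = "open"
        return self.doors


def set_and_confirm(device, name, value, max_polls=10):
    """Issue a set, then poll until the readback matches or we give up."""
    device.set(name, value)
    for _ in range(max_polls):
        if device.get(name) == value:
            return True
    return False
```

Calling set_and_confirm(device, "doors", "open") before device.set("release", True) works, at the cost of several extra round trips; issuing the two sets back to back raises the RuntimeError. Even the confirmed version leaves a window in which another manager could change the doors between the last readback and the release, which is the race condition noted above.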
I hope Netconf turns out better for management and control. It's been a
long time since I reviewed its present status, but it began life as a
kind of remote XML editor that could have had a hard time squeezing into
IoT devices, particularly ones even smaller than the ESP32 based stuff I
often work with. I don't know how much of that fat has been shed.
But on our Internet, monitoring, diagnostics, and repair are getting the
very short end of the stick. And that stick is getting shorter as we
impose ever stronger security barriers to block access to fetch data or
to control a device.
David Isenberg wrote "Rise of the Stupid Network" back in 1997.
https://www.isen.com/stupid.html
And we tend to differentiate between a control plane and a switching
plane. I submit that we need another kind of plane, a monitoring plane
- sort of a set of smart automated eyeballs that observe the behavior of
our networks and look for patterns of ill activity or flows outside of
predicted constraints. It is difficult to define what would be done when
these things are observed, and a lot of limits and damping (and human
gatekeeping) would be needed, or else we could end up creating an efficient
kill-switch for the Internet.
But as the Internet increasingly becomes a utility upon which we depend
for our health, safety, and well being we need to get far more serious
about management, monitoring, control, diagnosis, and repair.
--karl--