[ih] ruggedized Honeywell 516 ARPANET IMP cabinet top lifting hooks (Was: The IMP Lights story (Was: Nit-picking an origin story))

the keyboard of geoff goodfellow geoff at iconia.com
Sun Aug 24 18:37:57 PDT 2025


steve, alex, ben, scott, et al. thanks so very much for the
color(ful) esoterica on "The IMP Lights" reliability history!

yours truly's next esoteric 516 IMP issue is:

what were the 516 ARPANET IMP cabinet lifting hooks for?

in seeing "nuclear survivability" mentioned below in one of Steve's
back-and-forths: it jogged a memory from the past of someone once mentioning
-- perhaps in "josh" -- eons ago that the 516 ARPANET IMP cabinet top
lifting hooks were put there so that a 516 ARPANET IMP could "easily be
loaded into a (nuclear) submarine" (!)

sadly, yours truly can't recall where or on what mailing list that was
mentioned... but it got yours truly "thinking" about the
nature/origin/purpose of the 4 516 IMP cabinet top lifting hooks ➔➔
https://iconia.com/516IMP.JPG ?

geoff

On Sun, Aug 24, 2025 at 12:38 PM Steve Crocker via Internet-history <
internet-history at elists.isoc.org> wrote:

> [Apparently the attachment got stripped when I sent this to the
> Internet-History list.  This is a resend, with the text of the interview
> copied directly into this message at the end.]
>
> Folks,
>
> On Monday, 18 August 2025, I described how the lights on the IMPs often
> burned out and caused a noticeable amount of downtime.  Geoff
> Goodfellow asked for more details.  That exchange is copied below.
>
> I learned of the problem with the IMP lights during a virtual roundtable
> with Ben Barker and others.  We published the roundtable in [1].
>
> I later interviewed Ben and Scott Bradner to learn more details. The
> interview [2] is appended.
>
> In the process of checking, Alex McKenzie sent me a more recent article
> that Dave Walden, he, and Ben wrote, which covers several incidents related
> to reliability, and he sent a reference to the article to the list.  See [3]
> below.  I also learned that Ben passed away two years ago.  I'm sad.  He
> was a delightful and always positive guy.
>
> After further discussion with Alex, we agreed [1] has the least detail.
> [3] is best, but it's behind a paywall.  The interview [2] is a
> close second.
>
> I think this is all the information that's available.
>
> Thanks to Ben for the delightful story, to Geoff for asking for the
> details, to Scott for permission to use the interview, and to Alex for the
> recent article and advice on how to proceed.
>
> Steve
>
>
> [1]  "The Arpanet and Its Impact on the State of Networking," Stephen D.
> Crocker, Shinkuro, Inc., Computer, October 2019.  This was a virtual
> roundtable with Ben Barker, Vint Cerf, Bob Kahn, Len Kleinrock and Jeff
> Rulifson.  Ben mentioned the problem with the IMP lights.  It's only a
> small portion of the overall roundtable.  The next two references have more
> detail.
>
> [2] "Fixing the lights on the IMPs," an unpublished interview with Ben
> Barker and Scott Bradner, 3 July 2020.  It's appended.
>
> [3] "Seeking High IMP Reliability in the 1970' ARPAnet" by Walden,
> McKenzie, and Barker, published in Vol 44, No 2 (April - June 2022) of IEEE
> Annals of the History of Computing.
>
> ---------- Forwarded message ---------
> From: the keyboard of geoff goodfellow <geoff at iconia.com>
> Date: Mon, Aug 18, 2025 at 8:54 PM
> Subject: Re: [ih] Nit-picking an origin story
> To: Steve Crocker <steve at shinkuro.com>
> Cc: John Day <jeanjour at comcast.net>, Dave Crocker <dhc at dcrocker.net>,
> Internet-history <internet-history at elists.isoc.org>, <dcrocker at bbiw.net>
>
>
> [I] am innately curious about the ARPANET "The IMPs Lights Reliability
> Issue" you mention here and wonder if some additional color could be
> elucidated to the colorful story as to just HOW "the lights on the IMP
> panel being a major source of outages" and specifically what
> "re-engineering" was effectuated to ameliorate them from crashing the IMPs?
>
> On Mon, Aug 18, 2025 at 7:22 AM Steve Crocker via Internet-history <
> internet-history at elists.isoc.org> wrote:
>
> > ... Ben Barker has a colorful
> > story about the lights on the IMP panel being a major source of outages.
> > The IMPs had 98% uptime at first.  98% was astonishingly good
> > compared to other machines of the day, but intolerably poor in terms of
> > providing an always available service.  Ben re-engineered the lights and
> > brought the reliability up to 99.98%.  How's that for a small thing
> having
> > a big effect!
> >
>
> Fixing the lights on the IMPs
>
>
>
> Below is an exchange with Ben Barker, stimulated by a comment by Scott
> Bradner.  I had been talking to Scott about another project but I opened by
> asking a bit about his early years.  He worked at Harvard in several
> capacities over many decades.  He started as a programmer in the psychology
> department.  Ben Barker had been a student at Harvard and later joined
> BBN.  Ben was a hardware guy.  Scott mentioned Ben hired him to develop
> circuit boards for the front panels of the IMPs.  Ben participated in a
> virtual roundtable last year, and I recall his vivid story of improving the
> reliability of the IMPs.
>
>
>
> For context, the following details are alluded to but not explained.
>
>
>
> ·    The first several IMPs were built using ruggedized Honeywell 516
> computers.  I believe the cost to ARPA was $100K each.  Production later
> shifted to using regular, i.e. not ruggedized, Honeywell 316 computers.  I
> believe this dropped the cost to $50K each.  I believe the architecture was
> identical, though the 316 was probably slightly slower.  Apparently, speed
> wasn’t an issue,
> so this change saved a noticeable amount of money.  Also, as Ben makes
> clear, there were some unfortunate changes to the front panel that may have
> saved some cost in the production but were problematic in operation.
>
> ·    The software in the IMP included the routines for receiving and
> forwarding packets and retransmitting if they hadn’t been received
> correctly.  It also included the distributed algorithm for computing the
> routing tables based on periodic exchange of tables with neighboring IMPs.
> In addition to these primary functions, there were also four background
> processes that implemented “fake hosts,” i.e. processes that were
> addressable over the network as if they were hosts.  Each one implemented a
> specific function.  One was DDT, the standard debugging technology of the
> day.  I say “standard” because I had encountered other implementations of
> DDT while at MIT.  I have no idea whether similar software was used in
> other environments, but the concept was both simple and powerful, and I
> assume it would have been copied widely.  In brief, it’s an interactive
> program that has access to the memory of one or more processes.  There are
> commands for starting and stopping the execution of the subject process,
> examining or changing memory and setting breakpoints.  AI Memo AIM-147 by
> Tom Knight in 1968 describes DDT for the MIT AI Lab PDP-6.  An earlier 1963
> memo, Recent Improvements in DDT, by D. J. Edwards and M. L. Minsky makes
> it clear DDT had been around for several years.
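>
> To make the concept concrete, the core of a DDT-style debugger is little
> more than a command loop over the subject process’s memory.  The Python
> sketch below is purely illustrative (the command letters, addresses, and
> word sizes are made up, not taken from the historical DDT or the IMP
> code), but it shows the examine / deposit / breakpoint / go / stop
> repertoire described above:
>
>     # A toy DDT-style command loop (illustrative only; not the historical
>     # PDP-6/PDP-1 DDT, nor the IMP's fake-host DDT).
>     memory = [0] * 64            # stand-in for the subject process's memory
>     breakpoints = set()
>     running = False
>
>     def ddt(line):
>         """One command: 'e ADDR' examine, 'd ADDR VAL' deposit,
>         'b ADDR' set a breakpoint, 'g' go (start), 's' stop."""
>         global running
>         op, *args = line.split()
>         if op == "e":                          # examine a word
>             addr = int(args[0], 8)             # addresses given in octal
>             print(f"{addr:o}/ {memory[addr]:06o}")
>         elif op == "d":                        # deposit a word
>             memory[int(args[0], 8)] = int(args[1], 8)
>         elif op == "b":                        # set a breakpoint
>             breakpoints.add(int(args[0], 8))
>         elif op == "g":                        # start the subject process
>             running = True
>         elif op == "s":                        # stop it
>             running = False
>
>     ddt("d 10 4712")   # deposit octal 4712 at octal address 10
>     ddt("e 10")        # prints "10/ 004712"
>     ddt("b 20")        # set a breakpoint at octal address 20
>
> What made the IMP’s version notable is that this command interface was one
> of the “fake hosts,” so the same examine/deposit/breakpoint operations
> could be driven remotely over the network, as Ben describes later in the
> exchange.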
>
>
>
> See my comments after the exchange.
>
>
>
> Date: Jul 3, 2020, 2:37 PM
> From: Steve Crocker <steve at shinkuro.com>
> To: Scott, Ben, me
>
> Ben,
>
>
>
> I was chatting with Scott about his early days.  He mentioned doing the
> circuit boards for the IMP front panels, and I recognized it was part of
> the same story you told me about fixing the lights.
>
>
>
> Scott,
>
>
>
> Thanks for your time today.  You mentioned doing the circuit boards for the
> IMP front panels.  As I mentioned, I listened to Ben Barker vividly
> describe how this made a big difference in reducing the number of service
> calls and improving the uptime of the IMPs.  Ben participated in a virtual
> roundtable last year that was published in the IEEE's Computer magazine.  A
> copy is attached.  Ben mentions reliability in both his brief bio and later
> on page 22.  I've copied the text from that page.
>
>
> *BARKER: *Reliability surprised me. I was surprised to find that the
> reliability requirements for a network are much more extreme than the
> reliability requirements for a computer, even for the computer upon which
> the network switch is built. When we first started operating the Arpanet,
> on average each node was down about a half hour a day. And people were
> saying, the net’s always down and there was talk of canceling it, shutting
> down the net because it just wasn’t good enough to be useful. We had a
> meeting with our subcontractor for hardware to present our complaint to
> them that nodes were down a half hour a day. Their reaction was, “You’re
> down a half hour out of 24. That’s about 2%. You’re getting 98% up time.
> We’ve never gotten 98% uptime on anything we’ve built. How are you doing
> it?”
>
>
> Eventually, I took over field operations of the network. They thought it
> was strange to have a Ph.D. running a field service operation, but, you
> know, we were weird guys. But little by little, we chipped away at it over
> the course of a year and a half. We got the availability from 98 to 99.98%,
> and the user community reaction went from “the net’s always down” to “the
> net’s never down.” But that change is something that would have been
> written into the spec if they were talking about that kind of application,
> nuclear survivability.
>
>
> Ben doesn't mention the lights in the article but I definitely remember him
> describing this to me.  It might have been in a separate conversation that
> wasn't recorded.
>
>
>
> Cheers,
>
>
>
> Steve
>
>
>
>
>
>
> Date: Fri, Jul 3, 4:01 PM
> From: Ben Barker
> To: me, sob at sobco.com
>
> Indeed!  And hello you old SOB.  How have you been doing the last half
> century?
>
>
>
> Honeywell built the 516s and 316s using low-voltage incandescent bulbs for
> the displays.  On the ruggedized 516s, they were in sockets with a screw-on
> clouded cover.  Not too bad.  On the 316, the bulbs were mounted inside the
> momentary contact rocker switches that were used to input data.
> Unfortunately, these switches were actuated by pressing and releasing them,
> allowing the switch to pop back to the resting position.  There was a
> strong spring pushing the switch back out resulting in a pretty strong
> mechanical snap on release.  More unfortunately, this mechanical shock was
> simultaneous with the bulb turning on – the inrush of maximum current into
> the cold filament – ideal conditions and timing for burning out a bulb.
> More unfortunately yet, the bulbs were mounted in the switch by soldering
> the leads to the connector.  This meant that the bulbs burnt out very
> frequently and a dead bulb required taking the IMP down, disassembling the
> front panel, unsoldering the dead bulb, and soldering in a new one.
>
>
>
> But wait!  It gets worse!  The IMPs were fragile: once a machine was taken
> down, it typically took hours – sometimes days – to get it back up.
>
>
>
> I asked Scott to come up with a design that would replace the switches and
> bulbs, using red LED bulbs in their place.  Scott found a switch / LED
> combo that fit just right into the holes in the 316 front panel and
> designed a PC card that carried the switch / lights in just the right
> places to fit in. Scott went into production and we retrofitted them in all
> the 316s in the field. Down time dropped amazingly.
>
>
>
> The other half of the strategy was dealing with the fragile IMPs that took
> a long time to bring back up.  Scott’s switch / light panel was the first
> big step in eliminating probably the majority of the times we had to take
> the IMPs down.  The next was stand-up PMs – leaving the machines up and
> running while performing preventative maintenance – mostly cleaning the air
> filters and checking and adjusting the power supply voltages and performing
> a visual check.  It eliminated one IMP down episode per IMP per month and
> helped enormously in eliminating the extended effort to bring the machines
> back up.  I have recently been informed that this is a known phenomenon,
> the “Waddington Effect”: eliminating unnecessary PM on World War 2 bombers
> produced a 60% increase in the number of effective flying hours.
>
>
>
> The third leg of the stool was using self-diagnostic and remote diagnostic
> techniques to find problems early on before the network users were aware of
> a problem and scheduling a tech to go out to replace an already-identified
> card, typically off-hours when nobody from that site was using the net.
>
>
>
> Sorry to ramble…
>
>
>
> /b
>
>
>
>
>
> Date: Jul 3, 2020, 4:15 PM
> From: Steve Crocker
> To: Ben, me, sob at sobco.com
>
>
>
>
>
> Very cool stuff.  Question: what sorts of things could be diagnosed
> remotely?
>
>
>
> Steve
>
>
>
> Date: Jul 3, 2020, 6:49 PM
>
> From: Ben Barker
> To: me, sob at sobco.com
>
>
>
>
>
> Mostly it was figuring out how to find problems that had brought one or
> more IMPs down previously.  One was an IMP that was just running slow.  We
> figured out that we could check a count of how many background loops the
> machine would complete in a given length of time – a minute?  Easy to do
> with a program on our PDP-1 reaching out through the IMP’s DDT to read the
> register that was
> incremented by the background loop.  If something was keeping the machine
> inordinately busy, it would show up as low loop counts.  If it was low, we
> would check the return address of the interrupt routines which would show
> us what the machine was doing the last time the interrupt happened.  Then
> there was just debugging using the DDT.
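>
> In today’s terms that check amounts to roughly the following (an
> illustrative Python sketch; the remote DDT examine is abstracted as a
> callable, and all names and thresholds are made up):
>
>     import time
>
>     def loops_per_minute(read_counter, interval=5):
>         """read_counter() stands in for a remote DDT examine of the word
>         the background loop increments; any zero-argument callable works."""
>         start = read_counter()
>         time.sleep(interval)
>         return (read_counter() - start) * (60 / interval)
>
>     def check_imp(name, read_counter, expected_rate, threshold=0.5):
>         rate = loops_per_minute(read_counter)
>         if rate < threshold * expected_rate:
>             print(f"{name}: {rate:.0f} loops/min (expected ~{expected_rate});"
>                   " unusually busy -- examine interrupt return addresses next")
>         else:
>             print(f"{name}: {rate:.0f} loops/min, looks normal")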
>
>
>
> We had a machine that was confused about time.  It turned out its Real Time
> Clock card was bad.  I wrote a PDP-1 routine called RTCHEK that would
> trivially check an IMP’s RTC.
>
>
>
> There was the infamous Harvard crash, caused by a memory failure in the area
> used by the IMP to store its routing tables.  John McQuillan modified the
> IMP code to checksum the table before using it.  That told us instantly of
> memory failures in that stack.
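>
> The idea is simple enough to sketch (illustrative Python; the actual
> change was of course in the IMP’s assembly code and its own table format):
>
>     def checksum(words):
>         # simple 16-bit additive checksum, purely illustrative
>         total = 0
>         for w in words:
>             total = (total + w) & 0xFFFF
>         return total
>
>     routing_table = [0o1771, 0o0042, 0o0317, 0o0005]   # made-up entries
>     stored_sum = checksum(routing_table)
>
>     def route_with_verified_table():
>         if checksum(routing_table) != stored_sum:
>             raise RuntimeError("routing table corrupted -- memory failure?")
>         # ... otherwise route packets using the (verified) table ...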
>
>
>
> The modem interfaces generated and checked 24-bit checksums on all packets
> on the line. We sometimes would get packets that passed the checksum check
> but whose contents were in error.  We started having the IMPs send such
> packets to a teletype in Cambridge where we would print them out in octal
> and I would examine them.  The most common packets were routing packets and
> they were very stylized.  Sometimes a given bit would not always get
> recorded in memory properly and it would be clear which one from looking at
> the packet.  If it was a problem in the input or output shift register, it
> would show up on a given bit and on the bits to its left.  More typically
> it was a problem gating the data onto the data bus.  In any case, you could
> pretty well identify which gate was failing and schedule a service tech
> out to replace that card at night.
>
>
>
> At times, we would patch the IMP to software checksum the packets on the
> line to find out if the check registers were failing.  At times we would
> turn on the software checksum and turn off the hardware checksum to see
> problems in the AT&T equipment.
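>
> A rough modern rendering of that bit-level detective work (illustrative
> Python; the packet contents and word width here are invented):
>
>     from functools import reduce
>
>     def suspect_bits(expected, received):
>         """OR together the XOR of each word pair: any bit set in the
>         result was wrong somewhere in the packet."""
>         return reduce(lambda acc, pair: acc | (pair[0] ^ pair[1]),
>                       zip(expected, received), 0)
>
>     expected = [0o000017, 0o000017, 0o000017]   # a "stylized" routing packet
>     received = [0o000217, 0o000217, 0o000017]   # the same packet as recorded
>
>     mask = suspect_bits(expected, received)
>     print(f"suspect bit mask: {mask:06o}")      # prints 000200: one bad bit
>     # A single consistently bad bit points at the gating of that bit onto
>     # the data bus; a bad bit plus everything to its left points at a
>     # shift register.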
>
>
>
> These are random examples.  We did lots of such.  Mostly all done from our
> PDP-1 DDT.  It was pretty cool.
>
>
>
> 😊
>
>
>
> /b
>
>
>
>
>
> Date: Fri, Jul 3, 7:27 PM
> From: Steve Crocker
> To: Ben, Scott
>
>
>
> Cool stuff. Did you guys ever write up these details? A bigger question is
> whether the techniques you developed ever influenced or got used by
> others. I
> would imagine that virtually every network operator and router vendor
> needed to develop similar tools.
>
>
>
> Thanks,
>
>
>
> Steve
>
>
>
> Date: Fri, Jul 3, 7:32 PM
> From: Ben Barker
> To: me, sob at sobco.com
>
> 1 – No. This thread is the most detail I know of.
>
>
>
> 2 – Not to my knowledge.  I believe that I was told that DECnet later
> incorporated remote diagnostics, but I don’t think they had something like
> the remote DDT in the switch that was the basis for most of what we did.
> But I am only speculating here.
>
>
>
> Reflective comments
>
>
>
> ·      These details bring a bit of life into the seemingly ordinary
> process of fielding IMPs.
>
> ·      The BBN IMP group was small, smart and agile.  Most were software or
> hardware engineers who had been at MIT, Harvard and Lincoln Laboratory.
>
> ·      Barker says the improvement from 98% uptime to 99.98%, i.e. a
> reduction of downtime from 2% to 0.02% (the hundredfold improvement Barker
> refers to in the bio section of the virtual roundtable; a short worked
> calculation appears after these comments), made a qualitative difference in
> the perception within the user community of the reliability of the network.
> This speaks directly to the dual nature of the project, i.e. a research
> project to some, like Kleinrock and Kahn, versus a durable platform for
> others to build applications upon and get work done.
>
> To press this point a bit further, there were several time-sharing projects
> pursued with IPTO support.  These ranged from Multics at the high end down
> to GENIE at Berkeley, Tenex at BBN, ITS at MIT and various others over the
> years.  IPTO didn’t put all of its eggs into any single basket.  Some of
> these, particularly GENIE on the SDS 940 and Tenex on the PDP-10, were
> adopted by others in the community and became workhorses.  In the network
> arena, however, IPTO did not sponsor multiple projects.  Hence, there was
> more emphasis for the Arpanet to be a usable system and not just one of
> several possible systems.
>
> ·      The learning curve Barker describes is not a surprise.  It’s exactly
> what’s to be expected once an initial system is put into operation.
> However, the fact these techniques were not documented and promulgated
> suggests either or both of:
>
> a.     Although the BBN group published multiple papers about their work,
> there may have been less publication than there would have been in a
> university.
>
> b.     The remote debugging and other aspects of improving the reliability
> might not have seemed special enough to be worth publishing.
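>
> For concreteness, the arithmetic behind the uptime figures, matching Ben’s
> “about a half hour a day” (a back-of-the-envelope Python check):
>
>     SECONDS_PER_DAY = 24 * 60 * 60
>
>     for uptime in (0.98, 0.9998):
>         downtime = (1 - uptime) * SECONDS_PER_DAY
>         print(f"{uptime:.2%} uptime -> {downtime / 60:.1f} minutes down per day")
>
>     # 98.00% uptime -> 28.8 minutes down per day  (Ben's "half hour a day")
>     # 99.98% uptime -> 0.3 minutes down per day   (about 17 seconds), i.e.
>     # a hundredfold reduction in downtime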
> --
> Internet-history mailing list
> Internet-history at elists.isoc.org
> https://elists.isoc.org/mailman/listinfo/internet-history
>
>
-- 
Geoff.Goodfellow at iconia.com
living as The Truth is True

