[ih] Intel 4004 vs the IMP
Jack Haverty
jack at 3kitty.org
Mon Nov 15 21:43:20 PST 2021
Yep, you're right that the IMP had I/O by DMA. But it wasn't just used
for performance needs (those IMP guys were very clever...)
It's well documented that the IMP had the ability to reload its program
from a neighboring IMP, and even to deploy new releases of an IMP by
successive reloads until the whole network was running the new release,
and to do that without disrupting traffic flows. Kind of like open
heart surgery...without losing the patient.
As part of that patent dispute, I had to figure out *how* it did those
things and describe it at an instruction-by-instruction level. That's
all in a report I submitted but I don't think it's publicly available
(talk to the judge, not me). The ability to do those things relied on
the behavior of the DMA, interrupt mechanisms, CPU, and some custom
add-on hardware developed by BBN and/or Honeywell.
The IMP custom hardware included timer functions, and in particular a
"watchdog timer" which the program could start. If the timer ever ran
down, the processor would get an unblockable interrupt and be forced to
execute in the interrupt handler. That capability was used primarily
to recover from software failures. If the program didn't reset the
watchdog soon enough (the timeout was measured in seconds, not
milliseconds), the interrupt would fire and force the PC to execute the
interrupt handling code. So the IMP program was designed to reset the
watchdog frequently as part of its "main loop". If everything went
well, the watchdog timer would never run out.
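In modern C, the shape of that main loop and its watchdog might look
roughly like the sketch below; every name and the timeout value are
invented stand-ins on my part, not symbols from the actual 516 code.

    /* Hypothetical sketch of the watchdog pattern described above. */

    #define WATCHDOG_TIMEOUT_SECONDS 2         /* "seconds, not milliseconds" */

    extern void reset_watchdog(int seconds);   /* restart the custom timer hardware */
    extern void service_interfaces(void);      /* normal packet-forwarding work */
    extern void run_consistency_checks(void);
    extern void reload_from_neighbor(void);    /* the reload path described below */

    void main_loop(void)
    {
        for (;;) {
            reset_watchdog(WATCHDOG_TIMEOUT_SECONDS);
            service_interfaces();
            run_consistency_checks();
        }
    }

    /* If the main loop ever stalls, the timer runs down and this
     * unblockable interrupt handler assumes the software is wedged. */
    void watchdog_isr(void)
    {
        reload_from_neighbor();
    }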
If the watchdog interrupt handler ran, it would assume that something
was seriously wrong, and initiate a reload of the entire IMP program
from some other IMP. There were other ways to cause a reload from a
neighbor. The network operator could command the IMP to reload.
Consistency checks performed periodically in the software, e.g.,
detecting that something that "should never happen" had in fact
happened, would also be handled by jumping to the reload code. There
were quite a few situations which would trigger a reload; I probably
only found some of them.
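All of those trigger paths funnel into the same reload routine;
something like this, again with made-up names:

    /* Hypothetical sketch: watchdog expiry, an operator command, and
     * failed "should never happen" checks all end up in one place. */

    #define RELOAD_COMMAND 1                   /* made-up operator command code */

    extern void reload_from_neighbor(void);

    #define SHOULD_NEVER_HAPPEN(cond) \
        do { if (cond) reload_from_neighbor(); } while (0)

    void handle_operator_command(int cmd)
    {
        if (cmd == RELOAD_COMMAND)
            reload_from_neighbor();
    }

    void forward_packet(int queue_length)
    {
        SHOULD_NEVER_HAPPEN(queue_length < 0); /* a "should never happen" check */
        /* ... normal forwarding ... */
    }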
Once the IMP was executing the reload code, it would block all
interrupts. There might still be I/O in progress, but when it completed
nothing would happen. Instead, the IMP would pick one of its neighbors
by random choice, prepare a "please reload me" packet, and issue an I/O
write on the appropriate interface to have the DMA send that packet out
the wire to the chosen neighbor. It would then execute an I/O read
instruction on that same interface, specifying that whatever came in on
that interface next should be placed in its memory, overwriting the
whole area in which the running IMP code was kept. Having done that,
it would drop to an instruction that just did a single-instruction loop
(i.e., a "JUMP <here>").
The IMP would hang forever executing that instruction -- but the DMA and
watchdog were still active. Meanwhile, the neighbor IMP would (hopefully)
receive the "reload me" request and issue an I/O write to send the entire
code space of its own memory out the wire to the requesting IMP, which had
an I/O read active to receive it and overwrite its whole code memory.
Effectively the machine was now locked in a tight loop, with all
interrupts disabled except for the one associated with that read
operation ... and the watchdog timer. And the I/O was still happening.
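Put together, the requesting side of the reload looks roughly like this
in C; the interface handling, packet contents, and function names are all
guesses of mine, not the real code:

    /* Hypothetical sketch of the "please reload me" sequence. */

    extern void disable_interrupts(void);      /* only the read-completion interrupt
                                                  and the watchdog remain live */
    extern int  pick_random_neighbor(void);    /* choose one of the modem interfaces */
    extern void dma_write(int iface, const void *buf, unsigned words);
    extern void dma_read(int iface, void *buf, unsigned words);

    extern unsigned short code_space[];        /* area holding the running program */
    extern unsigned       code_space_words;
    int reload_iface;                          /* remembered for the completion handler */

    void reload_from_neighbor(void)
    {
        disable_interrupts();

        reload_iface = pick_random_neighbor();

        static const unsigned short please_reload_me[1] = { 0 };  /* placeholder contents */
        dma_write(reload_iface, please_reload_me, 1);

        /* Point the read at the entire code area: whatever arrives next on
         * that interface overwrites the running program itself. */
        dma_read(reload_iface, code_space, code_space_words);

        for (;;)                               /* the "JUMP <here>" one-instruction loop */
            ;
    }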
When the I/O interrupt finally fired, the handler would examine the
pointers from the I/O to see if it finished cleanly, and if so JUMP to
the starting address of the IMP code, pretty much as if it had just been
powered on. If something looked amiss, e.g., the I/O transfer was too
short to have delivered the whole IMP code space, the handler would pick
another random neighbor IMP and go back to the reload code to get a
memory image from that one instead. If the watchdog interrupt happened
before the I/O completed, something was also wrong, and it would
similarly try a different IMP to see if it could get a good reload.
Today, it would probably also post a scathing Yelp review about that
other IMP that failed to deliver.
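The completion side, as I reconstructed it, might be sketched like this
(every name here is hypothetical too):

    /* Hypothetical sketch of the read-completion and watchdog paths. */

    extern unsigned dma_words_transferred(int iface);  /* examine the channel pointers */
    extern void     jump_to_program_start(void);       /* restart as if freshly powered on */
    extern void     reload_from_neighbor(void);        /* retry with a different random neighbor */
    extern unsigned code_space_words;
    extern int      reload_iface;

    void reload_read_complete_isr(void)
    {
        if (dma_words_transferred(reload_iface) == code_space_words)
            jump_to_program_start();     /* clean transfer: run the (new) image */
        else
            reload_from_neighbor();      /* too short: try another neighbor */
    }

    /* The watchdog is still armed; if the transfer never completes at all,
     * its interrupt also falls back to trying a different neighbor. */
    void reload_watchdog_isr(void)
    {
        reload_from_neighbor();
    }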
Of course, the code coming from a neighboring IMP might be a newer or
older software release. Things in memory tend to move around when
software gets revised, and the running IMP code was being overwritten by
the DMA -- as it was running. So at some point, the instruction that
the IMP was executing would be overwritten by whatever instruction lived
at that memory address in the IMP code release coming from the
neighbor. It seems that this was handled by programmer discipline,
i.e., making sure that the "JUMP <here>" instruction was present at the
same memory address in every core image that might be in use in the
ARPANET at the time. I surmised this when I found a small fragment of
code that, it appeared, could never be executed. You couldn't get there
from anywhere else in the code. But, in a *different release of the
code* that's likely where the "JUMP <here>" instruction was stored. So
when the DMA overwrote the instruction the IMP was currently executing,
the instruction didn't change at all, regardless of which releases were
in the sending and receiving IMPs.
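A rough modern analogue of that discipline would be to pin the spin loop
to a fixed, agreed-upon address in every build; the GCC-style attribute,
section name, and address below are invented for illustration:

    /* Hypothetical sketch of the fixed-address convention: the one-instruction
     * spin loop must occupy the same address in every release, so that when a
     * different release is DMA'd over the running code, that location gets
     * overwritten with the identical instruction. */

    __attribute__((section(".reload_spin"), noreturn))
    void reload_spin(void)
    {
        for (;;)        /* intended to assemble to a single self-branch: "JUMP <here>" */
            ;
    }

    /* A linker script would then place .reload_spin at the same fixed address
     * (say 0x0FF0, made up here) in every build, turning the hand-enforced
     * programmer discipline into a build-time guarantee. */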
When I figured this all out, I concluded that the IMP designers were
indeed very clever. But as far as I know, what I wrote above has never
been documented elsewhere. Not even as comments in the code.
So, when I suggested that it was important to look at more than CPU
speed, memory capacity, et al, this is the kind of functionality I was
thinking about. Answering Steve's question about whether a microprocessor
could do what an IMP did means somehow implementing similar I/O, timer,
interrupt, DMA, and any other "clever" behavior of the H516-based IMP,
using some combination of add-on hardware (I/O, timers, etc.) and an
instruction set and processor behavior that supported whatever "clever
programming" was needed to get functionality such as the reload and/or
software-upgrade capability -- and perhaps other such things I didn't
encounter, since I only looked at one aspect of the IMP code.
Could the 4004 do it? I have no idea; someone who knows a lot more
about ancient microprocessors would have to figure out how they might do
what the IMP did.
Enjoy,
/Jack Haverty
On 11/15/21 6:58 PM, Noel Chiappa via Internet-history wrote:
> > E.g. a register-register MOV took 1.7 microseconds, but a memory-memory
> > move .. seems to have taken 1.7+4.9=6.6 usec, so about 1/5th of a MIP
>
> Ooops, I mistakenly was in the -11/23 timing appendix, not the -11/03; the
> latter was considerably slower: 3.5 usec basic, 2.5(sic)+9.1=11.6 usec.
> (Indirect was even slower.) So about 1/10th of a MIP.
>
>
> > From: Jack Haverty
>
> > E.g., to handle I/O on a serial interface, the CPU might have to take
> > an interrupt on every byte
> > ...
> > I think that kind of issue is the one Alex referred to about selecting
> > the 316 because of its interrupt mechanism. ... how the hardware
> > handled I/O on all those interfaces was a crucial factor in selecting
> > the 516.
>
> Yes and no. The IMP's modem and host interfaces were both DMA (in the sense
> that the CPU only got a single interrupt - which diverted instruction
> processing in the CPU to other instructions - for every packet, not on every
> word; the details differed significantly from modern DMA, though - see
> below). But there was a timing issue.
>
> Per 'The interface message processor for the ARPA computer network',
> available here:
>
> https://www.walden-family.com/public/1970-imp-afips.pdf
>
> "To send a packet, the IMP program sets up memory pointers to the packet and
> then activates the interface ... The interface takes successive words from
> the memory using its assigned output data channel and transmits them
> bit-serially (to the Host or to the modem). When the memory buffer has thus
> been emptied, the interface notifies the program via an interrupt".
>
> The details are interesting: the IMP used "a set of 16 multiplexed channels
> (which implement a 4-cycle data break)". This is through a device called the
> DMC, the 'Direct Multiplex Control' (sometimes 'Data Multiplex Control').
> Notice the "4-cycle data break"; consulting the DMC manual (also online),
> this was very similar to the '3-cycle data break' used on many early DEC
> machines, up through the PDP-8. This kept the buffer address and count in
> main memory (to reduce the cost of devices); the downside is that it increased
> the memory bandwidth usage. (The 4 cycles were i) current buffer address
> read, ii) buffer extent read, iii) data read/write, iv) modified buffer
> address write-back.)
>
> (Hey, it could have been worse; the DM11 asynchronous line interface of the
> early PDP-11 kept the _shift registers_ in main memory, and used DMA to gain
> access to them during input/output. At least it was efficient DMA - the
> memory address was stored in the device! :-)
>
> The timing issue comes from the fact that, as far as I can tell from
> the IMP hardware manual:
>
> https://walden-family.com/impcode/imp-hardware.pdf
>
> there was _no_ buffering on the modem and host interfaces, just the shift
> register; not so bad on the host interface, which used a handshake, and could
> be paused, but potentially problematic on the synchronous modem interface;
> after a word arrived, it had to be written to memory before the first _bit_
> of the next word arrived. (The DM11 had the same issue.)
>
>
> > That's why I suggested that the I/O capabilities of a microprocessor
> > needed to be considered when trying to figure out how it compared to
> > the 516, more so than just classic metrics like raw memory and CPU speed.
>
> That part I agree with. (But don't forget the address space, either; the
> 4004 really had too small an address space to be usable as a router at _any
> point in time_.)
>
> Noel