[ih] Intel 4004 vs the IMP
Jack Haverty
jack at 3kitty.org
Mon Nov 15 21:43:20 PST 2021
Yep, you're right that the IMP had I/O by DMA. But it wasn't just used
for performance needs (those IMP guys were very clever...)
It's well documented that the IMP had the ability to reload its program
from a neighboring IMP, and even to deploy new releases of an IMP by
successive reloads until the whole network was running the new release,
and to do that without disrupting traffic flows. Kind of like open
heart surgery...without losing the patient.
As part of that patent dispute, I had to figure out *how* it did those
things and describe it at an instruction-by-instruction level. That's
all in a report I submitted but I don't think it's publicly available
(talk to the judge, not me). The ability to do those things relied on
the behavior of the DMA, interrupt mechanisms, CPU, and some custom
add-on hardware developed by BBN and/or Honeywell.
The IMP custom hardware included timer functions, and in particular a
"watchdog timer" which the program could start. If the timer ever ran
down, the processor would get an unblockable interrupt and be forced to
execute in the interrupt handler. That capability was used primarily
to recover from software failures. If the program didn't reset the
watchdog soon enough (the timeout was measured in seconds, not
milliseconds), the interrupt would fire and force the PC to execute the
interrupt handling code. So the IMP program was designed to reset the
watchdog frequently as part of its "main loop". If everything went
well, the watchdog timer would never run out.
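In modern C, the shape of that main loop and its watchdog might look
roughly like the sketch below; every name and the timeout value are
invented stand-ins on my part, not symbols from the actual 516 code.

    /* Hypothetical sketch of the watchdog pattern described above. */

    #define WATCHDOG_TIMEOUT_SECONDS 2         /* "seconds, not milliseconds" */

    extern void reset_watchdog(int seconds);   /* restart the custom timer hardware */
    extern void service_interfaces(void);      /* normal packet-forwarding work */
    extern void run_consistency_checks(void);
    extern void reload_from_neighbor(void);    /* the reload path described below */

    void main_loop(void)
    {
        for (;;) {
            reset_watchdog(WATCHDOG_TIMEOUT_SECONDS);
            service_interfaces();
            run_consistency_checks();
        }
    }

    /* If the main loop ever stalls, the timer runs down and this
     * unblockable interrupt handler assumes the software is wedged. */
    void watchdog_isr(void)
    {
        reload_from_neighbor();
    }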
If the watchdog interrupt handler ran, it would assume that something
was seriously wrong, and initiate a reload of the entire IMP program
from some other IMP. There were other ways to cause a reload from a
neighbor. The network operator could command the IMP to reload.
Consistency checks performed periodically in the software, e.g.,
detecting that something that "should never happen" had in fact
happened, would also be handled by jumping to the reload code. There
were quite a few situations which would trigger a reload; I probably
only found some of them.
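All of those trigger paths funnel into the same reload routine;
something like this, again with made-up names:

    /* Hypothetical sketch: watchdog expiry, an operator command, and
     * failed "should never happen" checks all end up in one place. */

    #define RELOAD_COMMAND 1                   /* made-up operator command code */

    extern void reload_from_neighbor(void);

    #define SHOULD_NEVER_HAPPEN(cond) \
        do { if (cond) reload_from_neighbor(); } while (0)

    void handle_operator_command(int cmd)
    {
        if (cmd == RELOAD_COMMAND)
            reload_from_neighbor();
    }

    void forward_packet(int queue_length)
    {
        SHOULD_NEVER_HAPPEN(queue_length < 0); /* a "should never happen" check */
        /* ... normal forwarding ... */
    }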
Once the IMP was executing the reload code, it would block all
interrupts. There might still be I/O in progress, but when it completed
nothing would happen. Instead, the IMP would pick one of its neighbors
by random choice, prepare a "please reload me" packet, and issue an I/O
write on the appropriate interface to have the DMA send that packet out
the wire to the chosen neighbor. It would then execute an I/O read
instruction on that same interface, specifying that whatever came in on
that interface next should be placed in its memory, overwriting the
whole area in which the running IMP code was kept. Having done that,
it would drop to an instruction that just did a single-instruction loop
(i.e., a "JUMP <here>").
The IMP would hang forever executing that instruction -- but the DMA and
watchdog were still active. Meanwhile, the neighbor IMP would (hopefully)
receive the "reload me" request and issue an I/O write to send the entire
code space of its own memory out the wire to the requesting IMP, which had
an I/O read active to receive it and overwrite its whole code memory.
Effectively the machine was now locked in a tight loop, with all
interrupts disabled except for the one associated with that read
operation ... and the watchdog timer. And the I/O was still happening.
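Put together, the requesting side of the reload looks roughly like this
in C; the interface handling, packet contents, and function names are all
guesses of mine, not the real code:

    /* Hypothetical sketch of the "please reload me" sequence. */

    extern void disable_interrupts(void);      /* only the read-completion interrupt
                                                  and the watchdog remain live */
    extern int  pick_random_neighbor(void);    /* choose one of the modem interfaces */
    extern void dma_write(int iface, const void *buf, unsigned words);
    extern void dma_read(int iface, void *buf, unsigned words);

    extern unsigned short code_space[];        /* area holding the running program */
    extern unsigned       code_space_words;
    int reload_iface;                          /* remembered for the completion handler */

    void reload_from_neighbor(void)
    {
        disable_interrupts();

        reload_iface = pick_random_neighbor();

        static const unsigned short please_reload_me[1] = { 0 };  /* placeholder contents */
        dma_write(reload_iface, please_reload_me, 1);

        /* Point the read at the entire code area: whatever arrives next on
         * that interface overwrites the running program itself. */
        dma_read(reload_iface, code_space, code_space_words);

        for (;;)                               /* the "JUMP <here>" one-instruction loop */
            ;
    }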
When the I/O interrupt finally fired, the handler would examine the
pointers from the I/O to see if it finished cleanly, and if so JUMP to
the starting address of the IMP code, pretty much as if it had just been
powered on. If something looked amiss, e.g., the I/O transfer was too
short to have delivered the whole IMP code space, the handler would pick
another random neighbor IMP and go back to the reload code to get a
memory image from that one instead. If the watchdog interrupt happened
before the I/O completed, something was also wrong, and it would
similarly try a different IMP to see if it could get a good reload.
Today, it would probably also post a scathing Yelp review about that
other IMP that failed to deliver.
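The completion side, as I reconstructed it, might be sketched like this
(every name here is hypothetical too):

    /* Hypothetical sketch of the read-completion and watchdog paths. */

    extern unsigned dma_words_transferred(int iface);  /* examine the channel pointers */
    extern void     jump_to_program_start(void);       /* restart as if freshly powered on */
    extern void     reload_from_neighbor(void);        /* retry with a different random neighbor */
    extern unsigned code_space_words;
    extern int      reload_iface;

    void reload_read_complete_isr(void)
    {
        if (dma_words_transferred(reload_iface) == code_space_words)
            jump_to_program_start();     /* clean transfer: run the (new) image */
        else
            reload_from_neighbor();      /* too short: try another neighbor */
    }

    /* The watchdog is still armed; if the transfer never completes at all,
     * its interrupt also falls back to trying a different neighbor. */
    void reload_watchdog_isr(void)
    {
        reload_from_neighbor();
    }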
Of course, the code coming from a neighboring IMP might be a newer or
older software release. Things in memory tend to move around when
software gets revised, and the running IMP code was being overwritten by
the DMA -- as it was running. So at some point, the instruction that
the IMP was executing would be overwritten by whatever instruction lived
at that memory address in the IMP code release coming from the
neighbor. It seems that this was handled by programmer discipline,
i.e., making sure that the "JUMP <here>" instruction was present at the
same memory address in every core image that might be in use in the
ARPANET at the time. I surmised this when I found a small fragment of
code that, it appeared, could never be executed. You couldn't get there
from anywhere else in the code. But, in a *different release of the
code* that's likely where the "JUMP <here>" instruction was stored. So
when the DMA overwrote the instruction the IMP was currently executing,
the instruction didn't change at all, regardless of which releases were
in the sending and receiving IMPs.
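A rough modern analogue of that discipline would be to pin the spin loop
to a fixed, agreed-upon address in every build; the GCC-style attribute,
section name, and address below are invented for illustration:

    /* Hypothetical sketch of the fixed-address convention: the one-instruction
     * spin loop must occupy the same address in every release, so that when a
     * different release is DMA'd over the running code, that location gets
     * overwritten with the identical instruction. */

    __attribute__((section(".reload_spin"), noreturn))
    void reload_spin(void)
    {
        for (;;)        /* intended to assemble to a single self-branch: "JUMP <here>" */
            ;
    }

    /* A linker script would then place .reload_spin at the same fixed address
     * (say 0x0FF0, made up here) in every build, turning the hand-enforced
     * programmer discipline into a build-time guarantee. */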
When I figured this all out, I concluded that the IMP designers were
indeed very clever. But as far as I know, what I wrote above has never
been documented elsewhere. Not even as comments in the code.
So, when I suggested that it was important to look at more than CPU
speed, memory capacity, et al, this is the kind of functionality I was
thinking about. Answering Steve's question about whether a microprocessor
could do what an IMP did means somehow implementing similar I/O, timer,
interrupt, DMA, and any other "clever" behavior of the H516-based IMP,
using some combination of add-on hardware (I/O, timers, etc.) and an
instruction set and processor behavior that supported whatever "clever
programming" was needed to get functionality such as the reload and/or
software-upgrade capability -- and perhaps other such things I didn't
encounter, since I only looked at one aspect of the IMP code.
Could the 4004 do it? I have no idea; someone who knows a lot more
about ancient microprocessors would have to figure out how they might do
what the IMP did.
Enjoy,
/Jack Haverty
On 11/15/21 6:58 PM, Noel Chiappa via Internet-history wrote:
> > E.g. a register-register MOV took 1.7 microseconds, but a memory-memory
> > move .. seems to have taken 1.7+4.9=6.6 usec, so about 1/5th of a MIP
>
> Ooops, I mistakenly was in the -11/23 timing appendix, not the -11/03; the
> latter was considerably slower: 3.5 usec basic, 2.5(sic)+9.1=11.6 usec.
> (Indirect was even slower.) So about 1/10th of a MIP.
>
>
> > From: Jack Haverty
>
> > E.g., to handle I/O on a serial interface, the CPU might have to take
> > an interrupt on every byte
> > ...
> > I think that kind of issue is the one Alex referred to about selecting
> > the 316 because of its interrupt mechanism. ... how the hardware
> > handled I/O on all those interfaces was a crucial factor in selecting
> > the 516.
>
> Yes and no. The IMP's modem and host interfaces were both DMA (in the sense
> that the CPU only got a single interrupt - which diverted instruction
> processing in the CPU to other instructions - for every packet, not on every
> word; the details differed significantly from modern DMA, though - see
> below). But there was a timing issue.
>
> Per 'The interface message processor for the ARPA computer network',
> available here:
>
> https://www.walden-family.com/public/1970-imp-afips.pdf
>
> "To send a packet, the IMP program sets up memory pointers to the packet and
> then activates the interface ... The interface takes successive words from
> the memory using its assigned output data channel and transmits them
> bit-serially (to the Host or to the modem). When the memory buffer has thus
> been emptied, the interface notifies the program via an interrupt".
>
> The details are interesting: the IMP used "a set of 16 multiplexed channels
> (which implement a 4-cycle data break)". This is through a device called the
> DMC, the 'Direct Multiplex Control' (sometimes 'Data Multiplex Control').
> Notice the "4-cycle data break"; consulting the DMC manual (also online),
> this was very similar to the '3-cycle data break' used on many early DEC
> machines, up through the PDP-8. This kept the buffer address and count in
> main memory (to reduce the cost of devices); the downside is that it increased
> the memory bandwidth usage. (The 4 cycles were i) current buffer address
> read, ii) buffer extent read, iii) data read/write, iv) modified buffer
> address write-back.)
>
> (Hey, it could have been worse; the DM11 asynchronous line interface of the
> early PDP-11 kept the _shift registers_ in main memory, and used DMA to gain
> access to them during input/output. At least it was efficient DMA - the
> memory address was stored in the device! :-)
>
> The timing issue comes from the fact that, as far as I can tell from
> the IMP hardware manual:
>
> https://walden-family.com/impcode/imp-hardware.pdf
>
> there was _no_ buffering on the modem and host interfaces, just the shift
> register; not so bad on the host interface, which used a handshake, and could
> be paused, but potentially problematic on the synchronous modem interface;
> after a word arrived, it had to be written to memory before the first _bit_
> of the next word arrived. (The DM11 had the same issue.)
>
>
> > That's why I suggested that the I/O capabilities of a microprocessor
> > needed to be considered when trying to figure out how it compared to
> > the 516, more so than just classic metrics like raw memory and CPU speed.
>
> That part I agree with. (But don't forget the address space, either; the
> 4004 really had too small an address space to be usable as a router at _any
> point in time_.)
>
> Noel