[ih] Design choices in SMTP
Jack Haverty
jack at 3kitty.org
Thu Feb 9 14:18:48 PST 2023
RJE was another priority of Lick at MIT. So I implemented an RJE for
our PDP-10 to enable users to submit their "card decks" and get their
"printout" after their job was run. Checking to see if the job had
been run was part of the RJE protocol, but there was no way to have the
CCN system proactively tell you your output was ready. So I had to build
a "daemon" for the PDP-10 (ITS OS) to interact with the CCN over the
ARPANET, submitting the card deck, polling periodically to see if the
job was done (there was no way to ask about the job queue), and then
retrieving the output (whatever would have come out on the CCN printer).
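The submit / poll / retrieve cycle described above can be sketched
roughly as follows. This is only an illustration in modern Python, not
the original ITS code: FakeRemoteHost, its method names, the "JOB1" id,
and the polling interval are all hypothetical stand-ins, since the
actual RJE commands are not given here.

```python
import time


class FakeRemoteHost:
    """Hypothetical stand-in for the remote batch system (not a real RJE API)."""

    def __init__(self, polls_until_done):
        self.polls_left = polls_until_done
        self.deck = None

    def submit(self, deck):
        self.deck = deck
        return "JOB1"  # made-up job identifier

    def is_done(self, job_id):
        # The remote side never calls back; it only answers when asked.
        self.polls_left -= 1
        return self.polls_left <= 0

    def retrieve(self, job_id):
        return "OUTPUT FOR " + job_id + "\n" + self.deck


def run_job(host, deck, poll_interval=60.0, max_polls=100, sleep=time.sleep):
    """Submit a deck, poll until the job finishes, then fetch the listing."""
    job_id = host.submit(deck)
    for _ in range(max_polls):
        if host.is_done(job_id):
            return host.retrieve(job_id)
        sleep(poll_interval)  # wait before asking again
    raise TimeoutError("job %s not finished after %d polls" % (job_id, max_polls))
```

Passing the sleep function in as a parameter is just a convenience so
the loop can be exercised without real delays.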
AFAIK, there was very little if any use of this system at MIT. Most
people had had their fill of punch cards and waiting for listings. But
the testing I did indicated an annoyingly high failure rate. Not as bad
as 50% failures, but far short of 99% success.
So, debugging time.....
Since the polling nature of the interaction meant that the human user
might not be online, a daemon process had to run on the system to do the
interactions with CCN whenever CCN was ready. I had already written a
mail server daemon, so it was an obvious design choice to simply add RJE
functionality to the mail server. To "submit a card deck", you would
prepare the card deck in your favorite editor, then use it as the text
of a "message". But instead of "sending" the message, you would "submit
to CCN". The mail server would then fire up an RJE connection rather
than an SMTP one, and carry out all of the needed interactions with the
CCN system. When the job output was ready, the daemon would retrieve
the listing from CCN, and inform the originating user that the output
was available. That output looked like just another message, which
could be archived, forwarded, sent to the printer, sent to the
Datacomputer for posterity, or whatever else you could do with
"email". Essentially from a user's view you emailed your card deck to
CCN, and somewhat later got a reply containing the printer output.
The use of a daemon meant that all of the interactions over the ARPANET
were computer-computer rather than human-computer. This followed
Lick's vision of having the computer and network help people do stuff.
We assumed people would have their local computers connected to the
network, whereas traditionally people would have their terminal connected
to the network, logged in to some remote computer over a Telnet-style
connection.
I found the source of the unreliability.
Sadly, although our design was based on computer-computer interactions,
the RJE protocol mechanisms were based on human-computer interactions.
RJE assumed that there was a human at the other end of the connection.
The commands and responses were pretty well defined, so my code could
generally imitate a human pretty effectively and handle submitting a
card deck and retrieving the results.
But since the system assumed there was a human in the picture....
Looking at the logs, I noticed that the protocol interactions, all those
nicely formatted commands and responses, would occasionally be
interrupted by "noise" on the connection. You would occasionally see
something like:
MESSAGE FROM SYSOP!!! Remember CCN will be down this Saturday and
Sunday for installation of additional disk drives!
Such "noise" would really screw up the state machine dealing with the
RJE exchanges. It could happen at any time, even in the middle of an
important protocol word, and would simply appear in the ascii stream
that was carrying all of the RJE protocol commands.
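A small sketch of why such interleaved text is so damaging to a strict
parser. The three-digit reply format, the reply texts, and the names
below are assumptions made up for illustration; they are not the actual
RJE syntax, only a stand-in for "nicely formatted commands and
responses".

```python
import re

# Hypothetical reply format: three digits, a space, then text.
REPLY = re.compile(r"^(\d{3}) (.*)$")


class ProtocolError(Exception):
    pass


def parse_replies(stream):
    """Strictly parse a reply stream; any line that is not a reply is fatal."""
    replies = []
    for line in stream.splitlines():
        m = REPLY.match(line)
        if m is None:
            raise ProtocolError("unexpected text on connection: %r" % line)
        replies.append((int(m.group(1)), m.group(2)))
    return replies


clean = "200 DECK ACCEPTED\n250 JOB QUEUED\n"
noise = "MESSAGE FROM SYSOP!!! CCN will be down this Saturday\n"
# Operator text injected in the middle of a reply.
noisy = "200 DECK ACCEPTED\n250 JOB " + noise + "QUEUED\n"
```

With the clean stream, parse_replies returns two well-formed replies.
With the noisy one, the second reply is silently corrupted (its text now
contains the operator message) and the leftover fragment "QUEUED" raises
ProtocolError. Skipping unrecognized lines would not be a full fix
either, since the noise can land in the middle of a protocol word.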
This experience and other similar encounters with trying to use a
human-computer connection for computer-computer interactions (email
headers were a big one, as well as the idiosyncrasies of the FTP "MAIL"
command) motivated me to write RFC 722 --
https://www.rfc-editor.org/rfc/rfc722 to try and capture some design
principles for computer-computer interactions that were crucial to
Lick's vision.
Fun times, if sometimes a bit frustrating.
Jack
On 2/9/23 11:22, Dave Crocker via Internet-history wrote:
>
>> NETRJE didn’t get a lot of use because the systems that could support
>> a Server FTP didn’t need the NETRJE. (Can someone correct me about
>> that?)
>>
>> What did get a lot of use was CCNRJE written by Bob Braden for the
>> UCLA CCN 360/91.
>
>
> I used the RJE program, available at ISI, to submit jobs to the UCLA
> CCN 360/91. But I would have sworn the author was someone other than
> Braden.
>
>
> The nature of the use was not as straightforward as Steve's example.
>
> My job was user support and documentation at the UCLA Arpanet
> project. A year or two before getting hired, I'd become a fan of the
> NLS system at SRI, even including the text-only interface. (Prior to
> dropping out and getting hired for this job, I wrote a text formatter,
> with inline commands, that emulated the hierarchical text model in
> NLS. I developed and ran it at my place of work which was the other
> 360/91 on campus, the NIH-funded Health Sciences Computing. There
> were only 18 of those machines built. This was also my only major
> foray into using PL/1.)
>
> Anyhow, I taught the department secretaries -- remember when that was
> what they were called? -- to use the remote system. After editing, we
> needed to be able to print documents. The ones they'd been editing
> and the ones from the NIC, of course.
>
> So I had them FTP the document down to ISI and run the RJE program to
> send the document to the high-speed, upper and lower case printer at
> CCN. The fastest U/L printer we had in the department was 120 cps, so
> this was /much/ better.
>
> This also wound up generating a serious bit of learning about computer
> science and statistics. (Before and after dropping out, I studied
> Psychology and had taken 1 really basic stats course and no CS courses.)
>
> The secretaries quickly got facile with the process, but they started
> complaining that the sequence would often fail. I turned to my office
> mate, Jon Postel, and asked whether he had any suggestions. He had me
> explain the total sequence being used and asked how often things were
> failing and what the symptoms were.
>
> I noted that there were widely different systems, but that the overall
> failure rate seemed to be about 50%.
>
> He asked how reliable our department Sigma 7 was; I suggested a good,
> but not outstanding number and he agreed. Then he asked about the net
> itself, and we agreed it was highly reliable, maybe 90%. Then SRI,
> which wasn't great, and ISI, which had gotten quite good, then CCN,
> which was not great.
>
> Cumulative probability came out almost exactly at 50%.
>
> I later heard that the failure to perform a similar, aggregate failure
> rate exercise was the reason the Russians beat us to space...
>
> d/
>
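The aggregate estimate in the quoted message amounts to multiplying the
per-system reliabilities along the chain, assuming the systems fail
independently. In the sketch below only the network's "maybe 90%" comes
from the message; every other figure is an illustrative guess chosen to
match the qualitative descriptions ("good but not outstanding", "wasn't
great", "quite good").

```python
import math


def chain_success(reliabilities):
    """Probability that every link in a serial chain works, assuming independence."""
    return math.prod(reliabilities)


# Only the ARPANET figure is from the message; the rest are assumed values.
reliability = {
    "Sigma 7": 0.90,
    "ARPANET": 0.90,
    "SRI": 0.80,
    "ISI": 0.95,
    "CCN": 0.80,
}

success = chain_success(reliability.values())
failure = 1.0 - success
print("success %.2f, failure %.2f" % (success, failure))
```

Five individually decent components are enough to pull the end-to-end
success rate down to roughly one half.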