[ih] Distributed file systems [was: As Flag Day approaches at CMU]
Jack Haverty
jack at 3kitty.org
Mon Sep 8 16:55:48 PDT 2025
Thanks for the history. I think a part of history often ignored is the
movement of people, and their skills, interests, and ideas, from place
to place. It can explain some of the "why" things happened, or not.
Totally agree that a lot of the MIT work was underplayed and
underreported. I think that much of that was due to the work being
viewed as some pragmatic "hack" that somehow helped advance the "real"
work of the research mission. IMHO, that kind of attitude has evolved
into the "open source" community today, where lots of people work
together as a team to build some kind of tool that they need for their
own use. There were a lot of "distributed file system" research
projects over the years, but I didn't find the one that I personally use
now every day until I stumbled on Syncthing (see syncthing.net).
With 50 years of musing, I've come up with another hypothesis about the
early days of networking and ARPA research: ARPA had not figured out
what was required to establish network-wide mechanisms, i.e., rules for
funding research activities and establishing mission goals for the
research groups scattered across the ARPANET.
I experienced this personally. At MIT in the early 70s, Licklider was
especially interested in using networked computers to assist human
activities. So at some point our (his) group became the "Office
Automation" group - one of many names we had over the 8 years I was
there. We were chartered, and I assume funded by ARPA, to figure out
how to use the ARPANET and our computers to assist humans in all types
of office work.
That led to a focus on "electronic mail", but in Lick's vision that
covered a wide range of activities. In the business environment, there
was interoffice and intraoffice correspondence of many types. Short
notes, formal invoices, workflows involving approval stages, bills of
lading, and many other types of such correspondence travelled both
within a company and between companies. External services also could be
involved, such as escrow agents, certified or registered transmission
(as in postal service), legal reviews of contracts, and many more.
Office interaction could even include interactions such as telephone
conversations, collaborative creation of formal documents, and what we
now know as email. Even paper.
The idea was that all of the correspondence that travelled within or
between companies could be handled by those organizations' computers
interacting over the ARPANET. (LANs didn't exist yet.) The military
environment had similar needs, for all of the correspondence that it
required.
So we dove into that problem space, and built some of the pieces. One of
the more esoteric pieces was the capability to direct correspondence to
the Datacomputer, which could serve not only as an archive, but also as
a type of escrow agent, able to independently verify the existence of a
document. It might also deliver it to the addressees and then
verify that it was delivered at a certain date and time.
All that kind of activity would require the computers at the ends of
ARPANET connections to interact. So we began to define protocols and
formats to carry out that interaction between computers. But, as the
saying goes - It Takes Two To Tango - and we only had our MIT-DM
PDP-10. So other ARPANET sites had to be working on the same research
goal.
Several RFCs were planned, and the first two were written and released.
RFC713 described a more computer-friendly syntax for carrying data
structures across the ARPANET. RFC722 described a kind of "philosophy"
of computer-computer interactions and rules to be followed. RFC723
would have contained a "request response protocol" using the RFC713 data
syntax, modelled after the DO/DONT/WILL/WONT scheme already in use on
the ARPANET.
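For anyone who hasn't run into it, DO/DONT/WILL/WONT is the style of
negotiation used for Telnet options. RFC723 was never published, so what
follows is only a minimal sketch of that general style of request and
response, not the RFC723 protocol. The class name Negotiator and the
option names "receipt", "archive", and "escrow" are made up for
illustration.

```python
from enum import Enum
from typing import Iterable, Optional, Set, Tuple


class Verb(Enum):
    DO = "DO"      # "please start doing X"
    DONT = "DONT"  # "please stop (or don't start) doing X"
    WILL = "WILL"  # "I am willing to do / am now doing X"
    WONT = "WONT"  # "I refuse to do / have stopped doing X"


class Negotiator:
    """One end of a DO/DONT/WILL/WONT style negotiation (illustrative only)."""

    def __init__(self, supported: Iterable[str], wanted: Iterable[str]) -> None:
        self.supported: Set[str] = set(supported)  # options we can perform
        self.wanted: Set[str] = set(wanted)        # options we want the peer to perform
        self.local_on: Set[str] = set()            # options we are performing now
        self.remote_on: Set[str] = set()           # options the peer is performing now

    def receive(self, verb: Verb, option: str) -> Optional[Tuple[Verb, str]]:
        """Return the reply to send, or None when nothing changes.

        Staying silent when the state does not change is what keeps two
        sites from acknowledging each other forever.
        """
        if verb is Verb.DO:
            if option in self.local_on:
                return None
            if option in self.supported:
                self.local_on.add(option)
                return (Verb.WILL, option)
            return (Verb.WONT, option)
        if verb is Verb.DONT:
            if option not in self.local_on:
                return None
            self.local_on.discard(option)
            return (Verb.WONT, option)
        if verb is Verb.WILL:
            if option in self.remote_on:
                return None
            if option in self.wanted:
                self.remote_on.add(option)
                return (Verb.DO, option)
            return (Verb.DONT, option)
        # Remaining case: Verb.WONT
        if option not in self.remote_on:
            return None
        self.remote_on.discard(option)
        return (Verb.DONT, option)


if __name__ == "__main__":
    site = Negotiator(supported={"receipt"}, wanted={"archive"})
    print(site.receive(Verb.DO, "receipt"))    # we can do it, so we agree
    print(site.receive(Verb.DO, "escrow"))     # not supported, so we refuse
    print(site.receive(Verb.WILL, "archive"))  # peer offers something we want
```

The point of the design is that a site that hasn't implemented some
option can still take part: it just answers WONT or DONT and carries on.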
There was a lot of debate and discussion about email in those days, but
it is not captured in RFCs since it was primarily done using our
then-new toy of electronic mail. IIRC, many people saw the need for some
more rigorous scheme for computer-computer interaction over the
ARPANET. Someday. But it wasn't part of their own research focus, and
they weren't funded to spend a lot of time implementing the admittedly
complicated schemes proposed.
So someone (ARPA? "Rough consensus"? ??) decided that a simple mechanism
should be done first, i.e., one that every site could implement without
distracting from their real research work. The more elaborate mechanism
was also needed, but it would be "on the back burner" while the simple
solution was deployed.
So, at MIT we stopped working on the protocols, formats, and
architecture for office automation, since there wouldn't be anyone else
implementing it. That's why RFC723 was never published. Work would
resume when the simple email mechanism was finished. I never expected
it would take more than 50 years.
In retrospect, the ARPANET itself was successful (IMHO) because every
site was somehow funded and motivated to get their computer "on the
net". IMPs arrived, hardware was designed and built, NCPs, Telnets, and
FTPs were coded, and even simple email, because they were all necessary
to be considered "on the net".
Other research areas, perhaps "distributed file systems", "operating
systems", and such, had lots of projects, but perhaps never broadly
enough throughout the ARPANET community to achieve that necessary
critical mass.
Not much happened for quite a while. The ARPANET grew, but
Telnet/FTP/Email were the primary uses.
TCP managed to achieve critical mass. I think that had a lot to do with
Vint/Bob's decision to fund implementations in all of the common types
of computers then in use on the 'net, to make implementations "open" and
free, and to work with other parts of government to get the Internet
technology into "real world" use.
I don't remember any other "critical mass" attainment until the Web in
the early 1990s. Curiously, the Web also involved Lick's group at MIT.
But that's another story, and ARPA didn't seem to be much involved.
Metcalfe's law states that the value of a network is proportional to
the square of the number of users on it. I think a similar rule might
apply to research projects on networks - their success is dependent on
the number of network sites with the same research goals and required
funding to work on and implement the resultant technology.
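Metcalfe's law is usually stated as value growing with the square of the
number of users, which comes from counting the possible pairwise
connections. A rough sketch of the same arithmetic applied to research
sites follows. Treating each pair of cooperating sites as one unit of
"value" is my assumption for illustration, and the function name
pairwise_links is made up.

```python
def pairwise_links(sites: int) -> int:
    """Number of distinct pairs among `sites` participants: n * (n - 1) / 2."""
    if sites < 0:
        raise ValueError("number of sites cannot be negative")
    return sites * (sites - 1) // 2


if __name__ == "__main__":
    # One site implementing a protocol has nobody to talk to;
    # each additional site adds more pairs than the one before it.
    for n in (1, 2, 4, 10, 50):
        print(n, pairwise_links(n))
```

With a single site the count is zero, which is the "It Takes Two To
Tango" problem in numeric form.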
Jack Haverty
On 9/6/25 17:05, Steve Crocker wrote:
> Jack,
>
> I spent a year and a half in the MIT-AI lab (Feb 1967 to May 1968), the
> next three years (May 1968 to May 1971) at UCLA working on the Arpanet
> protocols, the next three years (July 1971 to August 1974) at (D)ARPA
> primarily overseeing the AI portfolio but also paying attention
> occasionally to Arpanet issues. As part of this work I saw the
> multiple operating system and architecture projects at the various
> sites, including in particular the C.mmp and other efforts at CMU.
>
> Later, around 1990 or so, Jerry Popek at UCLA and I at
> Trusted Information Systems worked together on extending his shared
> Unix file structure to work over the Internet using secured file
> transfer. That project was called Truffles. I share
> these biographical details as a preface to my remarks embedded below.
>
> On Sat, Sep 6, 2025 at 7:24 PM Jack Haverty via Internet-history
> <internet-history at elists.isoc.org> wrote:
>
> I doubt anything was published. It just wasn't that big a deal, and
> wasn't part of the research mission.
>
>
> Heh, heh. This is an example of some really great work done at MIT-AI
> lab that was IMO underplayed. Documentation and publication of the
> MIT-AI work was almost non-existent. From the perspective of sitting
> at my desk at DARPA, it was pretty challenging figuring out what to
> write each year. Further, the small hack you're describing could have
> been the basis for a more common network service. However, it would
> have required a fair amount of work to deal with the differences
> across the operating systems and, of course, the access control
> problems. The ITS designers, at least in the early days, took pride
> in avoiding heavy duty access control. It was a delight to use ITS,
> and I fully admired the MIT-AI hackers. But Tenex had stronger
> controls and quite usable software, so it won the day. (DEC had its
> own operating system, TOPS-10, which was much weaker. They eventually
> licensed Tenex from BBN and brought it out as TOPS-20.)
>
>
> The "distributed file system" I described was "a hack" probably coded
> overnight when someone realized that it could be easily done by using
> the mechanisms that already existed inside ITS - like the JOB/BOJ
> pseudo-devices. Such work wasn't part of any official project, but was
> done primarily to mitigate the scarcity of CPU cycles, memory, or disk
> space. It might have been mentioned in the annual report of the AI Lab
> work, but I doubt there was ever a paper published.
>
>
> Yeah. See above. It would have been an excellent and substantial
> contribution to the Internet if the work had been pursued. It took a
> very long time before there were semi-sensible solutions to the
> file-sharing problem. The early commercial offerings were
> ridiculously expensive. Two of us founded Shinkuro to provide an
> essentially free way to share files across the net based on everyone
> having a local copy of each shared file, i.e. no central servers.
> Dropbox et al. eventually won the day.
>
>
> With several ITSes on the ARPANET, and the hassle of dealing with FTP to
> move files, someone likely noticed that a "remote disk" capability would
> be useful and was an easy thing to try. I recall someone once
> wondering why there was a lot of traffic between hosts on the MIT IMP,
> where other sites' traffic patterns were mostly long distance. MIT used
> the IMP as a poor man's LAN, and made it as easy as possible by coding
> up things like remote disks.
>
>
> MIT wasn't the only place where the IMP became the de facto local area
> net. And it wasn't long before there was a broad realization that the
> majority of the communication in the Internet was local.
>
> My favorite example was at UCSB where they had been trying to add
> interprocess communication to OS/MVT on their IBM 360/75 to make it
> possible for two partitions, i.e. separate jobs running concurrently,
> to communicate. I don't believe they ever got it working. However,
> when they connected their machine to the IMP and got their host-host
> protocol software working...
>
>
> If anyone else had "done it first" I never heard about it. There was no
> network yet, so information about other projects was not readily
> available. Professors might have read journals, but hackers mostly
> wrote code.
>
> Most of the OS changes to ITS were done by the MIT AI Lab. But their
> focus was AI, and changes to the OS were often done to help with some AI
> project. In DM, we changed software as needed, mostly focussed on
> research on use of the network such as email.
>
> AI changed their PDP-10 hardware when that was useful, e.g., by adding a
> new instruction to ROTate memory in a counterclockwise direction, which
> was helpful to the AI Chess program. I recall someone at some point
> made some hardware changes (might have been on the DM machine) that
> enabled a program to be run "in reverse" for a bit. That was helpful
> in debugging to figure out how the OS code actually got to some weird
> place, e.g., in some data structure, where it crashed because data as
> "instructions" made little sense.
>
> ITS was a lot like Unix, in the sense that it was not an official
> project to research issues of operating systems. That was more Multics
> territory. ITS was just a tool to be used and modified as needed to
> help with the actual research topics of AI, DM, and later ML (MathLab)
> and MC (Macsyma Consortium).
>
>
> The MIT-AI lab, along with the AI labs at CMU, Stanford, BBN, et al
> produced some great systems work.
>
>
> The DEC field service techs used to hate coming to ITS land, but also
> liked it because they always learned something.
>
> Jack
>
>
> On 9/6/25 14:27, Brian E Carpenter wrote:
> > I've never looked into the early history of distributed file systems.
> > Was that work at MIT ever published? Was it pioneering or did someone
> > else do it first?
> >
> > My favourite paper in that area is the "Unix United" paper [1] from 1982.
> >
> > [1] https://doi.org/10.1002/spe.4380121206 (paywalled) or
> > http://homepages.cs.ncl.ac.uk/brian.randell/Papers-Articles/399.pdf
> > Regards/Ngā mihi
> > Brian Carpenter
> >
> > On 07-Sep-25 08:04, Guy Almes via Internet-history wrote:
> >> Jack,
> >> Thanks very much.
> >> So this was in place by the mid-70s, right?
> >> -- Guy
> >>
> >> On 9/6/25 3:15 PM, Jack Haverty via Internet-history wrote:
> >>> ITS at MIT circa early 1970s used a naming convention for files --
> >>> <device>:<directory>;<name1> <name2>. So, for example, I logged in to
> >>> MIT-DM as JFH. My files on disk were things like DSK:JFH;THESIS TJ6.
> >>> File names were limited to alphanumerics of 6 characters or less
> >>> (motivated by what you could encode into a 36-bit PDP-10 memory
> >>> location).
> >>>
> >>> Once the ARPANET and NCPs appeared, the 'net was a new toy, so people of
> >>> course experimented with how to use it. I don't remember the details or
> >>> timing (sometime in early 1970s), but at some point the Message
> >>> Of The Day announced a new capability - you could use files on some
> >>> other ITS machine just by using a different <device> to specify the DSK
> >>> on some other ITS machine.
> >>>
> >>> So, for example, from the MIT-AI machine a user could get to my file on
> >>> the DM machine by specifying DM:JFH;THESIS TJ6.
> >>>
> >>> Similarly, from my account on MIT-DM, I could get to another machine's
> >>> files by using a name such as AI:TK;NEWS ITS to get at Tom Knight's file
> >>> on the AI machine.
> >>>
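To make the naming convention concrete, here is a minimal sketch that
splits a name of the form <device>:<directory>;<name1> <name2>, decides
which machine it refers to, and packs a six-character name into one
36-bit word. It is not how ITS did it. The strict syntax, the host list
(AI, DM, ML, MC), and all function names are my assumptions for
illustration. The packing uses the usual PDP-10 SIXBIT convention (ASCII
code minus 32, six bits per character).

```python
from typing import NamedTuple

# Assumption: the four ITS machines named in this thread double as device names.
ITS_HOSTS = {"AI", "DM", "ML", "MC"}


class ItsPath(NamedTuple):
    device: str
    directory: str
    name1: str
    name2: str


def parse_its_path(text: str) -> ItsPath:
    """Split 'DEV:DIR;NAME1 NAME2' into its four parts (strict form only)."""
    device, colon, rest = text.partition(":")
    directory, semi, names = rest.partition(";")
    parts = names.split()
    if not colon or not semi or len(parts) != 2:
        raise ValueError(f"expected DEV:DIR;NAME1 NAME2, got {text!r}")
    for name in parts:
        if not 1 <= len(name) <= 6:
            raise ValueError(f"file name {name!r} must be 1 to 6 characters")
    return ItsPath(device, directory, parts[0], parts[1])


def host_for(path: ItsPath, local_host: str) -> str:
    """DSK means the local disk; a host name as device means that host's disk."""
    if path.device == "DSK":
        return local_host
    if path.device in ITS_HOSTS:
        return path.device
    raise ValueError(f"unknown device {path.device!r}")


def sixbit_word(name: str) -> int:
    """Pack up to six characters into a 36-bit integer, left-justified."""
    if len(name) > 6:
        raise ValueError("at most six characters fit in one 36-bit word")
    word = 0
    for ch in name.upper().ljust(6):
        code = ord(ch) - 32  # SIXBIT covers ASCII 32 through 95
        if not 0 <= code < 64:
            raise ValueError(f"character {ch!r} has no SIXBIT code")
        word = (word << 6) | code
    return word


if __name__ == "__main__":
    path = parse_its_path("AI:TK;NEWS ITS")
    print(path, "lives on", host_for(path, local_host="DM"))
    print(oct(sixbit_word(path.name1)))
```

The six-character limit falls straight out of the arithmetic: six
characters at six bits each is exactly one 36-bit word.
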
> >>> This provided more flexibility than FTP. You could use a remote file in
> >>> any program that knew how to use files on devices. To the program, the
> >>> remote disk looked and behaved like a local disk. (More or less -
> >>> problems of "global LANs" were still to be surfaced.)
> >>>
> >>> I don't recall at all how this worked, or who implemented it. IIRC, it
> >>> took advantage of an interprocess communication capability called the
> >>> "JOB/BOJ device", which enabled one program to open a JOB device, and
> >>> another program to open the corresponding BOJ (JOB reversed) device, and
> >>> send whatever they liked back and forth. But I don't remember details.
> >>>
> >>> We also had the ability for one process (aka "job") to map some or all
> >>> of another process' address space into its own address space. I can't
> >>> recall if anyone got motivated to get that working across the ARPANET
> >>> though. If so, it would probably have been done using the same
> >>> internal mechanisms that got the remote file systems capability.
> >>>
> >>> However, for anyone curious, the ancient ITS system is online and has
> >>> even been resurrected so you can look at the code or even run it on your
> >>> modern computer - see https://github.com/PDP-10/its
> >>>
> >>> Jack Haverty (JFH at MIT-DM in the 70s)
> >>>
> >>> On 9/6/25 09:28, Guy Almes via Internet-history wrote:
> >>>> Noel,
> >>>> So this was a real networked file system (and not just lots of
> >>>> FTP)?
> >>>> Very interesting,
> >>>> -- Guy
> >>>>
> >>>> On 9/6/25 11:35 AM, Noel Chiappa via Internet-history wrote:
> >>>>>
> >>>>> > From: Guy Almes
> >>>>>
> >>>>> > There are probably a number of ARPAnet sites where the ARPAnet
> >>>>> > served this LAN role in the pre-Ethernet days.
> >>>>>
> >>>>> Notably MIT, where the 4 ITS machines shared their file systems over the
> >>>>> ARPANET.
> >>>>>
> >>>>> Noel
>
> --
> Internet-history mailing list
> Internet-history at elists.isoc.org
> https://elists.isoc.org/mailman/listinfo/internet-history
> -
> Unsubscribe:
> https://app.smartsheet.com/b/form/9b6ef0621638436ab0a9b23cb0668b0b?The%20list%20to%20be%20unsubscribed%20from=Internet-history