[ih] Saving IETF history

Sun May 9 01:23:32 PDT 2021

Dave Crocker wrote:
> Saving the RFCs is obvious.  What appears to be less obvious and, IMO,
> is just as important in historical terms, is /all/ of the IETF-related
> work materials.  Drafts.  Mailing list archives.  Session notes.
> Everything.

John Day wrote:
> Agreed. It should go somewhere. Same for all of the other standards
> groups, forums, consortia, etc.

Re the IETF, look in:

  https://archive-it.org/collections/11034

A few years ago, I set up an Archive-It.org job to monitor the IETF's
web presence.  I was disturbed at the deliberate ephemerality of the
Internet-Draft ecosystem.  I had been looking back at a 10-year-old
effort to eliminate some ridiculous restrictions on the IPv4 address
space, and IETF had thrown away most of the relevant documents (though I
found copies elsewhere once I knew their names).

Archive-It is a service of the nonprofit Internet Archive (archive.org).
So, the Internet Archive's robots are now crawling (various parts of)
the IETF websites weekly, monthly, or quarterly, under my direction,
and saving the results forever, or for as long as the Internet Archive
and the Wayback Machine exist.  Since I set it up, it has pulled in
about 1.8 TB of documents, which are accessible and searchable either
from the above URL or from the main Wayback Machine at web.archive.org.

The IETF websites aren't organized for archiving.  I frankly don't
understand their structure, so I am probably missing some important
things and overcollecting others.  But at least I tried.
Suggestions are welcome.

Just be glad the IETF is copying-friendly.  Imagine trying to archive
the IEEE or OSI standards development process.  Then imagine big
copyright lawsuits from self-serving people who tied their income
stream to restricting who can access the standards and the
standardization process.

	John

PS: Anyone or any institution can get an Archive-It account for roughly
$10K/year.  The service automates the collecting of *anything* you want
from the web for posterity.  (If you want them to, the Internet Archive
will also write copies of it on new hard drives and send them to you for
your own archival collection.)  About 800 institutions are customers today.
You can also get a low-support low-volume Archive-It Basic account for
$500/year.  Or get custom Digital Preservation services to improve the
likelihood that your own curated digital assets will survive into the
distant future.  See https://Archive-It.org .

PPS: The Internet Archive's long term survival is, of course, not
guaranteed.  In particular, it will go through a tough transition when
its founder eventually dies.  What is guaranteed is that they have built
a corpus of useful information: Millions of books, billions of web
pages, hundreds of thousands of concerts, decades of saved television
channels, etc.  They are absorbing a lot of archival microfilm, too,
including genealogical and census records, magazines, etc.  This corpus
will likely motivate people to preserve and replicate it so that it stays
useful in the distant future.  They have tried to design the technical
storage to encourage that result.  Does anyone here know anybody who has
both the money and the motivation to make a complete and ongoing copy in
a separately administered, separately owned organization?  That would
significantly mitigate the long term risk of having all the replicated
copies of the corpus owned by a single US nonprofit.  It would probably
take a bare minimum staff of 10 people to run and manage such an
operation, with dozens of petabytes of rotating storage in multiple data
centers and a large collection of (mostly free) software keeping it all
organized and accessible.