[ih] archiving the history of Internet-History

Fri Aug 29 01:30:24 PDT 2025

John Levine via Internet-history <internet-history at elists.isoc.org> wrote:
> FWIW, the list archives are here and the Internet Archive drops by to save a copy every month or so. They go back to 2001.
> https://elists.isoc.org/pipermail/internet-history/

The Internet Archive crawls and saves the list archives bimonthly, only
because I have an Archive-It.org account (a service run by the Archive)
and ask them to.  See:

  https://archive-it.org/collections/15071

The saved web pages go into the Wayback Machine's permanent collection
as well as into my own little collection hosted at the Archive.  The
full text from the saved pages can also be searched in a search box at
the above URL (e.g. "flag day" produces hits on 9 sites, including 406
messages from this list).

If you know of other web sites that should be in such an archive of
Internet and Unix history, please suggest them to me.  I ask for
one-time or annual crawls of historical (unchanging) sites, and annual
or more frequent crawls of ones that get regular updates.

	John

PS: The Internet Archive's regular web crawls focus more on breadth than
depth, so they would otherwise miss crawling down to thousands of
messages from this list.  Also, a lot of web sites contain unintentional
(or sometimes intentional) "crawler traps" that require manual
configuration to avoid the crawler going down a rathole of
www.foo.bar/symlink/symlink/symlink/symlink/foo.html forever.
Archive-It provides ways to manually avoid such traps once your crawl
has run down them once.
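
As a rough illustration of the kind of trap described above, the sketch
below flags URLs whose path repeats the same segment many times in a
row.  It is not how Archive-It or the Internet Archive's crawler
actually detects traps; the function name and the max_repeats threshold
are hypothetical, chosen only for this example.

```python
from urllib.parse import urlsplit


def looks_like_crawler_trap(url: str, max_repeats: int = 3) -> bool:
    """Return True if any path segment repeats back-to-back more than
    max_repeats times (hypothetical heuristic, not Archive-It's)."""
    # Tolerate URLs written without a scheme, e.g. "www.foo.bar/...".
    if "://" not in url:
        url = "http://" + url
    # Split the path into non-empty segments.
    segments = [s for s in urlsplit(url).path.split("/") if s]
    run = 1  # length of the current run of identical segments
    for prev, cur in zip(segments, segments[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_repeats:
            return True
    return False


trap = "www.foo.bar/symlink/symlink/symlink/symlink/foo.html"
print(looks_like_crawler_trap(trap))
print(looks_like_crawler_trap("https://elists.isoc.org/pipermail/internet-history/"))
```

A check like this only catches the simplest looping-symlink case; traps
built from calendars, session IDs, or endless query strings need
different rules, which is why manual configuration is still required.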

PPS:
> Related museums have been contacted and are not available to support us.

I bet I know somebody smart enough at the Computer History Museum to at
least subscribe a logfile to the existing list, which would accumulate
new messages on a CHM server.  And a simple wget command, run once,
would pull in the already-archived logfiles saved by Date to get all the
past postings to the list.  At current storage prices, this would cost
them a small rounding error.  Yes, it would break someday for a random
reason, but at least it would get all the discussions up to that point,
archived in a third place where neither an ISOC nor an IA failure could
touch them.
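
A minimal sketch of the one-time backfill, assuming the archive follows
the usual Mailman 2 pipermail layout of one downloadable file per month
named like "2001-January.txt.gz".  That naming and the ".txt.gz" suffix
are assumptions and should be checked against the archive's index page
first.  The function name is hypothetical, and the script only builds
the list of URLs; it downloads nothing.

```python
from datetime import date

# English month names, hard-coded so the result does not depend on locale.
MONTHS = ("January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December")


def monthly_archive_urls(base_url, first, last, suffix=".txt.gz"):
    """List one URL per month from `first` to `last`, inclusive.

    Assumes pipermail-style names such as 2001-January.txt.gz;
    this is an assumption, not something confirmed for this list.
    """
    if not base_url.endswith("/"):
        base_url += "/"
    urls = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        urls.append(f"{base_url}{year}-{MONTHS[month - 1]}{suffix}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return urls


if __name__ == "__main__":
    base = "https://elists.isoc.org/pipermail/internet-history/"
    # The start month is a guess; the archive is only said to go back to 2001.
    for url in monthly_archive_urls(base, date(2001, 1, 1), date.today()):
        print(url)
```

The printed list could be saved to a file and handed to "wget -i", or
to any other downloader, for the single bulk pull.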