[ih] Archiving Internet history

Wed Feb 15 22:08:35 PST 2023

vinton cerf via Internet-history <internet-history at elists.isoc.org> wrote:
> wow thanks for this lengthy history. So many familiar names. I sure hope
> this mailing list does get archived properly as it contains a wealth of
> information it would be hard to re-create in the future.

Besides the internet-history mailing list's archives here:

  https://elists.isoc.org/pipermail/internet-history/

I have also been using an Archive-It account to make periodic copies of
that web site in the Internet Archive here:

  https://wayback.archive-it.org/15071/20230114211520/https://elists.isoc.org/pipermail/internet-history/

These are accessible via the Wayback Machine as well as via
the page for this collection, here:

  Internet and Unix History
  https://archive-it.org/collections/15071

As you can see there, it's set up to periodically scan various other URLs;
please suggest others that are of historic interest, and I can add them.
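
For anyone who wants to script against these snapshots: the Archive-It
URL above appears to follow the pattern
<base>/<collection id>/<14-digit timestamp>/<original URL>.  Here is a
minimal sketch that rebuilds it.  The function name is made up for
illustration, and the timestamp layout (YYYYMMDDhhmmss, which I believe
is UTC) is my reading of the URL, not something taken from Archive-It's
documentation.

```python
from datetime import datetime

# Base host taken from the snapshot URL quoted above.
ARCHIVE_IT_BASE = "https://wayback.archive-it.org"


def snapshot_url(collection_id: int, captured_at: datetime, original_url: str) -> str:
    """Build an Archive-It snapshot URL (hypothetical helper).

    Assumes the path layout <collection>/<YYYYMMDDhhmmss>/<original URL>,
    inferred from the example URL in this message.
    """
    stamp = captured_at.strftime("%Y%m%d%H%M%S")
    return f"{ARCHIVE_IT_BASE}/{collection_id}/{stamp}/{original_url}"


# Rebuild the snapshot link quoted above (collection 15071).
url = snapshot_url(
    15071,
    datetime(2023, 1, 14, 21, 15, 20),
    "https://elists.isoc.org/pipermail/internet-history/",
)
print(url)
```

This only builds the URL string; it does not check that a capture
exists at that exact timestamp.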

FYI, the Wayback Machine does not necessarily get deep copies of every
web site.  Their focus is on breadth, so if a website has a thousand web
pages, perhaps they will get 50 or 100 of them in each crawl.  Also,
enough websites are designed to "trap" a web crawler and make it waste
a lot of its time, storage and bandwidth uselessly that the main
crawler is built not to keep going.  So, if there's a deep collection
(for example, ALL the source code to reproduce a popular Linux
distribution) that you think is worth saving for the future,
Archive-It.org is one way to get it saved for posterity.

Also FYI, the Internet Archive is an example of the philosophy of
putting all your eggs in one basket and watching that basket intently.
The (untested) theory is that the collection will be too valuable to let
it fall apart later.  A distributed system (like LOCKSS for example)
would provide a higher likelihood of stuff surviving the next hundred,
thousand or 10,000 years.  The Archive is keeping two or three
replicated copies of each item they have, and copying them forward onto
newer and fatter drives, but all of them are under the same
administration and owned by the same nonprofit.  Brewster Kahle is the
sparkplug and the main funding source; control of that nonprofit will be
in the hands of a small number of probably-less-competent-and-virtuous
people after Brewster is no more.  Hell, during the pandemic, ONE GUY
was responsible for swapping out failed disk drives before the only
other copy of a failed drive's contents also failed.  Bit-rot sets in
quickly, and five or ten years of merely incompetent system
administration would make a shambles of this finely tuned machine.  Not
to mention the possibility of malicious intrusion, particularly by
people or governments who want to destroy the historical evidence of
whatever bad stuff they've been up to.

It would be better if there were ten Internet Archive nonprofits (or
government agencies) scattered around the planet.  Each of them would
ideally be taking copies of each other's full holdings, as well as doing
their own crawls of the live web, and scanning in whatever physical
cultural works they are particularly interested in.  Anybody know any
Internet billionaires or spy-agency VPs who want to catalyze and endow
a second Internet Archive?  The big advantage for spy agencies is
stealth; you can look anywhere you want in your own archive, and nobody
knows where you are looking.
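
To put rough numbers on the "ten archives" argument, here is a
back-of-the-envelope sketch.  It assumes each archive survives some long
period independently with the same probability, which is exactly what
separate administrations and separate owners are meant to approximate.
The 0.5 figure is purely hypothetical, not an estimate for any real
organization.

```python
def survival_probability(p_single: float, n_archives: int) -> float:
    """Chance that at least one of n independent archives survives.

    p_single is the (hypothetical) probability that any one archive
    survives the period; independence between archives is assumed.
    """
    return 1.0 - (1.0 - p_single) ** n_archives


# Compare one archive, two archives, and the ten suggested above.
for n in (1, 2, 10):
    print(n, survival_probability(0.5, n))
```

Copies held under one administration do not fail independently, so the
formula overstates what replication inside a single organization buys.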

	John