[ih] Archiving Internet history

Thu Feb 16 10:03:14 PST 2023

John,
thanks for your thoughtful intervention. Your conclusion leads me to wonder
about business models that might produce the desired resilience.
Preservation by accident is not a plan, and so often that's all that we
achieve.

v

On Thu, Feb 16, 2023 at 1:08 AM John Gilmore <gnu at toad.com> wrote:

> vinton cerf via Internet-history <internet-history at elists.isoc.org> wrote:
> > wow thanks for this lengthy history. So many familiar names. I sure hope
> > this mailing list does get archived properly as it contains a wealth of
> > information it would be hard to re-create in the future.
>
> Besides the internet-history mailing list's archives here:
>
>   https://elists.isoc.org/pipermail/internet-history/
>
> I have also been using an Archive-It account to make periodic copies of
> that web site in the Internet Archive here:
>
>   https://wayback.archive-it.org/15071/20230114211520/https://elists.isoc.org/pipermail/internet-history/
>
> These are accessible via the Wayback Machine as well as via
> the page for this collection, here:
>
>   Internet and Unix History
>   https://archive-it.org/collections/15071
>
> As you can see there, it's set up to periodically scan various other URLs;
> please suggest others that are of historic interest, and I can add them.
>
> FYI, the Wayback Machine does not necessarily get deep copies of every
> web site.  Their focus is on breadth, so if a website has a thousand web
> pages, perhaps they will get 50 or 100 of them in each crawl.  Also,
> there are enough websites which are designed to "trap" a web crawler and
> cause it to waste a lot of its time, storage and bandwidth uselessly, so
> the main crawler doesn't keep going.  So, if there's a deep collection
> (for example, ALL the source code to reproduce a popular Linux
> distribution) that you think is worth saving for the future,
> Archive-It.org is one way to get it saved for posterity.
>
> Also FYI, the Internet Archive is an example of the philosophy of
> putting all your eggs in one basket and watching that basket intently.
> The (untested) theory is that the collection will be too valuable to let
> it fall apart later.  A distributed system (like LOCKSS for example)
> would provide higher likelihood of stuff surviving the next hundred,
> thousand or 10,000 years.  The Archive is keeping two or three
> replicated copies of each item they have, and copying them forward onto
> newer and fatter drives, but all of them are under the same
> administration and owned by the same nonprofit.  Brewster Kahle is the
> sparkplug and the main funding source; control of that nonprofit will be
> in the hands of a small number of probably-less-competent-and-virtuous
> people after Brewster is no more.  Hell, during the pandemic, ONE GUY
> was responsible for swapping out failed disk drives before the only
> other copy of a failed drive happened to also fail.  Bit-rot sets in
> quickly, and five or ten years of merely incompetent system
> administration would make a shambles of this finely tuned machine.  Not
> to mention the possibility of malicious intrusion, particularly by
> people or governments who want to destroy the historical evidence of
> whatever bad stuff they've been up to.
>
> It would be better if there were ten Internet Archive nonprofits (or
> government agencies) scattered around the planet.  Each of them would
> ideally be taking copies of each other's full holdings, as well as doing
> their own crawls of the live web, and scanning in whatever physical
> cultural works they are particularly interested in.  Anybody know any
> Internet billionaires or spy-agency VPs who want to catalyze and endow
> a second Internet Archive?  The big advantage for spy agencies is
> stealth; you can look anywhere you want in your own archive, and nobody
> knows where you are looking.
>
>         John