[ih] Archiving Internet history

Joe Touch touch at strayalpha.com
Thu Feb 16 10:08:45 PST 2023


FYI, even cemeteries don’t preserve things “forever”. Plots are leased, not sold. Libraries disappear, universities dissolve, and churches are sold and rebuilt.  

I don’t think this group will find a new solution. 

> On Feb 16, 2023, at 10:03 AM, vinton cerf via Internet-history <internet-history at elists.isoc.org> wrote:
> 
> John,
> thanks for your thoughtful intervention. Your conclusion leads me to wonder
> about business models that might produce the desired resilience.
> Preservation by accident is not a plan and so often that's all that we
> achieve.
> 
> v
> 
> 
>> On Thu, Feb 16, 2023 at 1:08 AM John Gilmore <gnu at toad.com> wrote:
>> 
>> vinton cerf via Internet-history <internet-history at elists.isoc.org> wrote:
>>> wow thanks for this lengthy history. So many familiar names. I sure hope
>>> this mailing list does get archived properly as it contains a wealth of
>>> information it would be hard to re-create in the future.
>> 
>> Besides the internet-history mailing list's archives here:
>> 
>>  https://elists.isoc.org/pipermail/internet-history/
>> 
>> I have also been using an Archive-It account to make periodic copies of
>> that web site in the Internet Archive here:
>> 
>>  https://wayback.archive-it.org/15071/20230114211520/https://elists.isoc.org/pipermail/internet-history/
>> 
>> These are accessible via the Wayback Machine as well as via
>> the page for this collection, here:
>> 
>>  Internet and Unix History
>>  https://archive-it.org/collections/15071
>> 
>> As you can see there, it's set up to periodically scan various other URLs;
>> please suggest others that are of historic interest, and I can add them.
>> 
>> FYI, the Wayback Machine does not necessarily get deep copies of every
>> web site.  Their focus is on breadth, so if a website has a thousand web
>> pages, perhaps they will get 50 or 100 of them in each crawl.  Also,
>> there are enough websites which are designed to "trap" a web crawler and
>> cause it to waste a lot of its time, storage and bandwidth uselessly, so
>> the main crawler doesn't keep going.  So, if there's a deep collection
>> (for example, ALL the source code to reproduce a popular Linux
>> distribution) that you think is worth saving for the future,
>> Archive-It.org is one way to get it saved for posterity.
>> 
>> Also FYI, the Internet Archive is an example of the philosophy of
>> putting all your eggs in one basket and watching that basket intently.
>> The (untested) theory is that the collection will be too valuable to let
>> it fall apart later.  A distributed system (LOCKSS, for example)
>> would provide higher likelihood of stuff surviving the next hundred,
>> thousand or 10,000 years.  The Archive is keeping two or three
>> replicated copies of each item they have, and copying them forward onto
>> newer and fatter drives, but all of them are under the same
>> administration and owned by the same nonprofit.  Brewster Kahle is the
>> sparkplug and the main funding source; control of that nonprofit will be
>> in the hands of a small number of probably-less-competent-and-virtuous
>> people after Brewster is no more.  Hell, during the pandemic, ONE GUY
>> was responsible for swapping out failed disk drives before the only
>> second copy of a failed drive happened to also fail.  Bit-rot sets in
>> quickly, and five or ten years of merely incompetent system
>> administration would make a shambles of this finely tuned machine.  Not
>> to mention the possibility of malicious intrusion, particularly by
>> people or governments who want to destroy the historical evidence of
>> whatever bad stuff they've been up to.
>> 
>> It would be better if there were ten Internet Archive nonprofits (or
>> government agencies) scattered around the planet.  Each of them would
>> ideally be taking copies of each other's full holdings, as well as doing
>> their own crawls of the live web, and scanning in whatever physical
>> cultural works they are particularly interested in.  Anybody know any
>> Internet billionaires or spy-agency VPs who want to catalyze and endow
>> a second Internet Archive?  The big advantage for spy agencies is
>> stealth; you can look anywhere you want in your own archive, and nobody
>> knows where you are looking.
>> 
>>        John
>> 
>> 
> -- 
> Internet-history mailing list
> Internet-history at elists.isoc.org
> https://elists.isoc.org/mailman/listinfo/internet-history



