[ih] Archiving Internet history

Jack Haverty jack at 3kitty.org
Thu Feb 16 10:37:28 PST 2023


My best thought for proactive archiving is based on a business model, 
involving giving an "archive" some value.   My "print everything and 
make a book" suggestion was only partly whimsical. If a book exists, 
someone will sell it (priced low enough that anyone can afford it).   The 
bookseller(s) du jour (e.g., Amazon) will treat it as part of their 
SKUs, and presumably preserve it as long as there is interest in it 
(i.e., buyers).   Even if a company shifts priorities or goes out of 
business, its "assets" remain, including the books it has been 
selling, and will likely be sold to someone else.   This has been 
happening with various kinds of media, e.g., movies, TV shows, 
recordings, etc.  When no one cares about the book any more, it may 
disappear.  But if no one cares...

My second choice is to simply transmit all of the content into deep 
space.   It will travel forever at the speed of light, and can carry 
an enormous amount of content.  Future researchers will be able to 
access the archive once they've solved the technical problems of 
catching up to it, and of capturing and decoding the contents. Just as 
today's researchers can now study the early universe, not long after the 
Big Bang, by looking at light that is only now reaching us, using the JWST.

Jack Haverty


On 2/16/23 10:08, Joe Touch via Internet-history wrote:
> FYI, even cemeteries don’t preserve things “forever”. Plots are leased, not sold. Libraries disappear, universities dissolve, and churches are sold and rebuilt.
>
> I don’t think this group will find a new solution.
>
>> On Feb 16, 2023, at 10:03 AM, vinton cerf via Internet-history <internet-history at elists.isoc.org> wrote:
>>
>> John,
>> thanks for your thoughtful intervention. Your conclusion leads me to wonder
>> about business models that might produce the desired resilience.
>> Preservation by accident is not a plan and so often that's all that we
>> achieve.
>>
>> v
>>
>>
>>> On Thu, Feb 16, 2023 at 1:08 AM John Gilmore <gnu at toad.com> wrote:
>>>
>>> vinton cerf via Internet-history <internet-history at elists.isoc.org> wrote:
>>>> wow thanks for this lengthy history. So many familiar names. I sure hope
>>>> this mailing list does get archived properly as it contains a wealth of
>>>> information it would be hard to re-create in the future.
>>> Besides the internet-history mailing list's archives here:
>>>
>>>   https://elists.isoc.org/pipermail/internet-history/
>>>
>>> I have also been using an Archive-It account to make periodic copies of
>>> that web site in the Internet Archive here:
>>>
>>>
>>> https://wayback.archive-it.org/15071/20230114211520/https://elists.isoc.org/pipermail/internet-history/
>>>
>>> These are accessible via the Wayback Machine as well as via
>>> the page for this collection, here:
>>>
>>>   Internet and Unix History
>>>   https://archive-it.org/collections/15071
>>>
>>> As you can see there, it's set up to periodically scan various other URLs;
>>> please suggest others that are of historic interest, and I can add them.
>>>
>>> FYI, the Wayback Machine does not necessarily get deep copies of every
>>> web site.  Their focus is on breadth, so if a website has a thousand web
>>> pages, perhaps they will get 50 or 100 of them in each crawl.  Also,
>>> enough websites are designed to "trap" a web crawler, wasting its
>>> time, storage, and bandwidth, that the main crawler doesn't crawl
>>> very deep.  So, if there's a deep collection
>>> (for example, ALL the source code to reproduce a popular Linux
>>> distribution) that you think is worth saving for the future,
>>> Archive-It.org is one way to get it saved for posterity.
>>>
>>> Also FYI, the Internet Archive is an example of the philosophy of
>>> putting all your eggs in one basket and watching that basket intently.
>>> The (untested) theory is that the collection will be too valuable to let
>>> it fall apart later.  A distributed system (like LOCKSS for example)
>>> would provide higher likelihood of stuff surviving the next hundred,
>>> thousand or 10,000 years.  The Archive is keeping two or three
>>> replicated copies of each item they have, and copying them forward onto
>>> newer and fatter drives, but all of them are under the same
>>> administration and owned by the same nonprofit.  Brewster Kahle is the
>>> sparkplug and the main funding source; control of that nonprofit will be
>>> in the hands of a small number of probably-less-competent-and-virtuous
>>> people after Brewster is no more.  Hell, during the pandemic, ONE GUY
>>> was responsible for swapping out failed disk drives before the sole
>>> remaining copy of a failed drive's contents happened to also fail.
>>> Bit-rot sets in quickly, and five or ten years of merely incompetent
>>> system administration would make a shambles of this finely tuned
>>> machine.  Not
>>> to mention the possibility of malicious intrusion, particularly by
>>> people or governments who want to destroy the historical evidence of
>>> whatever bad stuff they've been up to.
>>>
>>> It would be better if there were ten Internet Archive nonprofits (or
>>> government agencies) scattered around the planet.  Each of them would
>>> ideally be taking copies of each others' full holdings, as well as doing
>>> their own crawls of the live web, and scanning in whatever physical
>>> cultural works they are particularly interested in.  Anybody know any
>>> Internet billionaires or spy-agency VP's who want to catalyze and endow
>>> a second Internet Archive?  The big advantage for spy agencies is
>>> stealth; you can look anywhere you want in your own archive, and nobody
>>> knows where you are looking.
>>>
>>>         John
>>>
>>>
>> -- 
>> Internet-history mailing list
>> Internet-history at elists.isoc.org
>> https://elists.isoc.org/mailman/listinfo/internet-history
