[ih] Archiving Internet history

Jack Haverty jack at 3kitty.org
Thu Feb 16 18:27:05 PST 2023


Also, it was traditionally expensive to set up a printing run for a
book, so economics required a large demand to justify the cost. Saving a
book today in digital form takes a few megabytes of storage, at a cost
orders of magnitude lower. Today's "print-on-demand" makes it far more
reasonable to keep digitized books available to be printed should anyone
want a copy.


On 2/16/23 17:13, John Day via Internet-history wrote:
> “out of print” often means they are more expensive. ;-)  or at least they are in a library for free.
>
> It doesn’t mean they don’t exist, right?
>
>> On Feb 16, 2023, at 18:13, vinton cerf via Internet-history <internet-history at elists.isoc.org> wrote:
>>
>> re: books - you have surely heard "out of print" for economic/demand
>> reasons...
>> v
>>
>>
>> On Thu, Feb 16, 2023 at 1:37 PM Jack Haverty via Internet-history <
>> internet-history at elists.isoc.org> wrote:
>>
>>> My best thought for proactive archiving is based on a business model,
>>> involving giving an "archive" some value.   My "print everything and
>>> make a book" suggestion was only partly whimsical. If a book exists,
>>> someone will sell it (priced low enough so anyone can afford it).   The
>>> bookseller(s) du jour (e.g., Amazon) will treat it as part of their
>>> SKUs, and presumably preserve it as long as there is interest in it
>>> (i.e., buyers).   Even if a company shifts priorities or goes out of
>>> business, their "assets" remain, including the books they've been
>>> selling, and will likely be sold to someone else.   This has been
>>> happening with various kinds of media, e.g., movies, TV shows,
>>> recordings, etc.  When no one cares about the book any more it may
>>> disappear.  But if no one cares...
>>>
>>> My second choice is to simply transmit all of the content into deep
>>> space.   It will travel forever at the speed of light, and can preserve
>>> an enormous amount of content.  Future researchers will be able to
>>> access the archive once they've solved the technical problem of how to
>>> catch up to it, and capture and decode the contents. Just like today's
>>> researchers are now able to look at what happened just after the Big
>>> Bang by looking at the signals that are just getting here using the JWST.
>>>
>>> Jack Haverty
>>>
>>>
>>> On 2/16/23 10:08, Joe Touch via Internet-history wrote:
>>>> FYI, even cemeteries don’t preserve things “forever”. Plots are leased,
>>>> not sold. Libraries disappear, universities dissolve, and churches are
>>>> sold and rebuilt.
>>>> I don’t think this group will find a new solution.
>>>>
>>>>> On Feb 16, 2023, at 10:03 AM, vinton cerf via Internet-history <internet-history at elists.isoc.org> wrote:
>>>>> John,
>>>>> thanks for your thoughtful intervention. Your conclusion leads me to
>>>>> wonder about business models that might produce the desired resilience.
>>>>> Preservation by accident is not a plan and so often that's all that we
>>>>> achieve.
>>>>>
>>>>> v
>>>>>
>>>>>
>>>>>> On Thu, Feb 16, 2023 at 1:08 AM John Gilmore <gnu at toad.com> wrote:
>>>>>>
>>>>>> vinton cerf via Internet-history <internet-history at elists.isoc.org> wrote:
>>>>>>> wow thanks for this lengthy history. So many familiar names. I sure hope
>>>>>>> this mailing list does get archived properly as it contains a wealth of
>>>>>>> information it would be hard to re-create in the future.
>>>>>> Besides the internet-history mailing list's archives here:
>>>>>>
>>>>>>   https://elists.isoc.org/pipermail/internet-history/
>>>>>>
>>>>>> I have also been using an Archive-It account to make periodic copies of
>>>>>> that web site in the Internet Archive here:
>>>>>>
>>>>>>   https://wayback.archive-it.org/15071/20230114211520/https://elists.isoc.org/pipermail/internet-history/
>>>>>>
>>>>>> These are accessible via the Wayback Machine as well as via
>>>>>> the page for this collection, here:
>>>>>>
>>>>>>   Internet and Unix History
>>>>>>   https://archive-it.org/collections/15071
>>>>>>
>>>>>> As you can see there, it's set up to periodically scan various other URLs;
>>>>>> please suggest others that are of historic interest, and I can add them.
>>>>>>
>>>>>> FYI, the Wayback Machine does not necessarily get deep copies of every
>>>>>> web site.  Their focus is on breadth, so if a website has a thousand web
>>>>>> pages, perhaps they will get 50 or 100 of them in each crawl.  Also,
>>>>>> there are enough websites which are designed to "trap" a web crawler and
>>>>>> cause it to waste a lot of its time, storage and bandwidth uselessly, so
>>>>>> the main crawler doesn't keep going.  So, if there's a deep collection
>>>>>> (for example, ALL the source code to reproduce a popular Linux
>>>>>> distribution) that you think is worth saving for the future,
>>>>>> Archive-It.org is one way to get it saved for posterity.
>>>>>>
>>>>>> Also FYI, the Internet Archive is an example of the philosophy of
>>>>>> putting all your eggs in one basket and watching that basket intently.
>>>>>> The (untested) theory is that the collection will be too valuable to let
>>>>>> it fall apart later.  A distributed system (like LOCKSS for example)
>>>>>> would provide higher likelihood of stuff surviving the next hundred,
>>>>>> thousand or 10,000 years.  The Archive is keeping two or three
>>>>>> replicated copies of each item they have, and copying them forward onto
>>>>>> newer and fatter drives, but all of them are under the same
>>>>>> administration and owned by the same nonprofit.  Brewster Kahle is the
>>>>>> sparkplug and the main funding source; control of that nonprofit will be
>>>>>> in the hands of a small number of probably-less-competent-and-virtuous
>>>>>> people after Brewster is no more.  Hell, during the pandemic, ONE GUY
>>>>>> was responsible for swapping out failed disk drives before the only
>>>>>> second copy of a failed drive happened to also fail.  Bit-rot sets in
>>>>>> quickly, and five or ten years of merely incompetent system
>>>>>> administration would make a shambles of this finely tuned machine.  Not
>>>>>> to mention the possibility of malicious intrusion, particularly by
>>>>>> people or governments who want to destroy the historical evidence of
>>>>>> whatever bad stuff they've been up to.
>>>>>>
>>>>>> It would be better if there were ten Internet Archive nonprofits (or
>>>>>> government agencies) scattered around the planet.  Each of them would
>>>>>> ideally be taking copies of each others' full holdings, as well as doing
>>>>>> their own crawls of the live web, and scanning in whatever physical
>>>>>> cultural works they are particularly interested in.  Anybody know any
>>>>>> Internet billionaires or spy-agency VP's who want to catalyze and endow
>>>>>> a second Internet Archive?  The big advantage for spy agencies is
>>>>>> stealth; you can look anywhere you want in your own archive, and nobody
>>>>>> knows where you are looking.
>>>>>>
>>>>>>         John
>>>>>>
>>>>>>
>>>>> --
>>>>> Internet-history mailing list
>>>>> Internet-history at elists.isoc.org
>>>>> https://elists.isoc.org/mailman/listinfo/internet-history



