[ih] Archiving Internet history

odlyzko at umn.edu
Thu Feb 16 18:46:53 PST 2023


In principle that is true.  But there are practical problems,
often involving copyrights.  How do you make sure that the
publisher stays in business, or, if it goes out of business,
that the digitized book is taken over by another publisher?
That issue has been the bane of many people trying to assemble
the collected papers of eminent folks, since the copyright owners
could not be found.  It should not be so much of a problem
online, but it still exists.

This is, of course, aggravated by the fact that publishers
like to keep control of their books, and not have lots of
copies floating around.

And libraries are also not an infallible fallback position.
First of all, libraries did not collect everything.  And now,
with the lowered demand for hard copies, they are sending
their physical volumes to offsite storage sites, or just
recycling them.  There is a cooperative movement among
libraries to ensure that enough physical copies survive
to serve the world (this is primarily of concern for items
still in copyright).  If you look at WorldCat, it lists
libraries that have a copy of a book, and often notes how
many have committed to keep a copy.

Andrew


 	------------------------------

 	Message: 4
 	Date: Thu, 16 Feb 2023 18:27:05 -0800
 	From: Jack Haverty <jack at 3kitty.org>
 	To: internet-history at elists.isoc.org
 	Subject: Re: [ih] Archiving Internet history
 	Message-ID: <aac1b0fc-f350-d0cd-51b5-47fc52ee0818 at 3kitty.org>
 	Content-Type: text/plain; charset=UTF-8; format=flowed

 	Also, it was traditionally expensive to set up a printing run for a
 	book, so economics required a large demand to justify the cost. Saving a
 	book today in digital form takes a few megabytes of storage, at orders of
 	magnitude lower cost.  Today's "print-on-demand" means it's a lot more
 	reasonable to keep digitized books available to be printed should anyone
 	want a copy.


 	On 2/16/23 17:13, John Day via Internet-history wrote:
 	> "Out of print" often means they are more expensive. ;-)  Or at least they are in a library for free.
 	>
 	> It doesn't mean they don't exist, right?
 	>
 	>> On Feb 16, 2023, at 18:13, vinton cerf via Internet-history <internet-history at elists.isoc.org> wrote:
 	>>
 	>> re: books - you have surely heard "out of print" for economic/demand
 	>> reasons...
 	>> v
 	>>
 	>>
 	>> On Thu, Feb 16, 2023 at 1:37 PM Jack Haverty via Internet-history <
 	>> internet-history at elists.isoc.org> wrote:
 	>>
 	>>> My best thought for proactive archiving is based on a business model,
 	>>> involving giving an "archive" some value.   My "print everything and
 	>>> make a book" suggestion was only partly whimsical. If a book exists,
 	>>> someone will sell it (priced low enough so anyone can afford it).   The
 	>>> bookseller(s) du jour (e.g., Amazon) will treat it as part of their
 	>>> SKUs, and presumably preserve it as long as there is interest in it
 	>>> (i.e., buyers).   Even if a company shifts priorities or goes out of
 	>>> business, their "assets" remain, including the books they've been
 	>>> selling, and will likely be sold to someone else.   This has been
 	>>> happening with various kinds of media, e.g., movies, TV shows,
 	>>> recordings, etc.  When no one cares about the book any more it may
 	>>> disappear.  But if no one cares...
 	>>>
 	>>> My second choice is to simply transmit all of the content into deep
 	>>> space.   It will travel forever at the speed of light, and can preserve
 	>>> an enormous amount of content.  Future researchers will be able to
 	>>> access the archive once they've solved the technical problem of how to
 	>>> catch up to it, and capture and decode the contents, just as today's
 	>>> researchers are able to look at what happened just after the Big
 	>>> Bang by using the JWST to capture signals that are only now getting here.
 	>>>
 	>>> Jack Haverty
 	>>>
 	>>>
 	>>> On 2/16/23 10:08, Joe Touch via Internet-history wrote:
 	>>>> FYI, even cemeteries don't preserve things "forever". Plots are leased,
 	>>>> not sold. Libraries disappear, universities dissolve, and churches are
 	>>>> sold and rebuilt.
 	>>>> I don't think this group will find a new solution.
 	>>>>
 	>>>>> On Feb 16, 2023, at 10:03 AM, vinton cerf via Internet-history <
 	>>>>> internet-history at elists.isoc.org> wrote:
 	>>>>> John,
 	>>>>> thanks for your thoughtful intervention. Your conclusion leads me to
 	>>>>> wonder about business models that might produce the desired resilience.
 	>>>>> Preservation by accident is not a plan and so often that's all that we
 	>>>>> achieve.
 	>>>>>
 	>>>>> v
 	>>>>>
 	>>>>>
 	>>>>>> On Thu, Feb 16, 2023 at 1:08 AM John Gilmore <gnu at toad.com> wrote:
 	>>>>>>
 	>>>>>> vinton cerf via Internet-history <internet-history at elists.isoc.org>
 	>>>>>> wrote:
 	>>>>>>> wow thanks for this lengthy history. So many familiar names. I sure
 	>>>>>>> hope this mailing list does get archived properly as it contains a
 	>>>>>>> wealth of information it would be hard to re-create in the future.
 	>>>>>> Besides the internet-history mailing list's archives here:
 	>>>>>>
 	>>>>>>   https://elists.isoc.org/pipermail/internet-history/
 	>>>>>>
 	>>>>>> I have also been using an Archive-It account to make periodic copies of
 	>>>>>> that web site in the Internet Archive here:
 	>>>>>>
 	>>>>>>   https://wayback.archive-it.org/15071/20230114211520/https://elists.isoc.org/pipermail/internet-history/
 	>>>>>>
 	>>>>>> These are accessible via the Wayback Machine as well as via
 	>>>>>> the page for this collection, here:
 	>>>>>>
 	>>>>>>   Internet and Unix History
 	>>>>>>   https://archive-it.org/collections/15071
 	>>>>>>
 	>>>>>> As you can see there, it's set up to periodically scan various other
 	>>>>>> URLs; please suggest others that are of historic interest, and I can
 	>>>>>> add them.
 	>>>>>>
 	>>>>>> FYI, the Wayback Machine does not necessarily get deep copies of every
 	>>>>>> web site.  Their focus is on breadth, so if a website has a thousand
 	>>>>>> web pages, perhaps they will get 50 or 100 of them in each crawl.
 	>>>>>> Also, there are enough websites which are designed to "trap" a web
 	>>>>>> crawler and cause it to waste a lot of its time, storage and bandwidth
 	>>>>>> uselessly, so the main crawler doesn't keep going.  So, if there's a
 	>>>>>> deep collection
 	>>>>>> (for example, ALL the source code to reproduce a popular Linux
 	>>>>>> distribution) that you think is worth saving for the future,
 	>>>>>> Archive-It.org is one way to get it saved for posterity.
 	>>>>>>
 	>>>>>> Also FYI, the Internet Archive is an example of the philosophy of
 	>>>>>> putting all your eggs in one basket and watching that basket intently.
 	>>>>>> The (untested) theory is that the collection will be too valuable to
 	>>>>>> let it fall apart later.  A distributed system (like LOCKSS for example)
 	>>>>>> would provide higher likelihood of stuff surviving the next hundred,
 	>>>>>> thousand or 10,000 years.  The Archive is keeping two or three
 	>>>>>> replicated copies of each item they have, and copying them forward onto
 	>>>>>> newer and fatter drives, but all of them are under the same
 	>>>>>> administration and owned by the same nonprofit.  Brewster Kahle is the
 	>>>>>> sparkplug and the main funding source; control of that nonprofit will
 	>>>>>> be in the hands of a small number of probably-less-competent-and-virtuous
 	>>>>>> people after Brewster is no more.  Hell, during the pandemic, ONE GUY
 	>>>>>> was responsible for swapping out failed disk drives before the only
 	>>>>>> remaining copy of a failed drive's contents happened to fail too.  Bit-rot sets in
 	>>>>>> quickly, and five or ten years of merely incompetent system
 	>>>>>> administration would make a shambles of this finely tuned machine.  Not
 	>>>>>> to mention the possibility of malicious intrusion, particularly by
 	>>>>>> people or governments who want to destroy the historical evidence of
 	>>>>>> whatever bad stuff they've been up to.
 	>>>>>>
 	>>>>>> It would be better if there were ten Internet Archive nonprofits (or
 	>>>>>> government agencies) scattered around the planet.  Each of them would
 	>>>>>> ideally be taking copies of each other's full holdings, as well as
 	>>>>>> doing their own crawls of the live web, and scanning in whatever
 	>>>>>> physical cultural works they are particularly interested in.  Anybody
 	>>>>>> know any Internet billionaires or spy-agency VPs who want to catalyze and endow
 	>>>>>> a second Internet Archive?  The big advantage for spy agencies is
 	>>>>>> stealth; you can look anywhere you want in your own archive, and nobody
 	>>>>>> knows where you are looking.
 	>>>>>>
 	>>>>>>         John
 	>>>>>>
 	>>>>>>
 	>>>>> --
 	>>>>> Internet-history mailing list
 	>>>>> Internet-history at elists.isoc.org
 	>>>>> https://elists.isoc.org/mailman/listinfo/internet-history




