[ih] Link rot (was: Museum archiving (was: Re: IENs))

Jack Haverty jack at 3kitty.org
Sun May 9 11:25:58 PDT 2021


I don't remember ever seeing "DOI:" but I'll watch for it.  To me, the
issue with DOA is the same as with other systems like Datacomputer,
archive.org, etc. - what happens when the supporting organization
disappears?

IMHO, a viable approach to longevity might be to (somehow) make the
Storage of data as endemic to The Internet as the Movement of data is
today.  I don't worry about the future ability to Move data.  That
capability has become endemic to human society, arguably now as
important as air and water.   Whatever happens with router vendors,
network operators, fiber optic technology, et al ... something will
replace it to continue human ability to Move data.   That capability is
endemic to The Internet, which itself seems endemic now to Humanity.

So, what if the same were true of Storage of data?  Perhaps when you're
deciding what ISP to sign up with, instead of just looking at their
cost, reliability, bandwidth, and latency, you would also look at their
Storage capability.   ISPs are Service Providers.   Moving data is a
Service.  Storage of data could also be a Service.   A successful ISP
might offer not only Gigabit/sec connectivity to anywhere on the planet
(coming soon - Mars!).   It might also offer access to the collected
annals of humanity (basic service capped at 2 billion retrievals per
month, unlimited available at higher cost).

Today, that kind of Move service is implemented by a federation of
uncountable ISPs, all somehow interconnecting and cooperating (mostly)
to provide such a pervasive service.   Could that approach also work to
provide Storage as a permanent endemic part of The Internet?   Not
implemented by a single organization, corporation, or consortium -- but
a core component of the fabric of The Internet.   That was the
assumption behind those efforts to standardize Message-ID.
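
As a rough illustration of the Message-ID idea (not the actual MIT mail
system or Datacomputer interface, neither of which is specified in this
thread), here is a minimal Python sketch of an archive keyed by
Message-ID.  The class name Archive, its methods, and the example.org
domain are all hypothetical, and an in-memory dict stands in for
whatever persistent store would really be used.

```python
from email import message_from_string
from email.utils import make_msgid


class Archive:
    """Toy persistent store keyed by Message-ID (hypothetical design)."""

    def __init__(self):
        # Message-ID -> raw message text; the Message-ID is the primary key.
        self._store = {}

    def archive(self, raw_message):
        """Save a copy of the message and return its Message-ID."""
        msg = message_from_string(raw_message)
        msg_id = msg["Message-ID"]
        if msg_id is None:
            raise ValueError("message has no Message-ID header")
        msg_id = msg_id.strip()
        # First copy wins, so archiving the same message twice is harmless.
        self._store.setdefault(msg_id, raw_message)
        return msg_id

    def retrieve(self, msg_id):
        """Fetch an archived message by its Message-ID."""
        return self._store[msg_id]

    def parents(self, msg_id):
        """Archived messages that this message cites as a reply/reference."""
        msg = message_from_string(self._store[msg_id])
        cited = (msg.get("In-Reply-To", "") + " " + msg.get("References", "")).split()
        # dict.fromkeys drops duplicates while keeping order.
        return [c for c in dict.fromkeys(cited) if c in self._store]


# Two made-up messages: an original and a reply that references it.
first_id = make_msgid(domain="example.org")
first = f"Message-ID: {first_id}\nSubject: Link rot\n\nA URL is a fragile citation.\n"
reply_id = make_msgid(domain="example.org")
reply = (
    f"Message-ID: {reply_id}\nIn-Reply-To: {first_id}\n"
    "Subject: Re: Link rot\n\nAgreed.\n"
)

store = Archive()
store.archive(first)
store.archive(reply)
print(store.parents(reply_id) == [first_id])
```

The point of the sketch is only that the identifier, not the location,
is what later messages cite; which organization keeps the store alive is
exactly the part code can't solve.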

Meanwhile, for very long-term longevity, hundreds or thousands of years,
I think we're allowed to assume that technology will continue to
advance.  There are even rumblings in the scientific community today that
faster-than-light travel may actually be possible (as well as "Beam me
up, Scotty!").  Not today, but maybe someday.

So, one way to archive all of our digital stuff is simply --- send it
into Space.   Using a powerful radio transmitter, just transmit everything.

Space is big.   BIG.   A few thousand light-years can store an
incredible amount of data.  Sure, we don't know how to go get that old
data now, but in the future?

We (humanity) are also eager today to find evidence of alien
civilizations, studiously searching the electromagnetic spectrum for
data from distant planets.   Perhaps we don't have the technology yet to
extract their data from the cosmic noise.  But in the future?

So, if we transmit all the IENs and RFCs (and everything else), they'll
likely eventually be accessible to future researchers.  Electromagnetic
radiation seems to have incredible longevity.   Current scientists are
looking at the history back to the Big Bang, learning what happened then
by analyzing the current cosmic background radiation which is still
around after billions of years.   Sure, we don't know how to find the
primordial equivalent to IENs of some civilization of that era.   But in
the future?

The Archives of Humanity are already Out There.  Complete collections of
classic Human Artifact materials such as The Milton Berle Show and I
Love Lucy are currently accessible at about 70 light-years out.   Sure,
we don't know how to get them now.   But in the future?

/Jack
(it's been that kind of Sunday morning...)

PS - there are possibly downsides to sending all our data into space.  
But that horse has left the barn.  Google "The Dark Forest"  

On 5/9/21 10:10 AM, vinton cerf wrote:
> anywhere you see "DOI:..." it is in use. 
> yeah, DOA is an unfortunate acronym. Bob Kahn has been developing it
> since 1988.
> There is a non-profit organization that contracts with CNRI to handle
> operations. 
>
> v
>
> On Sun, May 9, 2021 at 1:07 PM Jack Haverty <jack at 3kitty.org> wrote:
>
>     Interesting, hadn't seen DOA before.  I've learned to be skeptical
>     of "Architectures".   Why did they pick such an acronym - when I
>     see "DOA" I think of something quite different.  Has this been
>     implemented?  Is it being used?  How much stuff is in it?  Or is
>     DOA DOA?
>
>     IMHO, there are many ways to create the technology, but the hard
>     part is creating the administrative mechanisms that will keep it
>     functioning continuously, and getting everybody to use them.
>
>     The "PURL" approach I was pushing with W3C (seemed like a likely
>     administrator) was really a resurrection of an old idea that was
>     part of Licklider's "galactic network" vision.   When I was in
>     Lick's group at MIT in the early 70s, we pushed very hard to
>     implement an archival architecture in the emerging standards for
>     "messaging" (what we now call email).
>
>     Part of that was doing battle in the 1970s Header Wars that
>     determined what email headers should contain.  If you look at an
>     email today, you'll usually find one of the artifacts of those
>     wars -- the "Message-ID:" field.  I fought very hard to get that
>     into the standard.
>
>     The idea was that a Message-ID was simply a unique string that
>     identified a specific message.   In Lick's vision, messages were
>     simply documents that got created and assigned such an ID as they
>     proceeded down their own path through the galactic network.  Other
>     messages might reference them, e.g., as replies, forwards, and
>     other actions take place - possibly over years and creating
>     complex tangles of messages.  So basically anything in digital
>     form could be a "message".
>
>     For longevity, the idea was that anyone encountering a message
>     could signal to their system that the message should be Archived -
>     i.e., saved permanently.   So an ephemeral note ("Let's go to
>     lunch.") would disappear, but anything considered important could
>     be Archived by anyone who thought it worthwhile.
>
>     That was the Architecture.   In the real world, we implemented our
>     "mail system" to use the Datacomputer as the Archive.  Anyone
>     writing or receiving a message could tell the system to Archive
>     it, and the mail daemon would place a copy of that message into
>     the Datacomputer, which was conveniently available on the
>     ARPANET.   The message's Message-ID uniquely identified it for
>     future retrieval.  In modern database terminology, the Message-ID
>     was the Primary Key.
>
>     AFAIK, if anyone can find artifacts from the Datacomputer, you
>     would find a collection of ancient email messages that the MIT
>     mail system put there back in the 70s.   Sadly, the notion of the
>     galactic network (now called The Internet) lost the concept of
>     having a Persistent Store as an integral part.   The vision was
>     that there would be Datacomputers scattered around the net, run
>     24x7 as reliable warehouses for information.
>
>     If you look at Vint's message:
>
>     Header with Message-IDs
>
>     you'll see the Message-ID of his message (you may have to enable
>     "full headers" or something like that to see them), and a
>     reference to the Message-ID of the message to which he was
>     replying and several others.
>
>     Curiously, those Message-IDs are hot-links, at least in the
>     software I'm using (Thunderbird on Ubuntu).   But when I click on
>     one to go to that referenced message, all it does is put up a
>     spinner graphic and refuse to do anything else so I have to kill
>     the program.
>
>     I guess the Datacomputer is down today.....
>
>     /Jack Haverty
>
>
>     On 5/9/21 7:24 AM, vinton cerf wrote:
>>     see Digital Object Architecture (for Digital Object Identifiers =
>>     DOI)
>>
>>     v
>>
>>
>>     On Sat, May 8, 2021 at 5:25 PM Jack Haverty via Internet-history
>>     <internet-history at elists.isoc.org> wrote:
>>
>>         Sometimes you can find lost URLs' contents by looking at
>>         archive.org if you can remember the relevant URL.
>>
>>         At one point in the '90s while I was on the W3C I lobbied for the
>>         introduction of a new form of URL - a "PURL" or permanent URL.
>>         PURLs would have their content cached in a permanent database
>>         (akin to archive.org), so that if/when the URL ever disappeared
>>         the last content would still be available.  Someone creating web
>>         content who wanted it to have longevity would submit it as a PURL
>>         to that database, which would copy its contents and periodically
>>         check for changes.   The backend storage would be maintained and
>>         managed by W3C.
>>
>>         Looking now from 2021, archive.org does provide a very similar
>>         service although I'm not sure how comprehensive it is.  Also, it's
>>         not clear whether or not popular search engines would find
>>         something that has disappeared as a URL, but is still present in
>>         the archive.org repository.  And like most other repositories, the
>>         survival of archive.org depends on some stream of continuous life
>>         support (funding, etc.)
>>
>>         I've occasionally caught some corporation's misdeeds by such
>>         retrievals.  E.g., when a manufacturer promises that a product
>>         will do something, it's no longer sufficient for them to just
>>         change the product's web page to erase all traces of the promise.
>>         Chances are it's in the archive still and they have a tough time
>>         claiming they never said that.
>>
>>         /Jack
>>
>>         On 5/8/21 1:45 PM, John Day via Internet-history wrote:
>>         > Couldn’t agree more.  A URL as a citation is practically
>>         > useless. The Internet is not much of an archive.
>>         >
>>         >> On May 8, 2021, at 16:17, Brian E Carpenter
>>         >> <brian.e.carpenter at gmail.com> wrote:
>>         >>
>>         >> On 09-May-21 02:44, Ole Jacobsen via Internet-history wrote:
>>         >> ...
>>         >>> I'll just note that there used to be a direct URL for
>>         >>> ConneXions in the CBI hosted publications archive, but that
>>         >>> has recently changed. Another peril of online museums.
>>         >> Indeed, and I wonder whether this august body could somehow
>>         >> try to change the thinking of archivists about that problem.
>>         >> Just over the last year or so, I've been digging in archives
>>         >> quite a bit (for doi.org/10.1109/MAHC.2020.2990647 and a
>>         >> forthcoming follow-up) and even in that time, some URLs have
>>         >> rotted, which makes it annoying to go back and follow up a new
>>         >> detail, and invalidates published citations. Over the longer
>>         >> term, say 10 years, even more links rot and search results
>>         >> become misleading or useless.
>>         >>
>>         >> (Or maybe that's a topic for the SIGCIS list.)
>>         >>
>>         >> Regards
>>         >>   Brian Carpenter
>>
>>         -- 
>>         Internet-history mailing list
>>         Internet-history at elists.isoc.org
>>         <mailto:Internet-history at elists.isoc.org>
>>         https://elists.isoc.org/mailman/listinfo/internet-history
>>         <https://elists.isoc.org/mailman/listinfo/internet-history>
>>
>



