[ih] Link rot (was: Museum archiving (was: Re: IENs))

vinton cerf vgcerf at gmail.com
Sun May 9 10:10:04 PDT 2021


anywhere you see "DOI:..." it is in use.
yeah, DOA is an unfortunate acronym. Bob Kahn has been developing it since
1988.
There is a non-profit organization that contracts with CNRI to handle
operations.

v

On Sun, May 9, 2021 at 1:07 PM Jack Haverty <jack at 3kitty.org> wrote:

> Interesting, hadn't seen DOA before.  I've learned to be skeptical of
> "Architectures".   Why did they pick such an acronym - when I see "DOA" I
> think of something quite different.  Has this been implemented?  Is it
> being used?  How much stuff is in it?  Or is DOA DOA?
>
> IMHO, there are many ways to create the technology, but the hard part is
> creating the administrative mechanisms that will keep it functioning
> continuously, and getting everybody to use them.
>
> The "PURL" approach I was pushing with W3C (seemed like a likely
> administrator) was really a resurrection of an old idea that was part of
> Licklider's "galactic network" vision.   When I was in Lick's group at MIT
> in the early 70s, we pushed very hard to implement an archival architecture
> in the emerging standards for "messaging" (what we now call email).
>
> Part of that was doing battle in the 1970s Header Wars that determined
> what email headers should contain.  If you look at an email today, you'll
> usually find one of the artifacts of those wars -- the "Message-ID:"
> field.  I fought very hard to get that into the standard.
>
> The idea was that a Message-ID was simply a unique string that identified
> a specific message.   In Lick's vision, messages were simply documents that
> got created and assigned such an ID as they proceeded down their own path
> through the galactic network.  Other messages might reference them, e.g.,
> as replies, forwards, and other actions take place - possibly over years
> and creating complex tangles of messages.  So basically anything in digital
> form could be a "message".
>
> For longevity, the idea was that anyone encountering a message could
> signal to their system that the message should be Archived - i.e., saved
> permanently.   So an ephemeral note ("Let's go to lunch.") would disappear,
> but anything considered important could be Archived by anyone who thought
> it worthwhile.
>
> That was the Architecture.   In the real world, we implemented our "mail
> system" to use the Datacomputer as the Archive.  Anyone writing or
> receiving a message could tell the system to Archive it, and the mail
> daemon would place a copy of that message into the Datacomputer, which was
> conveniently available on the ARPANET.   The message's Message-ID uniquely
> identified it for future retrieval.  In modern database terminology, the
> Message-ID was the Primary Key.
>
> AFAIK, if anyone can find artifacts from the Datacomputer, you would find
> a collection of ancient email messages that the MIT mail system put there
> back in the 70s.   Sadly, the notion of the galactic network (now called
> The Internet) lost the concept of having a Persistent Store as an integral
> part.   The vision was that there would be Datacomputers scattered around
> the net, run 24x7 as reliable warehouses for information.
>
> If you look at Vint's message:
>
> [image: Header with Message-IDs]
>
> you'll see the Message-ID of his message (you may have to enable "full
> headers" or something like that to see them), and a reference to the
> Message-ID of the message to which he was replying and several others.
>
> Curiously, those Message-IDs are hot-links, at least in the software I'm
> using (Thunderbird on Ubuntu).   But when I click on one to go to that
> referenced message, all it does is put up a spinner graphic and refuse to
> do anything else so I have to kill the program.
>
> I guess the Datacomputer is down today.....
>
> /Jack Haverty
>
>
> On 5/9/21 7:24 AM, vinton cerf wrote:
>
> see Digital Object Architecture (for Digital Object Identifiers = DOI)
>
> v
>
>
> On Sat, May 8, 2021 at 5:25 PM Jack Haverty via Internet-history <
> internet-history at elists.isoc.org> wrote:
>
>> Sometimes you can find lost URLs' contents by looking at archive.org if
>> you can remember the relevant URL.
>>
>> At one point in the '90s while I was on the W3C I lobbied for the
>> introduction of a new form of URL - a "PURL" or permanent URL.   PURLs
>> would have their content cached in a permanent database (akin to
>> archive.org), so that if/when the URL ever disappeared the last content
>> would still be available.  Someone creating web content who wanted it to
>> have longevity would submit it as a PURL to that database, which would
>> copy its contents and periodically check for changes.   The backend
>> storage would be maintained and managed by W3C.
>>
>> Looking now from 2021, archive.org does provide a very similar service
>> although I'm not sure how comprehensive it is.  Also, it's not clear
>> whether or not popular search engines would find something that has
>> disappeared as a URL, but is still present in the archive.org
>> repository.  And like most other repositories, the survival of
>> archive.org depends on some stream of continuous life support (funding,
>> etc.)
>>
>> I've occasionally caught some corporation's misdeeds by such
>> retrievals.  E.g., when a manufacturer promises that a product will do
>> something, it's no longer sufficient for them to just change the
>> product's web page to erase all traces of the promise.  Chances are it's
>> in the archive still and they have a tough time claiming they never said
>> that.
>>
>> /Jack
>>
>> On 5/8/21 1:45 PM, John Day via Internet-history wrote:
>> > Couldn’t agree more.  A URL as a citation is practically useless. The
>> > Internet is not much of an archive.
>> >
>> >> On May 8, 2021, at 16:17, Brian E Carpenter <
>> >> brian.e.carpenter at gmail.com> wrote:
>> >>
>> >> On 09-May-21 02:44, Ole Jacobsen via Internet-history wrote:
>> >> ...
>> >>> I'll just note that there used to be a direct URL for ConneXions in
>> >>> the CBI hosted publications archive, but that has recently changed.
>> >>> Another peril of online museums.
>> >> Indeed, and I wonder whether this august body could somehow try to
>> >> change the thinking of archivists about that problem. Just over the last
>> >> year or so, I've been digging in archives quite a bit (for
>> >> doi.org/10.1109/MAHC.2020.2990647 and a forthcoming follow-up) and even
>> >> in that time, some URLs have rotted, which makes it annoying to go back and
>> >> follow up a new detail, and invalidates published citations. Over the
>> >> longer term, say 10 years, even more links rot and search results become
>> >> misleading or useless.
>> >>
>> >> (Or maybe that's a topic for the SIGCIS list.)
>> >>
>> >> Regards
>> >>   Brian Carpenter
>>
>> --
>> Internet-history mailing list
>> Internet-history at elists.isoc.org
>> https://elists.isoc.org/mailman/listinfo/internet-history
>>
>
>


