[ih] Link rot (was: Museum archiving (was: Re: IENs))
Jack Haverty
jack at 3kitty.org
Sun May 9 10:07:29 PDT 2021
Interesting, hadn't seen DOA before. I've learned to be skeptical of
"Architectures". Why did they pick such an acronym - when I see "DOA"
I think of something quite different. Has this been implemented? Is it
being used? How much stuff is in it? Or is DOA DOA?
IMHO, there are many ways to create the technology, but the hard part is
creating the administrative mechanisms that will keep it functioning
continuously, and getting everybody to use them.
The "PURL" approach I was pushing with W3C (seemed like a likely
administrator) was really a resurrection of an old idea that was part of
Licklider's "galactic network" vision. When I was in Lick's group at
MIT in the early 70s, we pushed very hard to implement an archival
architecture in the emerging standards for "messaging" (what we now call
email).
Part of that was doing battle in the 1970s Header Wars that determined
what email headers should contain. If you look at an email today,
you'll usually find one of the artifacts of those wars -- the
"Message-ID:" field. I fought very hard to get that into the standard.
The idea was that a Message-ID was simply a unique string that
identified a specific message. In Lick's vision, messages were simply
documents that got created and assigned such an ID as they proceeded
down their own path through the galactic network. Other messages might
reference them, e.g., as replies, forwards, and other actions take place
- possibly over years and creating complex tangles of messages. So
basically anything in digital form could be a "message".
For longevity, the idea was that anyone encountering a message could
signal to their system that the message should be Archived - i.e., saved
permanently. So an ephemeral note ("Let's go to lunch.") would
disappear, but anything considered important could be Archived by anyone
who thought it worthwhile.
That was the Architecture. In the real world, we implemented our "mail
system" to use the Datacomputer as the Archive. Anyone writing or
receiving a message could tell the system to Archive it, and the mail
daemon would place a copy of that message into the Datacomputer, which
was conveniently available on the ARPANET. The message's Message-ID
uniquely identified it for future retrieval. In modern database
terminology, the Message-ID was the Primary Key.
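To make the "Message-ID as Primary Key" idea concrete, here is a minimal sketch in modern terms. It is not how the Datacomputer or the MIT mail system worked; the SQLite store, the table name, and the function names (open_archive, archive_message, retrieve_message) are all assumptions made up for illustration.

```python
import sqlite3
from email import message_from_string


def open_archive(path=":memory:"):
    """Open a stand-in archive (hypothetical; SQLite plays the Datacomputer)."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS archive ("
        "message_id TEXT PRIMARY KEY, raw TEXT NOT NULL)"
    )
    return db


def archive_message(db, raw):
    """Store a copy of a raw message, keyed by its Message-ID header."""
    msg = message_from_string(raw)
    message_id = msg["Message-ID"]
    if message_id is None:
        raise ValueError("message has no Message-ID header")
    message_id = message_id.strip()
    # INSERT OR IGNORE: several people archiving the same message is harmless,
    # because the Message-ID uniquely identifies it.
    db.execute(
        "INSERT OR IGNORE INTO archive (message_id, raw) VALUES (?, ?)",
        (message_id, raw),
    )
    db.commit()
    return message_id


def retrieve_message(db, message_id):
    """Fetch an archived message by Message-ID, or None if it was never archived."""
    row = db.execute(
        "SELECT raw FROM archive WHERE message_id = ?", (message_id,)
    ).fetchone()
    return row[0] if row else None
```

The point of the sketch is only that anyone holding the Message-ID can ask the store for the message later, no matter who archived it or when.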
AFAIK, if anyone can find artifacts from the Datacomputer, they would
find a collection of ancient email messages that the MIT mail system put
there back in the 70s. Sadly, the notion of the galactic network (now
called The Internet) lost the concept of having a Persistent Store as an
integral part. The vision was that there would be Datacomputers
scattered around the net, run 24x7 as reliable warehouses for information.
If you look at Vint's message:
[image: Header with Message-IDs]
you'll see the Message-ID of his message (you may have to enable "full
headers" or something like that to see them), and a reference to the
Message-ID of the message to which he was replying and several others.
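For anyone who wants to pull those identifiers out of a message themselves, here is a rough sketch using Python's standard email package. The header names (Message-ID, In-Reply-To, References) are the standard ones; the function name thread_ids and the sample message with its example.org IDs are made up for illustration.

```python
from email import message_from_string


def thread_ids(raw):
    """Return the Message-ID of a message and the IDs it points back to."""
    msg = message_from_string(raw)
    return {
        "message_id": (msg.get("Message-ID") or "").strip(),
        # In-Reply-To and References hold whitespace-separated Message-IDs;
        # split() also copes with headers folded across several lines.
        "in_reply_to": (msg.get("In-Reply-To") or "").split(),
        "references": (msg.get("References") or "").split(),
    }


# Hypothetical reply; none of these IDs belong to a real message.
SAMPLE = (
    "Message-ID: <reply@example.org>\n"
    "In-Reply-To: <parent@example.org>\n"
    "References: <root@example.org>\n"
    " <parent@example.org>\n"
    "Subject: Re: example\n"
    "\n"
    "body\n"
)

if __name__ == "__main__":
    print(thread_ids(SAMPLE))
```

Each ID in "references" is, in principle, a key that could be handed to an archive to fetch the earlier message, provided some archive still holds it.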
Curiously, those Message-IDs are hot-links, at least in the software I'm
using (Thunderbird on Ubuntu). But when I click on one to go to that
referenced message, all it does is put up a spinner graphic and refuse
to do anything else so I have to kill the program.
I guess the Datacomputer is down today.....
/Jack Haverty
On 5/9/21 7:24 AM, vinton cerf wrote:
> see Digital Object Architecture (for Digital Object Identifiers = DOI)
>
> v
>
>
> On Sat, May 8, 2021 at 5:25 PM Jack Haverty via Internet-history
> <internet-history at elists.isoc.org> wrote:
>
> Sometimes you can find lost URLs' contents by looking at archive.org if
> you can remember the relevant URL.
>
> At one point in the '90s while I was on the W3C I lobbied for the
> introduction of a new form of URL - a "PURL" or permanent URL. PURLs
> would have their content cached in a permanent database (akin to
> archive.org), so that if/when the URL ever disappeared the last content
> would still be available. Someone creating web content who wanted it to
> have longevity would submit it as a PURL to that database, which would
> copy its contents and periodically check for changes. The backend
> storage would be maintained and managed by W3C.
>
> Looking now from 2021, archive.org does provide a very similar service
> although I'm not sure how comprehensive it is. Also, it's not clear
> whether or not popular search engines would find something that has
> disappeared as a URL, but is still present in the archive.org
> repository. And like most other repositories, the survival of
> archive.org depends on some stream of continuous life support (funding,
> etc.)
>
> I've occasionally caught some corporation's misdeeds by such
> retrievals. E.g., when a manufacturer promises that a product will do
> something, it's no longer sufficient for them to just change the
> product's web page to erase all traces of the promise. Chances are it's
> in the archive still and they have a tough time claiming they never said
> that.
>
> /Jack
>
> On 5/8/21 1:45 PM, John Day via Internet-history wrote:
> > Couldn’t agree more. A URL as a citation is practically useless.
> > The Internet is not much of an archive.
> >
> >> On May 8, 2021, at 16:17, Brian E Carpenter
> >> <brian.e.carpenter at gmail.com> wrote:
> >>
> >> On 09-May-21 02:44, Ole Jacobsen via Internet-history wrote:
> >> ...
> >>> I'll just note that there used to be a direct URL for ConneXions
> >>> in the CBI hosted publications archive, but that has recently
> >>> changed. Another peril of online museums.
> >> Indeed, and I wonder whether this august body could somehow try to
> >> change the thinking of archivists about that problem. Just over the
> >> last year or so, I've been digging in archives quite a bit (for
> >> doi.org/10.1109/MAHC.2020.2990647 and a forthcoming follow-up) and
> >> even in that time, some URLs have rotted, which makes it annoying to
> >> go back and follow up a new detail, and invalidates published
> >> citations. Over the longer term, say 10 years, even more links rot
> >> and search results become misleading or useless.
> >>
> >> (Or maybe that's a topic for the SIGCIS list.)
> >>
> >> Regards
> >> Brian Carpenter