[ih] Link rot (was: Museum archiving (was: Re: IENs))

Jack Haverty jack at 3kitty.org
Sat May 8 14:24:48 PDT 2021


Sometimes you can find lost URLs' contents by looking at archive.org if
you can remember the relevant URL.
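
For anyone who wants to script that lookup, here is a minimal Python sketch. It assumes the Wayback Machine availability endpoint at archive.org/wayback/available and its "archived_snapshots" / "closest" JSON layout. The function names are made up for illustration, and the network fetch itself is left out.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint: the Wayback Machine availability API.
WAYBACK_API = "https://archive.org/wayback/available"


def build_lookup_url(lost_url, timestamp=None):
    """Build the query URL; timestamp (YYYYMMDD...) asks for the nearest snapshot."""
    params = {"url": lost_url}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_API + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the archived copy's URL from the JSON reply, or None if none is listed."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

To use it, fetch build_lookup_url(...) with any HTTP client and hand the response body to closest_snapshot(). An empty "archived_snapshots" object means the archive has no copy under that exact URL.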

At one point in the '90s while I was on the W3C I lobbied for the
introduction of a new form of URL - a "PURL" or permanent URL.   PURLs
would have their content cached in a permanent database (akin to
archive.org), so that if/when the URL ever disappeared the last content
would still be available.  Someone creating web content who wanted it to
have longevity would submit it as a PURL to that database, which would
copy its contents and periodically check for changes.   The backend
storage would be maintained and managed by W3C.
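
The submit / copy / periodically-check cycle could be sketched roughly as follows. This is purely illustrative: the PurlRegistry class, its method names, and the in-memory store are hypothetical, and nothing here reflects anything W3C or archive.org actually built. The fetch function is passed in so the sketch does not depend on any particular HTTP library.

```python
import hashlib


class PurlRegistry:
    """Hypothetical in-memory stand-in for the permanent database."""

    def __init__(self, fetch):
        # fetch(url) returns the page bytes, or raises OSError if the URL is gone.
        self._fetch = fetch
        self._store = {}  # url -> (sha256 hex digest, last content seen)

    def submit(self, url):
        """Register a URL and cache a copy of its current content."""
        content = self._fetch(url)
        self._store[url] = (hashlib.sha256(content).hexdigest(), content)

    def recheck(self, url):
        """Periodic check: refresh the copy if it changed, keep it if the URL vanished."""
        try:
            content = self._fetch(url)
        except OSError:
            return "gone"  # the last cached content stays available
        digest = hashlib.sha256(content).hexdigest()
        if digest == self._store[url][0]:
            return "unchanged"
        self._store[url] = (digest, content)
        return "updated"

    def resolve(self, url):
        """Return the last content seen for a registered URL."""
        return self._store[url][1]
```

The point of the design is in recheck(): a failed fetch never overwrites the stored copy, so the last known content outlives the original URL.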

Looking now from 2021, archive.org does provide a very similar service
although I'm not sure how comprehensive it is.  Also, it's not clear
whether or not popular search engines would find something that has
disappeared as a URL, but is still present in the archive.org
repository.  And like most other repositories, the survival of
archive.org depends on some stream of continuous life support (funding,
etc.).

I've occasionally caught some corporation's misdeeds by such
retrievals.  E.g., when a manufacturer promises that a product will do
something, it's no longer sufficient for them to just change the
product's web page to erase all traces of the promise.  Chances are it's
still in the archive, and they have a tough time claiming they never
said that.

/Jack

On 5/8/21 1:45 PM, John Day via Internet-history wrote:
> Couldn’t agree more.  A URL as a citation is practically useless. The Internet is not much of an archive.
>
>> On May 8, 2021, at 16:17, Brian E Carpenter <brian.e.carpenter at gmail.com> wrote:
>>
>> On 09-May-21 02:44, Ole Jacobsen via Internet-history wrote:
>> ...
>>> I'll just note that there used to be a direct URL for ConneXions in the CBI
>>> hosted publications archive, but that has recently changed. Another peril
>>> of online museums. 
>> Indeed, and I wonder whether this august body could somehow try to change the thinking of archivists about that problem. Just over the last year or so, I've been digging in archives quite a bit (for doi.org/10.1109/MAHC.2020.2990647 and a forthcoming follow-up) and even in that time, some URLs have rotted, which makes it annoying to go back and follow up a new detail, and invalidates published citations. Over the longer term, say 10 years, even more links rot and search results become misleading or useless.
>>
>> (Or maybe that's a topic for the SIGCIS list.)
>>
>> Regards
>>   Brian Carpenter



