[ih] Saving IETF history

Wed May 12 20:23:13 PDT 2021

On 13-May-21 11:22, Toerless Eckert via Internet-history wrote:
...
> Going forward, IMHO, the best solution for, e.g., IETF documentation would be:

This isn't an IETF mailing list, but if it was, I'd suggest finding out what the IETF Secretariat plus the RFC Editor already archive. The RFC (+IEN) archive is certainly pretty solid. As far as I know, the I-D archive has been used many times in prior art searches, including litigation. Whether mailing list archives and meeting minutes have been used too, I don't know. Certainly, some list archives prior to ietf.org hosting the lists can be hard to find.

The same question could be asked about the IRTF too.

    Brian

> 
> a) have all data, such as all of datatracker and the IETF mailing list archive, in an easily mirrored form,
>    which I think we do not have; at least I have not found it, only for some subset of our data.
> 
> b) Have multiple mutually independent mirrors around the world that would create
>    signed/dated certificates for the hashes of each mirrored document - and keep old
>    (versions of) documents and their signatures even when they would be deleted/changed on the origin site.
> 
>    Maybe those mirrors cost money, but IMHO worth it, especially for stuff like IETF whose overall
>    volume on disk is laughably small. And if this became standard tooling, folks like CHM should be
>    ideal places for such mirroring.
> 
> Without something equivalent to a/b I fear it is way too easy to create fake evidence for anything,
> and the "evidence" may not hold up as well in court as the "good old printed evidence".
> 
> This "creation time" tracking in a more trustworthy fashion will of course not
> work retroactively, which is why it would be even more important to understand the value of
> doing this now, so someone starts doing it for the benefit of future bogus lawsuits for
> stuff we start working on now. Especially given how paper already has disappeared as more
> reliable evidence.
> 
> Cheers
>     Toerless
> 
> On Wed, May 12, 2021 at 03:12:43PM -0700, Karl Auerbach via Internet-history wrote:
>> I have also been highly concerned about the tendency of modern tech history
>> to erase its own records.
>>
>> My concern may, however, be in a different direction.
>>
>> I am concerned about the growth of specious patents.  There are a lot of
>> patent trolls out there who buy up weak patents that got past the relatively
>> lax patent examiners in the US and elsewhere, examiners who often have no
>> notion of ideas in networking or computer systems, whether embodied in
>> software or hardware.
>>
>> By erasing our past we make it difficult to rebut these bad patents - we
>> have discarded the evidence that the claims of those patents are neither
>> novel nor non-obvious.
>>
>> I think that over the last few years the IETF has done a spectacular job of
>> organizing and tracking the RFC series.
>>
>> However, we still have a tendency to forget the old when the newer, shinier
>> thing comes along.
>>
>> We should strive to make sure that our past is recorded.  And we ought to
>> consider legal evidentiary requirements so that one who is challenging
>> specious patents is not blocked by the complexities of the rules of
>> evidence.
>>
>> 	--karl--
>>
>>
>> On 5/9/21 1:23 AM, John Gilmore via Internet-history wrote:
>>> Dave Crocker wrote:
>>>> Saving the RFCs is obvious.  What appears to be less obvious and, IMO,
>>>> is just as important in historical terms, is /all/ of the IETF-related
>>>> work materials.  Drafts.  Mailing list archives.  Session notes.
>>>> Everything.
>>>
>>> John Day wrote:
>>>> Agreed. It should go somewhere. Same for all of the other standards
>>>> groups, forums, consortia, etc.
>>>
>>> Re the IETF, look in:
>>>
>>>    https://archive-it.org/collections/11034
>>>
>>> A few years ago, I set up an Archive-It.org job to monitor the IETF's
>>> web presence.  I was disturbed at the deliberate ephemerality of the
>>> Internet-Draft ecosystem.  I had been looking back at a 10-year-old
>>> effort to eliminate some ridiculous restrictions on the IPv4 address
>>> space, and IETF had thrown away most of the relevant documents (though I
>>> found copies elsewhere once I knew their names).
>>>
>>> Archive-It is a service of the nonprofit Internet Archive (archive.org).
>>> So, the Internet Archive's robots are now crawling (various parts of)
>>> the IETF websites every week, month, and quarter, under my direction.
>>> And saving the results forever, or as long as the Internet Archive and
>>> the Wayback Machine exist.  Between 1998 and now it's pulled in about
>>> 1.8 TB of documents, which are accessible and searchable either from the
>>> above URL, or from the main Wayback Machine at web.archive.org.
>>>
>>> The IETF websites aren't organized for archiving.  I frankly don't
>>> understand their structure, so am probably missing some important
>>> things, and overcollecting other things.  But at least I tried.
>>> Suggestions are welcome.
>>>
>>> Just be glad the IETF is copying-friendly.  Imagine trying to archive
>>> the IEEE or OSI standards development process.  Then imagine big
>>> copyright lawsuits from self-serving people who tied their income
>>> stream to restricting who can access the standards and the
>>> standardization process.
>>>
>>> 	John
>>>
>>> PS: Anyone or any institution can get an Archive-It account for roughly
>>> $10K/year.  The service automates the collecting of *anything* you want
>>> from the web for posterity.  (If you want them to, the Internet Archive
>>> will also write copies of it on new hard drives and send them to you for
>>> your own archival collection.)  About 800 institutions are customers today.
>>> You can also get a low-support low-volume Archive-It Basic account for
>>> $500/year.  Or get custom Digital Preservation services to improve the
>>> likelihood that your own curated digital assets will survive into the
>>> distant future.  See https://Archive-It.org .
>>>
>>> PPS: The Internet Archive's long term survival is, of course, not
>>> guaranteed.  In particular, it will go through a tough transition when
>>> its founder eventually dies.  What is guaranteed is that they have built
>>> a corpus of useful information: Millions of books, billions of web
>>> pages, hundreds of thousands of concerts, decades of saved television
>>> channels, etc.  They are absorbing a lot of archival microfilm, too,
>>> including genealogical and census records, magazines, etc.  This corpus
>>> will likely motivate people to preserve and replicate it into being
>>> useful in the distant future.  They have tried to design the technical
>>> storage to encourage that result.  Does anyone here know anybody who has
>>> both the money and the motivation to make a complete and ongoing copy in
>>> a separately administered, separately owned organization?  That would
>>> significantly mitigate the long term risk of having all the replicated
>>> copies of the corpus owned by a single US nonprofit.  It would probably
>>> take a bare minimum staff of 10 people to run and manage such an
>>> operation, with dozens of petabytes of rotating storage in multiple data
>>> centers and a large collection of (mostly free) software keeping it all
>>> organized and accessible.
>>>
>> -- 
>> Internet-history mailing list
>> Internet-history at elists.isoc.org
>> https://elists.isoc.org/mailman/listinfo/internet-history
>
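
The mirror scheme in point (b) of Toerless's message (hash each mirrored document, attach a date, sign the result, and never discard old versions) can be sketched roughly as follows. This is only an illustration under stated assumptions, not anything the IETF or any mirror actually runs: the names make_record, verify_record and MirrorLedger are hypothetical, and an HMAC with a mirror-held secret stands in for a real public-key signature or a trusted timestamping service, since the Python standard library has no asymmetric signing.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone
from typing import Optional


def make_record(name: str, content: bytes, mirror_id: str, key: bytes,
                when: Optional[datetime] = None) -> dict:
    """Build a dated, authenticated record for one mirrored document.

    ASSUMPTION: HMAC-SHA256 stands in for a real digital signature.
    """
    when = when or datetime.now(timezone.utc)
    body = {
        "name": name,
        "sha256": hashlib.sha256(content).hexdigest(),
        "seen_at": when.isoformat(),
        "mirror": mirror_id,
    }
    # Canonical serialization so the same fields always produce the same MAC.
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    body["mac"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return body


def verify_record(record: dict, content: bytes, key: bytes) -> bool:
    """Check that the record is untampered and matches the given content."""
    body = {k: v for k, v in record.items() if k != "mac"}
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    mac_ok = hmac.compare_digest(expected, record.get("mac", ""))
    hash_ok = hashlib.sha256(content).hexdigest() == record["sha256"]
    return mac_ok and hash_ok


class MirrorLedger:
    """Append-only store: old versions stay even if the origin changes."""

    def __init__(self, mirror_id: str, key: bytes) -> None:
        self.mirror_id = mirror_id
        self.key = key
        self.entries: list = []  # list of (record, content) pairs

    def observe(self, name: str, content: bytes,
                when: Optional[datetime] = None) -> dict:
        record = make_record(name, content, self.mirror_id, self.key, when)
        self.entries.append((record, content))
        return record

    def versions(self, name: str) -> list:
        return [rec for rec, _ in self.entries if rec["name"] == name]
```

A mirror would call observe() each time it fetches a document; a later changed or deleted original does not remove the earlier entries, so versions() still returns every dated record. Several independent mirrors holding matching hashes for the same date is what would make a forged "creation time" hard to sustain.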