[ih] Saving IETF history

Wed May 12 16:22:13 PDT 2021

To put Karl's concerns into a perhaps more easily understood (but theoretical) example for those on the list who have not been involved in practical instances of the problem:

- printed public/user product documentation from 2000 gets thrown out 15 years later because
  "we need to get rid of all this old junk", maybe because offices are being refurnished.
- half a year later, a lawsuit ensues over such a "bogus" patent that was filed in 2002.
- Obviously, the 2000 public/user product documentation would show the patent
  claim to be "bogus", because the public documentation from 2000 explains exactly the same
  thing that the patent filed in 2002 claimed to be novel.
- The online web page of the prior-art product of course did not keep old version information reaching
  that far back, and even if it had, it would carry no date information, only
  version numbers.

These types of things easily happen, over and over, in multi-million-dollar lawsuits.

Going forward, IMHO, the best solution for e.g. IETF documentation would be:

a) Have all data, such as all of datatracker and the IETF mailing list archive, in an easily mirrored access form,
   which i think we do not have; at least i have only found it for some subset of our data.

b) Have multiple mirrors around the world, independent of each other, that would create
   signed/dated certificates for the hashes of each mirrored document - and keep old
   (versions of) documents and their signatures even when they are deleted/changed on the origin site.

   Maybe those mirrors cost money, but IMHO it is worth it, especially for stuff like IETF whose overall
   volume on disk is laughably small. And if this became standard tooling, folks like CHM should be
   ideal places for such mirroring.
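
A minimal sketch of what the per-document record in b) could look like, under
loud assumptions: the function names, the record fields and the use of an HMAC
with a mirror-held secret are all hypothetical stand-ins chosen so the example
runs with only the Python standard library. A real mirror would more likely use
a public-key signature or an independent timestamping service, so that third
parties can verify a record without knowing any secret.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def make_record(name, document, mirror_key, when=None):
    """Build a dated, signed record for one mirrored document.

    name, document (bytes) and mirror_key (bytes) are illustrative inputs;
    the HMAC is only a stand-in for a real digital signature.
    """
    when = when or datetime.now(timezone.utc)
    record = {
        "name": name,
        "sha256": hashlib.sha256(document).hexdigest(),
        "seen_at": when.isoformat(),
    }
    # Sign a canonical serialization so the date and hash cannot be altered
    # without invalidating the signature.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["signature"] = hmac.new(mirror_key, payload, hashlib.sha256).hexdigest()
    return record


def verify_record(record, document, mirror_key):
    """Check that the record is untampered and matches the given document."""
    unsigned = {k: v for k, v in record.items() if k != "signature"}
    payload = json.dumps(unsigned, sort_keys=True).encode("utf-8")
    expected = hmac.new(mirror_key, payload, hashlib.sha256).hexdigest()
    signature_ok = hmac.compare_digest(expected, record.get("signature", ""))
    hash_ok = hashlib.sha256(document).hexdigest() == record["sha256"]
    return signature_ok and hash_ok
```

A mirror following this idea would store one such record per fetched version
and never delete it, so that an old version and its date remain checkable even
after the origin site has changed or removed the document.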

Without something equivalent to a/b i fear it is way too easy to create fake evidence for anything,
and the "evidence" may not hold up as well in court as the "good old printed evidence".

This "creation time" tracking in a more trustworthy fashion will of course not
work retroactively, which is why it is all the more important to understand the value of
doing this now, so that someone starts doing it to help defend against future bogus lawsuits over
stuff we start working on now. Especially given how paper has already disappeared as the more
reliable evidence.

Cheers
    Toerless

On Wed, May 12, 2021 at 03:12:43PM -0700, Karl Auerbach via Internet-history wrote:
> I have also been highly concerned about the tendency of modern tech history
> to erase its own records.
> 
> My concern may, however, be in a different direction.
> 
> I am concerned about the growth of specious patents.  There are a lot of
> patent trolls out there who buy-up weak patents that got past the relatively
> lax patent examiners in the US and elsewhere, examiners who often have no
> notion of ideas in networking or computer systems, whether embodied in
> software or hardware.
> 
> By erasing our past we make it difficult to rebut these bad patents - we
> have discarded the evidence that the claims of those patents are neither
> novel nor non-obvious.
> 
> I think that over the last few years the IETF has done a spectacular job of
> organizing and tracking the RFC series.
> 
> However, we still have a tendency to forget the old when the newer, shinier
> thing comes along.
> 
> We should strive to make sure that our past is recorded.  And we ought to
> consider legal evidentiary requirements so that one who is challenging
> specious patents is not blocked by the complexities of the rules of
> evidence.
> 
> 	--karl--
> 
> 
> On 5/9/21 1:23 AM, John Gilmore via Internet-history wrote:
> > Dave Crocker wrote:
> > > Saving the RFCs is obvious.  What appears to be less obvious and, IMO,
> > > is just as important in historical terms, is /all/ of the IETF-related
> > > work materials.  Drafts.  Mailing list archives.  Session notes.
> > > Everything.
> > 
> > John Day wrote:
> > > Agreed. It should go somewhere. Same for all of the other standards
> > > groups, forums, consortia, etc.
> > 
> > Re the IETF, look in:
> > 
> >    https://archive-it.org/collections/11034
> > 
> > A few years ago, I set up an Archive-It.org job to monitor the IETF's
> > web presence.  I was disturbed at the deliberate ephemerality of the
> > Internet-Draft ecosystem.  I had been looking back at a 10-year-old
> > effort to eliminate some ridiculous restrictions on the IPv4 address
> > space, and IETF had thrown away most of the relevant documents (though I
> > found copies elsewhere once I knew their names).
> > 
> > Archive-It is a service of the nonprofit Internet Archive (archive.org).
> > So, the Internet Archive's robots are now crawling (various parts of)
> > the IETF websites every week, month, and quarter, under my direction.
> > And saving the results forever, or as long as the Internet Archive and
> > the Wayback Machine exist.  Between 1998 and now it's pulled in about
> > 1.8 TB of documents, which are accessible and searchable either from the
> > above URL, or from the main Wayback Machine at web.archive.org.
> > 
> > The IETF websites aren't organized for archiving.  I frankly don't
> > understand their structure, so am probably missing some important
> > things, and overcollecting other things.  But at least I tried.
> > Suggestions are welcome.
> > 
> > Just be glad the IETF is copying-friendly.  Imagine trying to archive
> > the IEEE or OSI standards development process.  Then imagine big
> > copyright lawsuits from self-serving people who tied their income
> > stream to restricting who can access the standards and the
> > standardization process.
> > 
> > 	John
> > 
> > PS: Anyone or any institution can get an Archive-It account for roughly
> > $10K/year.  The service automates the collecting of *anything* you want
> > from the web for posterity.  (If you want them to, the Internet Archive
> > will also write copies of it on new hard drives and send them to you for
> > your own archival collection.)  About 800 institutions are customers today.
> > You can also get a low-support low-volume Archive-It Basic account for
> > $500/year.  Or get custom Digital Preservation services to improve the
> > likelihood that your own curated digital assets will survive into the
> > distant future.  See https://Archive-It.org .
> > 
> > PPS: The Internet Archive's long term survival is, of course, not
> > guaranteed.  In particular, it will go through a tough transition when
> > its founder eventually dies.  What is guaranteed is that they have built
> > a corpus of useful information: Millions of books, billions of web
> > pages, hundreds of thousands of concerts, decades of saved television
> > channels, etc.  They are absorbing a lot of archival microfilm, too,
> > including genealogical and census records, magazines, etc.  This corpus
> > will likely motivate people to preserve and replicate it into being
> > useful in the distant future.  They have tried to design the technical
> > storage to encourage that result.  Does anyone here know anybody who has
> > both the money and the motivation to make a complete and ongoing copy in
> > a separately administered, separately owned organization?  That would
> > significantly mitigate the long term risk of having all the replicated
> > copies of the corpus owned by a single US nonprofit.  It would probably
> > take a bare minimum staff of 10 people to run and manage such an
> > operation, with dozens of petabytes of rotating storage in multiple data
> > centers and a large collection of (mostly free) software keeping it all
> > organized and accessible.
> > 
> -- 
> Internet-history mailing list
> Internet-history at elists.isoc.org
> https://elists.isoc.org/mailman/listinfo/internet-history