Sunday, December 23, 2007

The New York Times article linked below discusses the problems and costs of maintaining digital works of art (cinema) over a long period of time. The article helps expose the fundamental problem facing every entity that generates digital information: long-term preservation and accessibility. Subsumed within that problem is the question of how to ensure long-term data provenance, or provably persistent data integrity.

This article actually sounds a clarion call for those who choose to archive anything digital. It also exposes issues not typically addressed by those selling recordable media or long-term storage or archive solutions.

First, "life-long storage" does not mean "long-life storage"; the difference is crucial and, I believe, generally ignored. Life-long storage of degrading media is really storage of the media only, not of the information contained therein. The "lifetime" of most media is described in terms that provide no meaningful guidance (such as "mean time between failures" for hard drives). CDs and DVDs do not typically advertise or disclose their "shelf life." I remember purchasing some "gold" CDs that were hawked as "long life," only to see the glitter of gold dust exfoliated by these CDs within 5 years of their purchase. Others appeared intact, but became latter-day coasters. As for newer storage media, I understand that "flash drives" just wear out over time. What that portends for the SD cards, Memory Sticks, thumb drives, and PC memory-storage components containing important information ("oops, you store your key recovery data on that USB drive...") is not good. Nanotechnology? No word yet as to longevity.

OK, so that means that the 2007 Quadrennial 5-DVD Digitally Re-mastered Dolby 11.1 Enhanced Autographed, Numbered and Limited Edition Blu-ray/HDNA Directors' (and all other) Cut of "Blade Runner" for which I shell out 100 bucks may or may not be playable five years after I take the DVDs out of the climate-controlled, UV-free dark room. I'll take my chances, but if they "decompose" I'll be sad. My sadness, however, will be limited to the loss of a favorite flick (plus 100 bucks). Not so with enterprise or government information intended to last anywhere from a long time to perpetuity.

What this means is that entities facing lengthy retention periods had better start developing and deploying methods for periodic and orderly migration to "fresher" media, redundant backups included, or risk losing data. This raises the question of whether such entities are required to acknowledge the problem and take steps now, or whether they will be given a "pass" because they merely did what appeared, at the time, to provide compliance.
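
For the technically inclined, here is a rough idea of what one "orderly migration" step might look like: hash the file on the old media, copy it to the new media, then hash the copy and compare. This is only a minimal sketch. The function names (sha256_of, migrate_file) are my own inventions for illustration, and SHA-256 is an assumed choice of hash, not something any retention rule I know of prescribes.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_file(source: Path, destination: Path) -> str:
    """Copy one file to fresher media and confirm the copy is bit-identical.

    Hypothetical helper: raises OSError if the copy does not match the
    original, otherwise returns the digest so it can be recorded.
    """
    before = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)  # copies contents and timestamps
    after = sha256_of(destination)
    if before != after:
        raise OSError(f"copy of {source} does not match the original")
    return after
```

Note what this does not do: if the file had already rotted on the old media before the first hash was taken, the copy will be a faithful copy of rotten data. The comparison only shows that the migration itself introduced no damage.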

So, it appears that we may be leaving the survival of our most important (digitally sourced) information to "Digital Darwinism," in which only the information contained in the "strongest," or most robustly maintained (and migrated), media survives.

The other issue, in which I profess a biased interest, is how to ascertain and prove, through time, the integrity of whatever digital data survives.
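
To make the integrity question concrete, one common starting point is a manifest of cryptographic hashes recorded when data is archived and re-checked later. The sketch below is illustrative only: build_manifest and audit are hypothetical names of my own, and SHA-256 is an assumed choice.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest.

    Reads each file fully into memory, which is fine for a sketch but not
    for very large archives.
    """
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the archive as it is now against a previously recorded manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "added": sorted(set(current) - set(manifest)),
        "altered": sorted(
            name
            for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
    }
```

An empty report from audit says only that the files match the manifest. It says nothing about whether the manifest itself has been altered, so proving integrity through time still depends on protecting that record independently, which is the harder part of the problem.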

Maybe we should worry about that issue in 10 years. It looks as if we'll have a hard enough time proving provenance even for relatively short-lived data.

