Technology Review: Data Extinction

On the plane this morning I got caught up on some magazine reading. I absolutely love MIT’s Technology Review – in its latest incarnation, it focuses on all things relating to innovation. The result is a magazine that is full of useful and intriguing information.

This month’s cover story is on data extinction (available to subscribers only) – the challenge of preserving access to data as systems, applications and operating systems evolve. Some revealing statistics:

  • The volume of business-related e-mail is projected to rise from 2.6 trillion messages in 2001 to 5.9 trillion messages in 2005 (source: IDC).
  • JPEG is being superseded by JPEG 2000; as a result, in five years it may be difficult to view photos taken today with digital cameras.
  • Land use and natural resource inventories for the state of New York from the late 1960s are no longer accessible – the customized software that produced the inventories no longer exists.
  • NASA satellite data from the 1970s is completely unreadable today.

Three different approaches to solving this problem:

  • Migration: convert current data to future formats. Difficult and unwieldy, especially as the volume of data grows. Not scalable, and inevitably some data is lost or modified in ways that may not be immediately apparent.
  • Emulation: create an emulator to mimic the hardware/software environment that the data was designed for. Not really a long-term solution, because it puts off a comprehensive solution, and could result in unwieldy chains of emulators – which can become a house of cards.
  • Encapsulation: a “way to group digital objects together with descriptive ‘wrappers’ containing instructions for decoding their bits in the future.”
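
To make the encapsulation idea a little more concrete, here is a minimal sketch of what a descriptive wrapper could look like. It is my own illustration, not something from the article: the field names (`format`, `decoding_notes`, `sha256`) and the choice of JSON and base64 are all assumptions.

```python
import base64
import hashlib
import json


def encapsulate(payload: bytes, format_name: str, decoding_notes: str) -> str:
    """Wrap raw bytes in a self-describing JSON envelope (hypothetical layout)."""
    wrapper = {
        "format": format_name,             # what the bytes are
        "decoding_notes": decoding_notes,  # plain-language decoding instructions
        "sha256": hashlib.sha256(payload).hexdigest(),  # detects silent corruption
        "encoding": "base64",
        "payload": base64.b64encode(payload).decode("ascii"),
    }
    return json.dumps(wrapper, indent=2)


def unwrap(envelope: str) -> tuple[bytes, dict]:
    """Return the original bytes plus the description that travelled with them."""
    wrapper = json.loads(envelope)
    payload = base64.b64decode(wrapper["payload"])
    if hashlib.sha256(payload).hexdigest() != wrapper["sha256"]:
        raise ValueError("payload does not match recorded checksum")
    description = {k: v for k, v in wrapper.items() if k != "payload"}
    return payload, description


envelope = encapsulate(
    b"hello",
    "text/plain; charset=ascii",
    "One byte per character, ASCII code table.",
)
data, description = unwrap(envelope)
```

The point is only that the instructions for reading the bytes are stored alongside the bytes, so a future reader does not need the original application to know what it is looking at.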

Then there’s the long shot, a “universal virtual computer” (UVC). It would simulate the basic functions of a computer, with a basic architecture (memory, registers, rules for exchanging data) defined in such a way that any application on any platform could store two versions of a file: one in the proprietary format and one in the UVC format. The article includes some support for this concept, and early tests indicate it may be possible. The result would be that future applications would have access to data created today.
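
As a rough illustration of why this could work, here is a toy register machine with a decoder program written for it. This is not the UVC design described in the article: the opcodes (`SET`, `LOAD`, `ADD`, `EMIT`, `JLT`, `HALT`), the four registers and the made-up "proprietary" format (a length followed by character codes shifted up by one) are all invented for the example.

```python
def run_uvc(program, memory):
    """Run a list of (opcode, *operands) tuples on a toy register machine."""
    regs = [0] * 4
    output = []
    pc = 0
    while True:
        op, *args = program[pc]
        pc += 1
        if op == "SET":      # SET r, value      -> regs[r] = value
            regs[args[0]] = args[1]
        elif op == "LOAD":   # LOAD r, a         -> regs[r] = memory[regs[a]]
            regs[args[0]] = memory[regs[args[1]]]
        elif op == "ADD":    # ADD r, value      -> regs[r] += value
            regs[args[0]] += args[1]
        elif op == "EMIT":   # EMIT r            -> append regs[r] to the output
            output.append(regs[args[0]])
        elif op == "JLT":    # JLT r, s, target  -> jump if regs[r] < regs[s]
            if regs[args[0]] < regs[args[1]]:
                pc = args[2]
        elif op == "HALT":
            return output
        else:
            raise ValueError(f"unknown opcode: {op}")


# Decoder for the invented format: memory[0] holds the length n,
# memory[1..n] hold character codes, each stored as code + 1.
DECODER = [
    ("SET", 0, 0),     # 0: r0 = address of the length field
    ("LOAD", 1, 0),    # 1: r1 = n
    ("ADD", 1, 1),     # 2: r1 = n + 1 (one past the last data address)
    ("SET", 0, 1),     # 3: r0 = first data address
    ("JLT", 0, 1, 6),  # 4: if there is data, enter the loop
    ("HALT",),         # 5: empty file
    ("LOAD", 2, 0),    # 6: r2 = stored value
    ("ADD", 2, -1),    # 7: undo the +1 shift
    ("EMIT", 2),       # 8: output the character code
    ("ADD", 0, 1),     # 9: advance to the next address
    ("JLT", 0, 1, 6),  # 10: loop while r0 < r1
    ("HALT",),         # 11: done
]


def store(text):
    """Write text in the invented 'proprietary' format."""
    return [len(text)] + [ord(ch) + 1 for ch in text]


def read_with_uvc(memory):
    """Recover the text using only the toy machine and the decoder program."""
    return "".join(chr(code) for code in run_uvc(DECODER, memory))


recovered = read_with_uvc(store("NASA"))
```

The decoder program is just data, so it can be archived next to the file. A future platform would only need to reimplement something as small as `run_uvc`, not the application that originally wrote the file.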
