Wednesday, 4 July 2007

Ticking bomb in National Archive defused by Microsoft

The IPKat's friend Martin Cohen has drawn his attention to this item from the BBC on the problems of accessing old digital file formats, which Natalie Ceeney (chief executive of the UK National Archives) has described as a "ticking time bomb". Ms Ceeney warns that we face the possibility of "losing years of critical knowledge" because modern PCs can't always open old file formats. The National Archives has teamed up with Microsoft in the quest for a technical solution. The National Archives holds more than 580 terabytes of data - that's even bigger than the collected tales of Harry Potter - in older file formats that are no longer commercially available. Ms Ceeney adds:

"If you put paper on shelves, it's pretty certain it is going to be there in a hundred years. If you stored something on a floppy disc just three or four years ago, you'd have a hard time finding a modern computer capable of opening it. Digital information is in fact inherently far more ephemeral than paper".
The root cause of the problem is the range of proprietary file formats that proliferated during the early digital revolution. Technology companies used file formats that were incompatible not only with the software of their rivals, but also with different versions of the same program. Microsoft has developed a new document file format, Open XML, for saving files from programs such as Word, Excel and PowerPoint. According to Microsoft spokesman Gordon Frazer, it's an open international standard, under independent control. This new standard appears, however, to be in competition with a rival, the OASIS Open Document Format. Martin Cohen comments:

"The article fails to make mention that there are also legal ramifications raised by these activities, in terms of allowing libraries to make archive copies during the copyright period of the works in order to ensure that the material does not further deteriorate, or that they still have access to old hardware. These points are reflected in Recommendations 9, 10a and 10b of the Gowers Review".
The IPKat thinks this problem should have been anticipated: he had the same sort of problems when computers stopped using the old punch-card readers and went on to more sophisticated forms of software. Lots of old data was effectively lost unless someone could dig up a machine primitive enough to read it. Merpel says this news item looks like a great argument in favour of making paper back-ups ...

How to make a bomb from paper here

1 comment:

B said...

It's hard to glean the order of events from the write-up, but the OpenDocument Format was approved by OASIS about a year or two before Microsoft's Office Open XML format was approved through the ECMA fast-track standards approval process. [Lesson one: Open Office should have gotten a trademark while it had the chance.]

As indicated by the names, Microsoft's format is a direct response to OpenOffice.org's format. After a few governments (notable among them the State of Massachusetts) threatened to give up on Microsoft over its lack of open, archivable formats, it got in gear and got its format approved as a standard.

However, the standard is not a new, open specification, but a partial documenting of the current Microsoft formats. It is several thousand pages, deliberately encodes several glitches from past versions of Word and Excel to maintain compatibility, and even has parts here and there that say things like "the functionality of this feature is hard to describe, so get a copy of Word and just try some things."

In short, regardless of whether it is (partially) approved by ECMA and maintained by an independent body, it is not a standard that anybody but Microsoft will ever be able to fully implement, and still ties the user to Microsoft products. Whatever the National Archives may be advertising, in the end they are just saving all their documents to Word 2007 format.
