New Scientist article on “Digital Doomsday”
Feb 3rd, 2010 by Isaiah Beard

One of the topics I like to bring up in the discussion of preserving digital data is the idea of a Digital Dark Age… the notion of a period in our historic knowledge that ends up getting lost due to a failure to plan and preserve our early digital content.

The New Scientist, however, recently published an article (Feb 2, 2010) on something a bit more cataclysmic: the concept of Digital Doomsday.  From the article:

Suppose, for instance, that the global financial system collapses, or a new virus kills most of the world’s population, or a solar storm destroys the power grid in North America. Or suppose there is a slow decline as soaring energy costs and worsening environmental disasters take their toll. The increasing complexity and interdependency of society is making civilisation ever more vulnerable to such events (New Scientist, 5 April 2008, p 28 and p 32).

Whatever the cause, if the power was cut off to the banks of computers that now store much of humanity’s knowledge, and people stopped looking after them and the buildings housing them, and factories ceased to churn out new chips and drives, how long would all our knowledge survive? How much would the survivors of such a disaster be able to retrieve decades or centuries hence?

The article is a compelling read, and offers an intellectual exercise on how much of our “stuff” will survive such a catastrophe.  Ironically, the logic is that the digital content with the most copies in existence may win out.  So, while scholarly works, theses, research and other important scientific data would be at risk, pop music may survive just fine.

Google’s Neglected Archive
Oct 8th, 2009 by Isaiah Beard

Controversial as it may be to archivists of both the digital and analog realms, Google is often seen as the ubiquitous oracle of record by the general online population.  Websites (including libraries) strive to be listed on its search engine; it runs the de facto video archive for the internet at large; and in many cases it has made patents and even scholarly material easier to search than their respective online custodians have.  Research institutions and businesses now even turn to it to handle their growing mounds of e-mail and electronic documents.

Yet, curators and librarians have been very skeptical of Google, its aims, and even its competency at truly being able to objectively preserve the massive digital content it aims to take on.  The ongoing Google Books drama is one such aspect of this.  But now, others outside of the world of preservationists are seriously calling Google’s competency and motives into question.

Wired.com has posted an article about what may well be an example of Google’s mismanagement of important archives.  Usenet was probably the Internet’s earliest online reference, social network, and historical archive, consisting of terabytes of text articles, files and writings from users of the Internet (and of its predecessors) dating back to 1980.  While most of the content can be quite mundane, some of these articles document milestones in online history or contain the writings of influential pioneers of the Internet.  Google, through its acquisitions of various entities over the years, became the world’s de facto curator of this content, and according to Wired contributor Kevin Poulsen, Google is failing in this role:

… visiting Google Groups is like touring ancient ruins.

On the surface, it looks as clean and shiny as every other Google service, which makes its rotting interior all the more jarring — like visiting Disneyland and finding broken windows and graffiti on Main Street USA.

Searching within a newsgroup, even one with thousands of posts, produces no results at all. Confining a search to a range of dates also fails silently, bulldozing the most obvious path to exploring an archive.

Want to find Marc Andreessen’s historic March 14, 1993 announcement in alt.hypertext of the Mosaic web browser? “Your search – mosaic – did not match any documents.”

Wired’s write-up points to the exact concern that many archivists have about Google’s aims: that the company isn’t motivated to maintain assets it holds if those assets are no longer making it any significant money.  With newer, more visually appealing technologies largely supplanting the once-huge popularity of Usenet, its archive is not as hot a commodity as it used to be, and one can speculate that if it’s not generating enough ad revenue, Google isn’t going to care to maintain the archive or keep it functional.

So, what happens when certain subject matter in Google’s scanned books archive becomes less popular – and thus less visited – within the Internet’s incredibly short attention span?

Original NASA moon landing tapes: probably gone for good
Jul 16th, 2009 by Isaiah Beard

Forty years ago, the Apollo 11 mission blasted off into space, on its way to making history as the first successful human landing on the moon.  Unfortunately, NPR reports that after much searching, it looks like the best possible copies of the video that recorded this momentous event have long been erased:

Over the years, NASA had removed massive numbers of magnetic tapes from the shelves. In the early 1980s alone, tens of thousands of boxes were withdrawn.

It turns out that new satellites had gone up and were producing a lot of data that needed to be recorded. “These satellites were suddenly using tapes seven days a week, 24 hours a day,” says Lebar.

And the agency was experiencing a critical shortage of magnetic tapes. So NASA started erasing old ones and reusing them.

That’s probably what happened to the original footage from the moon that the astronauts captured with their lunar camera, says Lebar. It was stored on telemetry tapes, and old tapes with telemetry data were being recycled.

The article also explains how the specially designed video cameras that astronauts took to the moon produced video of much higher quality than the snowy, blurry footage American households saw that night, and that we’ve seen for many years since.  Regrettably, the Apollo video cameras used a non-standard format, requiring extra processing on the ground both to store the content and to convert it to a more conventional format (and thus introducing the noise and blur on currently available tapes).

And so, NASA becomes a poster child not only for the pitfalls of poor preservation planning, but also for the perils of using non-standard, proprietary formats to record important, historic moments!

Preserving digital photos: What not to do
Apr 6th, 2009 by Isaiah Beard

camera disassembled

One of the debates that I see cropping up often in preservation circles is how best to preserve “born digital” photographs: those photos that never began as physical film, but originated on a digital camera.

This isn’t an easy topic. There is no industry standard for born digital image preservation. Digital cameras of different vintages and configurations will output in one of a handful of differing file formats, and their metadata will often differ as well. And so, preservationists have been largely left to their own devices, fabricating their own methods, preferred formats and storage procedures for handling this type of material.

One controversial method that has been suggested is to forget about digital altogether, and to use a pigment-based inkjet or dye-sub printer to print physical copies of digital photographs and rely on the hard copies as the long-term archive. This is a tempting method for lots of curators who have been trained to trust the physical, and without delving too deep into the specifics this seems at first blush like sound reasoning.

Unfortunately, it can be a very bad idea, and here’s why.

Loss of image fidelity

This is by far the most important reason, and yet not really the most obvious to some. For laypeople, and for the less-experienced in digital formats, creating a print from a digital file is a lot like doing the same from analog film. However, inkjet and photo printers are not going to give you the same level of quality as a true analog photographic print. And the print, while fine to the naked eye, will suffer a significant degradation compared to the original.

The best way to prove this is to take a digital image, make a print, and then rescan it. Here, for instance, is a born digital image taken from a Canon EOS 30D, shot and preserved in Camera RAW format, and presented here as a 24-bit PNG file:

Primary Image in PNG
(Note: clicking on the above image will take you to the full-resolution photograph, a 16MB file.)

I printed this image on a Kodak Photo Printer, using pigment inks, on 4×6 Kodak photo paper. Then, I rescanned the image at 1200dpi, using the scanner attached to the same photo printer. Here’s the resulting re-scan:

Rescan
(Note: clicking on the above image will take you to the full-resolution re-scanned photograph, also a 16MB file.)

At these reduced resolutions, there doesn’t seem to be much difference. The color appears slightly off, but it isn’t so bad… right? Well, let’s look a little closer at the re-scan:
Rescan closeup

Yikes! Clearly, there’s a significant compromise in image quality here. This is because photo printers, regardless of how good they are, rely on printing methods that are unlike the traditional photograph, and their level of quality doesn’t hold up when the print is scanned back in pixel for pixel. This becomes even more evident when you compare the re-scan with the digital master, at the same scale.
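A bit of arithmetic helps explain why the re-scan can look so large and yet so poor. The sketch below is only an illustration under stated assumptions: the 3504 × 2336 frame size is the EOS 30D's published maximum resolution (assumed here, not a figure given in this post), and it assumes the whole 4×6 sheet is scanned at the full 1200dpi.

```python
# Rough pixel arithmetic for the print-and-rescan test.
# Assumptions (not stated in the post): the master is a 3504 x 2336 frame,
# and the entire 6 x 4 inch print is captured at the scanner's 1200 dpi.
PRINT_INCHES = (6, 4)          # long side, short side of the print
SCAN_DPI = 1200                # scanner sampling rate used for the re-scan
MASTER_PIXELS = (3504, 2336)   # assumed pixel dimensions of the digital master


def scan_pixels(print_inches, dpi):
    """Pixel dimensions produced by scanning a print of this size at this dpi."""
    return tuple(side * dpi for side in print_inches)


def megapixels(size):
    """Total pixel count of a (width, height) pair, in millions."""
    return size[0] * size[1] / 1_000_000


scan = scan_pixels(PRINT_INCHES, SCAN_DPI)

# How densely the master's own pixels are laid down on the paper.
master_ppi_on_paper = MASTER_PIXELS[0] / PRINT_INCHES[0]

print("re-scan size:", scan, f"({megapixels(scan):.2f} MP)")
print("master size: ", MASTER_PIXELS, f"({megapixels(MASTER_PIXELS):.2f} MP)")
print("master pixels per inch on the print:", master_ppi_on_paper)
```

Under those assumptions the scanner samples the paper at roughly twice the density of the pixels that were printed on it. The re-scan therefore holds several times as many pixels as the master, but none of them can add scene detail that was not in the file sent to the printer; the extra samples can only record the surface of the print itself.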

If this argument isn’t compelling enough, there are other reasons for not relying on a hard copy as your preservation master.

Loss of technical metadata

Most modern digital cameras embed technical metadata into their image files, either by using EXIF, or as built-in fields in their own Camera Raw format. This metadata can include details about the camera which took the photo, what settings were used, what lenses, time and date, and even the GPS location of the camera, if properly equipped. It goes without saying that all of this potentially valuable metadata is lost if a hard copy is used as a preservation master, in lieu of the digital.
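To make the point concrete, here is a minimal sketch of how one might check whether a JPEG file still carries an EXIF block at all. It uses only the Python standard library and walks the JPEG header segments looking for an APP1 segment that begins with the "Exif" identifier. The function name has_exif_segment and the file name in the usage note are hypothetical, and the check applies to JPEG files only; Camera Raw formats store their metadata differently and would need their own tools.

```python
import struct


def has_exif_segment(data: bytes) -> bool:
    """Return True if this JPEG byte string contains an APP1 segment tagged as Exif."""
    if data[:2] != b"\xff\xd8":          # every JPEG starts with the SOI marker
        return False
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:            # not positioned on a marker: treat as malformed
            return False
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):       # start of scan or end of image: headers are over
            return False
        # The 2-byte big-endian length counts itself plus the payload.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length                # jump to the next segment
    return False
```

A call such as has_exif_segment(open("photo.jpg", "rb").read()) (with "photo.jpg" standing in for a real file) reports only whether the block is present; reading the individual fields takes a full EXIF parser. A file produced by scanning a print may well contain an EXIF block of its own, but it will describe the scanner, not the camera, lens or location of the original shot.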

Limited ability to adjust or enhance the image

Having and preserving the original file created by a digital camera affords a curator, editor or researcher a great deal of leeway in making adjustments to derivative presentation copies. Things like localized color adjustments are very easy to do with the digital master present, particularly if the master is a Camera Raw. On the other hand, your options are very limited if all you have is a print.

The best practice: preserve the digital

The best option for preserving born-digital photos remains keeping them digital. This does have implications for curators wanting to do right by their collections, and it can make the uninitiated very anxious. Capital purchases for technology and backups must be made, and whole new workflows and best practices must be established. Fortunately, the world of digital curation is starting to come into its own, and others have already begun to chart these waters. In future articles, I will outline some best practices and case studies I’ve undertaken and encountered, to help guide those seeking answers to the digital dilemma.

