NY Times Article on the realities and costs of Born Digital preservation
Mar 16th, 2010 by Isaiah Beard

Salman Rushdie. Source: Wikipedia.

The New York Times today published an article that reflects some of the challenges of preserving born digital content – that is, documents, data and other content that has been created digitally, on a computer or electronic device, and for which there is no physical original (such as on paper).

In particular, the article highlights the efforts of Emory University to preserve Salman Rushdie’s archival materials.

Among the archival material from Salman Rushdie currently on display at Emory University in Atlanta are inked book covers, handwritten journals and four Apple computers (one ruined by a spilled Coke). The 18 gigabytes of data they contain seemed to promise future biographers and literary scholars a digital wonderland: comprehensive, organized and searchable files, quickly accessible with a few clicks.

But like most Rushdian paradises, this digital idyll has its own set of problems. As research libraries and archives are discovering, “born-digital” materials — those initially created in electronic form — are much more complicated and costly to preserve than anticipated.

Electronically produced drafts, correspondence and editorial comments, sweated over by contemporary poets, novelists and nonfiction authors, are ultimately just a series of digits — 0’s and 1’s — written on floppy disks, CDs and hard drives, all of which degrade much faster than old-fashioned acid-free paper. Even if those storage media do survive, the relentless march of technology can mean that the older equipment and software that can make sense of all those 0’s and 1’s simply don’t exist anymore.

Imagine having a record but no record player.

An interesting aspect of this collection and its exhibition is that it emulates the experience Rushdie had in creating the content. Rather than just viewing the finished documents, you get to see the computer desktop as he saw it and open up the same applications he used, all in their 1980s and 1990s technological context, without the modern Web 2.0, Windows 7 or Mac OS X trappings we’re accustomed to on today’s computers.

I think this article is an excellent read, irrespective of what one’s views may be on the subject matter. Material of all kinds, in increasing amounts, faces the same perils as this collection every day, and archivists everywhere, including this one, wrestle with how best to retain it all. So far, the only tried and true method for this type of preservation is to obsessively manage and migrate the content. That requires making tough decisions about how to proceed and what formats to migrate to, and hoping those decisions are the right ones to keep the content viable, at least until the next generation of technology requires that the hard decisions be made again.

Reel2Bytes: Digitizing 1950s-era analog tape
Feb 23rd, 2010 by Isaiah Beard

Of all the work I do, I think dealing with older formats, and just figuring out how they work, is the most interesting aspect.

A few weeks ago, a stack of old open reel tapes arrived, along with a similar-vintage tape player. The recordings were done in the early 1950s, as part of a project to record the oral histories of various labor officials who were active in the early 20th century. The recordings made it unequivocally clear that the intent was to allow students and researchers decades into the future to gain insight into the history of the labor movement in the state.

Well, for quite a few years, these tapes remained shelved and seldom accessed, until a faculty member from the School of Management and Labor Relations learned of their existence and wanted to use them in his courses. Owing to the age of the recording format, the scarcity of playback equipment, and the condition of the tapes, there is no practical way for multiple students to access the tapes and have them survive. But that doesn’t mean the content should stay inaccessible.

And so, after getting a demonstration from our Special Collections staff on the best way to handle the tapes, and after mustering the courage to risk handling them, the player was hooked up to more modern digital recording equipment, and the digitization began.

I’ve always heard people talk about what wonderful sound fidelity the old open reel tape formats had, and they’re right; the sound quality is great, particularly for 55+ year old recordings. The physical condition of the tapes left much to be desired though: one reel had a paper backing, and was extremely fragile. Just playing it back was a white-knuckle experience. It’s a shame too, because one thing you do miss in the migration of old content to digital formats is the experience of handling these old things, and getting them working again. Operating the tape deck, threading the tape, feeling the very mechanical-ness of the format and how it worked… these are things that modern digital formats have so far been unable to duplicate or preserve.

Additional photos of the setup and the reels themselves appear below the cut.

New Scientist article on “Digital Doomsday”
Feb 3rd, 2010 by Isaiah Beard

One of the topics I like to bring up in the discussion of preserving digital data is the idea of a Digital Dark Age… the notion of a period in our historic knowledge that ends up getting lost due to a failure to plan and preserve our early digital content.

New Scientist, however, recently published an article (Feb 2, 2010) on something a bit more cataclysmic: the concept of Digital Doomsday. From the article:

Suppose, for instance, that the global financial system collapses, or a new virus kills most of the world’s population, or a solar storm destroys the power grid in North America. Or suppose there is a slow decline as soaring energy costs and worsening environmental disasters take their toll. The increasing complexity and interdependency of society is making civilisation ever more vulnerable to such events (New Scientist, 5 April 2008, p 28 and p 32).

Whatever the cause, if the power was cut off to the banks of computers that now store much of humanity’s knowledge, and people stopped looking after them and the buildings housing them, and factories ceased to churn out new chips and drives, how long would all our knowledge survive? How much would the survivors of such a disaster be able to retrieve decades or centuries hence?

The article is a compelling read, and offers an intellectual exercise on how much of our “stuff” will survive such a catastrophe. Ironically, the logic is that the digital content with the most copies in existence may win out. So, while scholarly works, theses, research and other important scientific data would be at risk, pop music may survive just fine.

The case for improved large file support in digital repositories
Nov 2nd, 2009 by Isaiah Beard

As the person responsible for handling the various file formats in RUcore, the digital library repository for Rutgers University Libraries, I’ve been looking with trepidation at the increasing sizes of the digital assets people are starting to create. In 2004, when the architecture for RUcore was first envisioned, very few digital items grew past the hundred-megabyte point.

How things have changed! Video and even audio files are routinely pushing into the gigabytes, now that technology has progressed to the point where high-definition video and audio can be originated on ubiquitous mobile devices. And as RUcore and other large repositories seek to preserve this content, we are finding ourselves running into a hurdle we did not anticipate: the ability of our architectures to handle these very large digital files. In particular, files larger than 2 gigabytes have posed problems for FEDORA, our infrastructure of choice, and this is a very big deal for video content in particular. Consider that 2 gigabytes can comprise less than 5 minutes of HD content, and you can see our dilemma.
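To put rough numbers on that last point, here is a minimal back-of-the-envelope sketch in Python. It treats the 2 gigabyte figure as 2 GiB and assumes a constant bitrate; both are simplifying assumptions, and the sample bitrates are hypothetical illustrations rather than measurements from RUcore. The function names are made up for this example.

```python
# Back-of-the-envelope check: how much constant-bitrate video fits under 2 GB?
# Assumptions (not from RUcore): "2 GB" is taken as 2 GiB, and bitrates are constant.

LIMIT_BYTES = 2 * 1024**3  # 2 GiB


def minutes_until_limit(bitrate_mbps: float, limit_bytes: int = LIMIT_BYTES) -> float:
    """Return how many minutes of a constant-bitrate stream fit in limit_bytes."""
    bits_per_second = bitrate_mbps * 1_000_000
    seconds = (limit_bytes * 8) / bits_per_second
    return seconds / 60


def implied_bitrate_mbps(minutes: float, limit_bytes: int = LIMIT_BYTES) -> float:
    """Return the constant bitrate (Mbit/s) that fills limit_bytes in the given minutes."""
    return (limit_bytes * 8) / (minutes * 60) / 1_000_000


if __name__ == "__main__":
    # Hypothetical sample bitrates, chosen only for illustration.
    for label, mbps in [("25 Mbit/s file", 25.0), ("100 Mbit/s file", 100.0)]:
        print(f"{label}: {minutes_until_limit(mbps):.1f} minutes before reaching 2 GiB")
    print(f"Filling 2 GiB in 5 minutes implies about {implied_bitrate_mbps(5):.0f} Mbit/s")
```

Under these assumptions, any file encoded at more than roughly 57 Mbit/s crosses the 2 gigabyte mark in under five minutes, which is why lightly compressed HD masters run into the limit so quickly.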

Added mechanisms to support these large items have been slow in coming, and have presented some implementation difficulties of their own. For this reason, I’ve drafted a document which explains our position on why we need uniform large file support in digital repositories. Feel free to have a look and provide feedback.

With any luck, developers will heed the call presented here and at other institutions, and work to make better support for big files a reality.

Google’s Neglected Archive
Oct 8th, 2009 by Isaiah Beard

Controversial as it may be to archivists of both the digital and analog realms, Google is often seen as the ubiquitous oracle of record by the general online population. Websites (including libraries) strive to be listed on its search engine; it runs the de facto video archive for the internet at large; in many cases it has made patents and even scholarly material easier to search and access than their respective online custodians do. Research institutions and businesses even now turn to it to handle their growing mounds of e-mail and electronic documents.

Yet curators and librarians have been very skeptical of Google, its aims, and even its competence to objectively preserve the massive digital content it aims to take on. The ongoing Google Books drama is one example of this. But now, others outside the world of preservationists are seriously calling Google’s competence and motives into question.

Wired.com has posted an article about what may well be an example of Google’s mismanaging of important archives. Usenet was probably the Internet’s earliest online reference, social network, and historical archive, consisting of terabytes of text articles, files and writings from users of the Internet (and its predecessors) dating back to 1980. While most of the content can be quite mundane, some of these articles document milestones in online history or contain the writings of influential pioneers of the Internet. Google, through its acquisitions of various entities over the years, became the world’s de facto curator of this content, and according to Wired contributor Kevin Poulsen, Google is failing in this role:

… visiting Google Groups is like touring ancient ruins.

On the surface, it looks as clean and shiny as every other Google service, which makes its rotting interior all the more jarring — like visiting Disneyland and finding broken windows and graffiti on Main Street USA.

Searching within a newsgroup, even one with thousands of posts, produces no results at all. Confining a search to a range of dates also fails silently, bulldozing the most obvious path to exploring an archive.

Want to find Marc Andreessen’s historic March 14, 1993 announcement in alt.hypertext of the Mosaic web browser? “Your search – mosaic – did not match any documents.”

Wired’s write-up points to the exact concern that many archivists have about Google’s aims: that it isn’t motivated to maintain the assets it holds if those assets are no longer making it any significant money. With newer, more visually appealing technologies largely supplanting the once-huge popularity of Usenet, its archive is not as hot a commodity as it used to be, and one can speculate that if it’s not generating enough ad revenue, Google isn’t going to care to maintain the archive or keep it functional.

So, what happens when certain subject matter in Google’s scanned books archive becomes less popular, and thus less visited, within the Internet’s incredibly short attention span?


Rights: Creative Commons License