YouTube as a de facto cultural archive for past videos
Nov 30th, 2009 by Isaiah Beard

YouTube is definitely the Wild West for video content, and if you look hard enough, you’ll find a lot of vintage material.  Much of it is uploaded by people whose rights to publish it are questionable, but while it’s there and before it gets taken down, it’s pretty fascinating to come across old commercials, television station idents, and little vignettes that exemplify past culture.

Here’s a good example:  a 1971 Woolworth TV ad, promoting a sale on LP albums and even some 8-tracks:

An ad from a defunct store depicting obsolete formats.  Fascinating.

Designing and Implementing a Center for Digital Curation Research
Nov 17th, 2009 by Isaiah Beard

The facility I work in at Rutgers, known as the Scholarly Communication Center (SCC), has a fairly short history in the grand scheme of academia, and yet a fairly long one when it comes to the rapid changes in technology it has seen in its lifetime.  It was originally established in 1996, and was meant to be a location where university students and faculty could access a growing body of then-nascent digital content.

Back then, the internet still wasn’t very fast and wasn’t nearly as media-rich as it is today.  And so, most of the data-heavy reference materials arriving in digital form came to the SCC as CD-ROMs (and later, DVDs).  To accommodate this, the SCC had a lab of ten desktop computers (known as the Datacenter), dedicated solely to accessing this type of material.

But the times changed, and so did the way people accessed digital material.  As the ‘net grew in size and capacity, it no longer made sense to ship reference material on disc, and so the access moved online.  Students migrated from visiting computer labs to bringing their own laptops (and later, netbooks and handheld mobile devices).  Traffic at the datacenter dropped to virtually nothing.  The space had to be re-tooled to continue to be relevant and useful.

And so, with my taking on the newly-minted role of Digital Data Curator, and in collaboration with my colleagues, a new plan for the former datacenter was developed.  Instead of being a place to merely access content, we would be a place to create it.  Analog items that needed to be digitized would be assessed and handled here.  New born-digital content would be edited, packaged, and prepared for permanent digital archiving in our repository.  We would be a laboratory where students getting into the field – and even faculty and staff who have been here a good while – would learn, hands-on, how to triage and care for items of historical significance, both digital and analog, and prepare them for online access.

The concept for a new facility was born.  And we call it the Digital Curation Research Center.

The center is still in “beta,” as we plug along with some internal projects for testing purposes, along with a couple of willing test subjects within the university and surrounding community.  This is so we can test out the workflow of the space and make tweaks and optimizations as needed.  Our plan is to officially launch the space in the Spring of 2010, with a series of workshops and how-to sessions on the various skills vital to digital curation (e.g. digital photography, video editing, audio and podcasting, and scanning).

The plan is that this will be a continual, evolving learning experience for all involved.  People who have never really used cameras and recording equipment in a historical context will learn just how increasingly valuable the content they create, and the stories it will tell, can become over time.  And those of us in the DCRC day in and day out will encounter things that we’ve never run into before, and will have to wrap our heads around the issue of preserving it effectively.

Below are related documents that provide additional information about the DCRC.  More information will be coming up as we get closer to the official launch:

DVDs to last for millennia? Perhaps, but at a cost.
Nov 13th, 2009 by Isaiah Beard

As I’ve mentioned in some of my previous articles, I’m incredibly skeptical of the long-term success of any digital storage attempt that relies on a single medium as its container. In particular, I find CDs and DVDs to be extremely suspect in terms of their longevity. Those of us who have been around the block with these technologies know that Gold CDs and DVDs were supposed to be the last word on digital preservation, lasting decades if not centuries. But mere years later, we all learned a new term to add to our archival vocabulary: Disc Rot.  It became very clear to all of us that not all discs – not even the gold ones – are made equal, and some even from “trusted” brands can delaminate, fade, have their media layers flake off, or otherwise deteriorate and become unreadable in rather short periods of time… sometimes with painfully devastating results for smaller archives who banked it all on CDs and DVDs.

As a result, a type of storage medium that we were told would last decades has been widely recognized as only being good for about two to five years.

So it’s not without some deep suspicion and trepidation that I view a recent announcement from a previously unknown start-up called Cranberry, which claims to have solved the Disc Rot problem and has introduced a type of DVD that it says will last 1,000 years.

A Cranberry DiamonDisc is a DVD made of high tech stone.
Memories carved on a DiamonDisc will last as long as the pyramids. No reflective surface. No ink layer. No fading. Problem solved. The Library of Congress is studying our technology for storage of the national archives. It’s the only solution for permanent, digital storage.

Oh, really?

This is a pretty bold claim from a company that no one had heard from until recently.  But what hard evidence does Cranberry have to – forgive the pun – back it up?

Unfortunately, you have to dig pretty deep into their site, to a FAQ section way down at the bottom, before you get to a far more realistic statement:

How can you prove that the Cranberry Disc will last for centuries?
No one can prove that anything will last for centuries, but there are international standards for estimating the archival lifetime of optical media. The ECMA‐379 (2nd edition, December 2008) which tests the effects of temperature and relative humidity is widely recognized. Researchers at Millenniata have tested the Cranberry Disc using the ECMA‐379 temperature and humidity (85°C / 85% RH) testing as a standard to develop the most rigorous testing possible. They have combined temperature and humidity (85°C / 85% RH) tests with exposure to the full‐spectrum of natural light. The Cranberry Disc is the only survivor after this rigorous testing. Considering the combination of the Cranberry Disc’s test results and its rock‐like data layer, it is reasonable to conclude that the Cranberry Disc has a greater longevity and durability than competitors who claim a 300‐year shelf life.

Here’s what worries me about the above statement: Millenniata does not appear to be a dedicated research firm.  In fact, if you go to their website, they seem to be marketing their own archival-quality disc storage under the brand name of M*ARC.  Per Millenniata:

Millenniata is the sole provider of a permanent, backwards-compatible archiving solution for the digital age. Located in Springville, Utah, Millenniata is poised to become the world’s leader in digital data preservation. Millenniata is the result of pioneering inventions from Brigham Young University.

So… what’s going on here exactly?  It’s difficult for me to accept that Cranberry claims to have cornered the market in creating DVDs that last a millennium, only to point to research done by a company that claims it holds the sole solution to the disc rot problem.

Both companies claim that their respective formats are not something that the mundane DVD burners on standard computer equipment can burn.  Cranberry claims the equipment needed to make a DiamonDisc is “out of the reach of most consumers,” and it wants you to send your data to them so that they can “etch” a DVD for you, at a cost of between $29.99 and $34.95 per disc.  Millenniata, however, will gladly sell you the special burner required to author these discs, which according to some news sources will cost up to $5,000 for the drive and a pack of 10 discs.  Once created, though, both products will purportedly read just fine on any DVD drive. The capacity of these discs is 4.7GB, the same as a single-layer DVD-R.

My as-yet-unconfirmed speculation is that, based on the descriptions both entities provide for their products, the storage products both companies offer might actually be one and the same.  If that’s true, then I’d have to conclude that the research done on these discs’ longevity is hardly objective and unbiased.

So, should archivists invest money in these discs?

My philosophy has always been that you should never blindly trust a vendor’s claim about how long their storage media should last, nor should you trust a single storage solution for archiving your digital content.  If a curator wants to spend multiple thousands of dollars for an M*ARC drive, or spend $30-$35 for Cranberry to author each disc, that’s their decision, but it should be supplemented by a secondary solution, be that hard drives, tape, solid-state media, or some other well-known container format.   And that data needs to be checked periodically to verify its integrity.
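Periodic integrity checking of the kind described above is usually done with fixity manifests: record a cryptographic checksum for each file at ingest, then recompute and compare on a schedule.  Here’s a minimal sketch in Python; the function names and the manifest structure (a simple path-to-digest mapping) are my own illustration, not any specific repository’s API:

```python
import hashlib
from pathlib import Path

def sha256sum(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def verify_manifest(manifest):
    """Compare each archived file against the checksum recorded at ingest.

    `manifest` maps file paths to hex digests.  Returns the paths whose
    files are missing or whose current digest no longer matches.
    """
    return [path for path, recorded in manifest.items()
            if not Path(path).exists() or sha256sum(path) != recorded]
```

Running `verify_manifest` on each copy of the collection (the primary and whatever secondary medium backs it up) is what catches silent deterioration like disc rot before both copies are lost.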

Cranberry claims that the Library of Congress is studying the technology for its own archival use. Personally speaking, I would like to see the definitive results of such a study before I would feel comfortable deciding whether M*ARC or DiamonDiscs are worth the premium.  And even then, I still would not back away from my two-format philosophy for long-term digital storage.

It’s ironic that Cranberry is hyping up NARA and LoC studies, citing them as reasons why you should be very, very afraid of storing your precious data on standard writeable disc media.  Let’s hope for their sake that the LoC verifies their claims are true, or else their marketing literature will prove pretty embarrassing.

The case for improved large file support in digital repositories
Nov 2nd, 2009 by Isaiah Beard

As the person responsible for handling the various file formats in RUcore, the digital library repository for Rutgers University Libraries, I’ve been looking with trepidation at the increasing sizes of the digital assets people are starting to create.  In 2004 when the architecture for this was first envisioned, very few digital items grew past the hundred-megabyte point.

How things have changed!  Video and even audio files are routinely pushing into the gigabytes, now that technology has progressed to the point where high-definition video and audio can be produced even on ubiquitous mobile devices.  And as RUcore and other large repositories seek to preserve this content, we are finding ourselves running into a hurdle we did not anticipate: the ability of our architectures to handle these very large digital files.  In particular, files larger than 2 gigabytes have posed some exceptions for FEDORA, our infrastructure of choice, and this is a very big deal for video content in particular.  Consider that 2 gigabytes can comprise less than 5 minutes of HD content, and you can see our dilemma.
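That 2-gigabyte ceiling is no accident: it’s the largest value a signed 32-bit integer can hold (2³¹ − 1 bytes), a common limit in older file-handling code.  A quick back-of-the-envelope calculation shows how little HD video fits under it.  The bitrates below are my own illustrative figures for common HD formats of the era, not numbers from the post:

```python
# The classic 2 GB ceiling comes from storing a file size in a signed
# 32-bit integer: the largest representable value is 2**31 - 1 bytes.
MAX_SIGNED_32 = 2**31 - 1  # 2,147,483,647 bytes, just under 2 GiB

def recording_minutes(limit_bytes, mbit_per_sec):
    """Minutes of video that fit under `limit_bytes` at a given bitrate."""
    bytes_per_sec = mbit_per_sec * 1_000_000 / 8
    return limit_bytes / bytes_per_sec / 60

# Illustrative HD bitrates (assumed for the example):
for label, rate in [("25 Mbit/s HD", 25), ("50 Mbit/s HD", 50), ("100 Mbit/s HD", 100)]:
    print(f"{label}: {recording_minutes(MAX_SIGNED_32, rate):.1f} min")
```

At 100 Mbit/s the limit works out to under 3 minutes of footage, which is consistent with the “less than 5 minutes of HD content” figure for high-bitrate material.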

Mechanisms to support these large items have been slow in coming, and have presented implementation difficulties of their own.  For this reason, I’ve drafted a document which explains our position on why we need uniform large file support in digital repositories.  Feel free to have a look and provide feedback.

With any luck, developers will heed the call presented here and in other institutions, and work to make better support for big files a reality.

Google’s Neglected Archive
Oct 8th, 2009 by Isaiah Beard

Controversial as it may be to archivists of both the digital and analog realms, Google is often seen as the ubiquitous oracle of record by the general online population.  Websites (including libraries) strive to be listed on its search engines; it runs the de facto video archive for the internet at large; and in many cases it has made patents and even scholarly material easier to search than their respective online custodians have.  Research institutions and businesses now even turn to it to handle their growing mounds of e-mail and electronic documents.

Yet curators and librarians have been very skeptical of Google, its aims, and even its competency at objectively preserving the massive body of digital content it aims to take on.  The ongoing Google Books drama is one such aspect of this.  But now, others outside the world of preservationists are seriously calling Google’s competency and motives into question.

Wired.com has posted an article about what may well be an example of Google’s mismanagement of important archives.  Usenet was probably the Internet’s earliest online reference, social network, and historical archive, consisting of terabytes of text articles, files and writings from users of the internet (and its predecessors) dating back to 1980.  While most of the content can be quite mundane, some of these articles document milestones in online history or contain the writings of influential pioneers of the Internet.  Google, through its acquisitions of various entities over the years, became the world’s de facto curator of this content, and according to Wired contributor Kevin Poulsen, Google is failing in this role:

… visiting Google Groups is like touring ancient ruins.

On the surface, it looks as clean and shiny as every other Google service, which makes its rotting interior all the more jarring — like visiting Disneyland and finding broken windows and graffiti on Main Street USA.

Searching within a newsgroup, even one with thousands of posts, produces no results at all. Confining a search to a range of dates also fails silently, bulldozing the most obvious path to exploring an archive.

Want to find Marc Andreessen’s historic March 14, 1993 announcement in alt.hypertext of the Mosaic web browser? “Your search – mosaic – did not match any documents.”

Wired’s send-up hints at the exact concern that many archivists have about Google’s aims: that it isn’t motivated to maintain the assets it holds if those assets are no longer making it any significant money.  With newer, more visually appealing technologies largely supplanting the once-huge popularity of Usenet, its archive is not as hot a commodity as it used to be, and one can speculate that if it’s not generating enough ad revenue, Google isn’t going to care to maintain the archive or keep it functional.

So, what happens when certain subject matter in their scanned books archive becomes less-popular – and thus less visited – within the Internet’s incredibly short attention span?
