Designing and Implementing a Center for Digital Curation Research
Nov 17th, 2009 by Isaiah Beard

The facility I work in at Rutgers, known as the Scholarly Communication Center (SCC), has a fairly short history in the grand scheme of academia, and yet a fairly long one when it comes to the rapid changes in technology it has seen in its lifetime.  It was originally established in 1996, and meant to be a location for university students and faculty to access a then-nascent and growing collection of digital content.

Back then, the internet still wasn’t very fast and wasn’t nearly as media-rich as it seems today.  And so, most of the data-heavy reference materials arriving in digital form came to the SCC as CD-ROMs (and later, DVDs).  To accommodate this, the SCC had a lab of ten desktop computers (known as the Datacenter), dedicated solely to accessing this type of material.

But the times changed, and so did the way people accessed digital material.  As the ‘net grew in size and capacity, it no longer made sense to ship reference material on disc, and so the access moved online.  Students migrated from visiting computer labs to bringing their own laptops (and later, netbooks and handheld mobile devices).  Traffic at the datacenter dropped to virtually nothing.  The space had to be re-tooled to continue to be relevant and useful.

And so, with my taking on the newly-minted role of Digital Data Curator, and in collaboration with my colleagues, a new plan for the former datacenter was developed.  Instead of being a place to merely access content, we would be a place to create it.  Analog items that needed to be digitized would be assessed and handled here.  New born-digital content would be edited, packaged, and prepared for permanent digital archiving in our repository.  We would be a laboratory where students getting into the field – and even faculty and staff who have been here a good while – would learn, hands-on, how to triage and care for items of historical significance, both digital and analog, and prepare them for online access.

The concept for a new facility was born.  And we call it the Digital Curation Research Center.

The center is still in “beta,” as we plug along with some internal projects for testing purposes, along with a couple of willing test subjects within the university and surrounding community.  This lets us test out the workflow of the space and make tweaks and optimizations as needed.  Our plan is to officially launch the space in the Spring of 2010, with a series of workshops and how-to sessions on the various skills that digital curation depends on (e.g., digital photography, video editing, audio and podcasting, and scanning).

The plan is for this to be a continual, evolving learning experience for all involved.  People who have never really used cameras and recording equipment in a historical context will learn just how valuable the content they create, and the stories it tells, can become over time.  And those of us working in the DCRC day in and day out will encounter things we’ve never run into before, and will have to wrap our heads around how to preserve them effectively.

Below are related documents that provide additional information about the DCRC.  More information will be coming up as we get closer to the official launch:

Google’s Neglected Archive
Oct 8th, 2009 by Isaiah Beard

Controversial as it may be to archivists of both the digital and analog realms, Google is often seen as the ubiquitous oracle of record by the general online population.  Websites (including libraries) strive to be listed in its search engine; it runs the de facto video archive for the internet at large; and in many cases it has made patents and even scholarly material easier to search than their respective online custodians have.  Research institutions and businesses now turn to it to handle their growing mounds of e-mail and electronic documents.

Yet, curators and librarians have been very skeptical of Google, its aims, and even its competency at objectively preserving the massive amount of digital content it aims to take on.  The ongoing Google Books drama is one such example.  But now, others outside the world of preservationists are seriously calling Google’s competency and motives into question.

Wired.com has posted an article about what may well be an example of Google’s mismanagement of important archives.  Usenet was probably the Internet’s earliest online reference, social network, and historical archive, consisting of terabytes of text articles, files, and writings from users of the internet (and its predecessors) dating back to 1980.  While most of the content can be quite mundane, some of these articles document milestones in online history or contain the writings of influential pioneers of the Internet.  Google, through its acquisitions of various entities over the years, became the world’s de facto curator of this content, and according to Wired contributor Kevin Poulsen, Google is failing in this role:

… visiting Google Groups is like touring ancient ruins.

On the surface, it looks as clean and shiny as every other Google service, which makes its rotting interior all the more jarring — like visiting Disneyland and finding broken windows and graffiti on Main Street USA.

Searching within a newsgroup, even one with thousands of posts, produces no results at all. Confining a search to a range of dates also fails silently, bulldozing the most obvious path to exploring an archive.

Want to find Marc Andreessen’s historic March 14, 1993 announcement in alt.hypertext of the Mosaic web browser? “Your search – mosaic – did not match any documents.”

Wired’s write-up points to the exact concern that many archivists have about Google’s aims: that it isn’t motivated to maintain the assets it holds if those assets are no longer making it any significant money.  With newer, more visually appealing technologies largely supplanting the once-huge popularity of Usenet, its archive is not as hot a commodity as it used to be, and one can speculate that if it isn’t generating enough ad revenue, Google isn’t going to care to maintain the archive or keep it functional.

So, what happens when certain subject matter in Google’s scanned-books archive becomes less popular – and thus less visited – within the Internet’s incredibly short attention span?

Original NASA moon landing tapes: probably gone for good
Jul 16th, 2009 by Isaiah Beard

Forty years ago, the Apollo 11 mission blasted off into space, making history with the first successful human landing on the moon.  Unfortunately, NPR reports that after much searching, it appears the best possible copies of the video that recorded this momentous event were erased long ago:

Over the years, NASA had removed massive numbers of magnetic tapes from the shelves. In the early 1980s alone, tens of thousands of boxes were withdrawn.

It turns out that new satellites had gone up and were producing a lot of data that needed to be recorded. “These satellites were suddenly using tapes seven days a week, 24 hours a day,” says Lebar.

And the agency was experiencing a critical shortage of magnetic tapes. So NASA started erasing old ones and reusing them.

That’s probably what happened to the original footage from the moon that the astronauts captured with their lunar camera, says Lebar. It was stored on telemetry tapes, and old tapes with telemetry data were being recycled.

The article also explains how the specially designed video cameras that astronauts took to the moon produced video of much higher quality than the snowy, blurry pictures American households saw that night, and that we’ve seen for many years since.  Regrettably, the Apollo video cameras used a non-standard format, requiring conversion equipment on the ground both to store the content and to convert it to more conventional formats (and thus introducing the noise and blur in the currently available tapes).

And so, NASA becomes a poster child not only for the pitfalls of poor preservation planning, but the perils of using non-standard, proprietary formats to record important, historic moments!

The pitfalls of large hard drives – and national security
May 20th, 2009 by Isaiah Beard

Well, here’s an example of how putting all your data eggs in one basket can be quite dangerous.  The National Archives and Records Administration has reported the loss of an external hard drive containing a massive amount of data, ranging from personal information at best to items potentially related to national security matters at worst:

The Inspector General of the National Archives and Records Administration (NARA) told congressional committee staffers Tuesday that a hard drive containing over a terabyte of information – the equivalent of millions of books – went missing from the NARA facility in College Park, Md., sometime between October 2008 and March 2009.

The Department of Justice and the Secret Service are conducting an investigation, but it’s so far unclear whether the drive was lost as the result of a crime or an accident.

Of course, the technologist in me finds it really interesting that over 8 years ago, the federal government apparently had access to 1 terabyte hard drives!  Those have only become mainstream technology over the past three years or so.  But I digress…
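
As a rough sanity check on the “equivalent of millions of books” comparison quoted above, here’s a minimal back-of-the-envelope sketch; the roughly-one-megabyte-per-book figure is my own assumption, not a number from the report:

```python
# Rough check of the "over a terabyte = millions of books" comparison.
# Assumption (mine, not NARA's): a plain-text book runs on the order of 1 MB.

TERABYTE_BYTES = 10**12        # 1 TB in decimal units, as drive vendors count
AVG_BOOK_BYTES = 1_000_000     # ~1 MB of text per book (assumed)

books_per_tb = TERABYTE_BYTES / AVG_BOOK_BYTES
print(f"~{books_per_tb:,.0f} books per terabyte")  # ~1,000,000 books
```

At a smaller per-book estimate (say, 500 KB of plain text), the same drive would hold roughly two million books, so “millions” is a fair characterization.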

NARA clearly takes the issue seriously, and has posted a FAQ (pdf) about the disappearance.  The document highlights something else of note: how much time passed between when the drive was last seen and when it was noticed to be missing.

Removable-Disk Storage: Not dead yet?
Apr 29th, 2009 by Isaiah Beard

An interesting tidbit from General Electric may breathe new life into removable-disc storage, with its just-announced holographic technology:

NISKAYUNA, N.Y., Apr 27, 2009 (BUSINESS WIRE) — GE Global Research, the technology development arm of the General Electric Company (NYSE: GE), today announced a major breakthrough in the development of next generation optical storage technology. GE researchers have successfully demonstrated a threshold micro-holographic storage material that can support 500 gigabytes of storage capacity in a standard DVD-size disc. This is equal to the capacity of 20 single-layer Blu-ray discs, 100 DVDs or the hard drive for a large desktop computer.
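
For scale, here’s a quick check of those equivalences; this is a minimal sketch, and the per-disc capacities below are the formats’ nominal single-layer figures rather than numbers from GE’s announcement:

```python
# Quick check of GE's capacity equivalences for a 500 GB holographic disc.
# Nominal single-layer capacities (decimal GB): Blu-ray ~25 GB, DVD ~4.7 GB.

HOLOGRAPHIC_GB = 500
BLU_RAY_SL_GB = 25
DVD_SL_GB = 4.7

print(f"Blu-ray discs: {HOLOGRAPHIC_GB / BLU_RAY_SL_GB:.0f}")  # 20
print(f"DVDs:          {HOLOGRAPHIC_GB / DVD_SL_GB:.0f}")      # ~106 (GE rounds down to 100)
```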

Impressive storage for a CD form factor. However, as other technology sites are pointing out, cramming more data into a CD-sized disc doesn’t mean that the general public is going to automatically adopt it. Currently, several factors are conspiring against this type of removable media:

– Blu-ray.
This format was supposed to be the heralded new technology for consumers and video enthusiasts to obtain and store their high-definition video, along with “massive” amounts (25-50 GB) of data. However, Blu-ray has been out for a couple of years now, and adoption has been tepid at best. Most hopes were pinned on mass adoption in 2009, but this assumes that the economy will recover quickly enough for large swaths of consumers to start spending on consumer technology again.

Somehow I have my doubts that this announcement of an “even better” technology will make the situation any better. We haven’t even realized the potential of Blu-ray, and those who have shelled out money on Blu-ray players and discs now face the possibility that obsolescence of the format may already be close at hand, far sooner than DVD owners ever faced.

