A hard disk drive with damaged platters, caused by a head crash. The data on this drive is not recoverable.
Whether we like it or not, those of us who rely on electronics to get our work done are guaranteed one thing: a data loss event. This means that at least once in our lifetimes (and sometimes more than that), every one of us who uses a computer, laptop, tablet, smartphone or similar device is going to one day stare at our screens and realize that the piece of information we expected to be there just isn’t.
It can happen any number of ways. Sometimes, we users make a mistake and accidentally erase something we shouldn’t have… or someone else might’ve accidentally deleted something of ours that they shouldn’t have. Other times, it’s the computer’s fault: buggy software might’ve claimed to save something but didn’t, or a 10-year-old hard drive finally decided to give up the ghost. And sometimes, acts of nature (power outage, natural disaster, or other events beyond our control) will intervene and cause vital work to be lost.
Of course, we’ve all heard it time and time again: to protect your documents, photos, drawings, artwork, and other important data, you need to have backups. Unfortunately, while we have all heard this before and know it to be true, we don’t always follow through. In the past, regular backups have been tedious, a chore we all dread. And so they always fall by the wayside, and often we get back into a backup regimen only after something bad has happened, when it’s already too late.
But take heart. A lot has changed recently. There ARE personal backup solutions out there that are surprisingly easy… and even automatic! Keeping your stuff safe doesn’t have to be a tedious chore anymore… as long as you’re willing to invest a little time and effort at the beginning, and in some cases a small amount of cash on an ongoing basis.
Cranberry Harvest in New Jersey. Source: USDA
A few months back, I wrote about our efforts to leverage RUcore for the benefit of the academic research community at Rutgers. The result is RUresearch, a place for Rutgers researchers to share their data with the global scholarly community. This data sharing is particularly important in light of a National Science Foundation mandate to openly share research data from projects it has funded.
Over the summer, the RUcore team has been working with a few researchers to better understand their needs, and to work on preserving and sharing our first samples of actual research data. In collaboration with the Philip E. Marucci Center for Blueberry and Cranberry Research and Extension, our efforts – if you’ll pardon the pun – have begun to bear fruit.
As part of a project funded by the U.S. Department of Agriculture Specialty Crop Research Initiative, Marucci Center researchers have extracted a genome for a cultivar of the cranberry, a fruit for which New Jersey is the third-largest producer in the US, devoting some 3,600 acres to its cultivation.
The genome research is part of a study on the genetics of fruit rot resistance, and the data generated (using Applied Biosystems’ SOLiD 3 Plus System) takes up over 60GB of storage when compressed. Sharing this data with researchers who would find it useful requires a system that can not only spare the storage but is also robust enough to permit open access. Enter RUcore.
Although further refinements are in progress, the result of our collaboration is one of our first research data records in RUcore, located at this link. The PDF attached to that record provides the link to the download point for the data sets.
While the data itself isn’t something the general public will easily recognize and interpret, the ability to share this information with other researchers can benefit all of us, through continued study into which genetic factors can make certain fruits resistant to rotting. And it’s also a learning experience for us, in how to make that sharing among researchers a little bit easier.
Although Rutgers University Libraries has had digitization standards in place since 2006 for the RUcore object types we currently handle, the documents were often buried in places where they weren’t easily found. This made it hard for members of the public, and for anyone else looking for guidance on how best to digitize, to find out what we’re doing.
Additionally, the documentation was getting a bit long in the tooth: some of the proposals hadn’t been looked over in years, some still had “Draft” markings even though committees had reviewed them and we had already been carrying out these procedures, and in a couple of cases the documentation had been superseded by advances in technology and no longer matched current practice at all.
For this reason, we’ve been reviewing these standards and revising them where needed so that they reflect current best practices within RUcore and the Digital Curation Research Center. Additionally, I’ve created a “home” for the complete set of documents here:
The link to these standards is also available in the upper-left corner of this blog, in the navigation bar.
We hope that keeping these standards in one place will greatly benefit other curators and those who need a place to get started digitizing and preserving works.
Salman Rushdie. Source: Wikipedia.
The New York Times today published an article that reflects some of the challenges of preserving born digital content – that is, documents, data and other content that has been created digitally, on a computer or electronic device, and for which there is no physical original (such as on paper).
In particular, it highlights Emory University’s efforts to preserve Salman Rushdie’s archival materials.
Among the archival material from Salman Rushdie currently on display at Emory University in Atlanta are inked book covers, handwritten journals and four Apple computers (one ruined by a spilled Coke). The 18 gigabytes of data they contain seemed to promise future biographers and literary scholars a digital wonderland: comprehensive, organized and searchable files, quickly accessible with a few clicks.
But like most Rushdian paradises, this digital idyll has its own set of problems. As research libraries and archives are discovering, “born-digital” materials — those initially created in electronic form — are much more complicated and costly to preserve than anticipated.
Electronically produced drafts, correspondence and editorial comments, sweated over by contemporary poets, novelists and nonfiction authors, are ultimately just a series of digits — 0’s and 1’s — written on floppy disks, CDs and hard drives, all of which degrade much faster than old-fashioned acid-free paper. Even if those storage media do survive, the relentless march of technology can mean that the older equipment and software that can make sense of all those 0’s and 1’s simply don’t exist anymore.
Imagine having a record but no record player.
An interesting aspect of this collection and its exhibition is that it emulates the experience Rushdie had in creating the content. Rather than just viewing the finished documents, you get to see the computer desktop as he saw it, open up the same applications he used, all in the 1980s and 1990s technological contexts… and not using the modern, Web 2.0, Windows 7 or Mac OS X trappings we’re accustomed to in today’s computers.
I think this article is an excellent read, irrespective of one’s views on the subject matter. Material of all kinds, in increasing amounts, faces the same perils as this collection every day, and archivists everywhere, including this one, wrestle with how best to retain it all. So far, the only tried-and-true method for preserving this kind of material is to obsessively manage and migrate the content. That requires making tough decisions about how to proceed and which formats to migrate to, and then hoping those decisions are the right ones to keep the content viable, at least until the next generation of technology forces the hard decisions to be made again.
The facility I work in at Rutgers, known as the Scholarly Communication Center (SCC), has a fairly short history in the grand scheme of academia, and yet a fairly long one when it comes to the rapid changes in technology it has seen in its lifetime. It was started in 1996 as a place for university students and faculty to access a growing body of then-nascent digital content.
Back then, the internet still wasn’t very fast and wasn’t nearly as media-rich as it seems today. And so, most of the data-heavy reference materials arriving in digital form came to the SCC on CD-ROM (and later, DVD). To accommodate this, the SCC had a lab of ten desktop computers (known as the Datacenter), dedicated solely to accessing this type of material.
But the times changed, and so did the way people accessed digital material. As the ’net grew in size and capacity, it no longer made sense to ship reference material on disc, and so the access moved online. Students migrated from visiting computer labs to bringing their own laptops (and later, netbooks and handheld mobile devices). Traffic at the Datacenter dropped to virtually nothing. The space had to be re-tooled to continue to be relevant and useful.
And so, with my taking on the newly-minted role of Digital Data Curator, and in collaboration with my colleagues, a new plan for the former datacenter was developed. Instead of being a place to merely access content, we would be a place to create it. Analog items that needed to be digitized would be assessed and handled here. New born-digital content would be edited, packaged, and prepared for permanent digital archiving in our repository. We would be a laboratory where students getting into the field – and even faculty and staff who have been here a good while – would learn, hands-on, how to triage and care for items of historical significance, both digital and analog, and prepare them for online access.
The concept for a new facility was born. And we call it the Digital Curation Research Center.
The center is still in “beta,” as we work through some internal test projects, along with projects from a couple of willing test subjects within the university and surrounding community. This lets us test the workflow of the space and make tweaks and optimizations as needed. Our plan is to officially launch the space in the spring of 2010, with a series of workshops and how-to sessions on the various things that make digital curation vital (e.g., digital photography, video editing, audio and podcasting, and scanning).
The plan is that this will be a continual, evolving learning experience for all involved. People who have never really used cameras and recording equipment in a historical context will learn just how valuable the content they create, and the stories it tells, can become over time. And those of us in the DCRC day in and day out will encounter things that we’ve never run into before, and will have to wrap our heads around how to preserve them effectively.
Below are related documents that provide additional information about the DCRC. More information will be coming up as we get closer to the official launch: