The scanner: workhorse of photo and document preservation
Sep 14th, 2009 by Isaiah Beard

Pretty often, I get questions by phone and e-mail from people who are just getting started with their digital preservation projects, and need to scan photos or documents.  Almost always, they seek advice about what kind of scanner to buy.  A lot of times, the questions are similar.

In this article, I hope to lay down some basic recommendations to get beginners looking for a scanner that suits their needs.

The first step: consider the size of the task

For most people at home, and even some institutions with objects to scan, the vast majority of documents ripe for scanning will physically fit in a letter-sized, flatbed scanner (such as the one pictured at the top of this article).  For these types of collections, the vast majority of scanners out there will be just fine for your needs.  However, there are people who have larger objects: photos and maps as large as 11 inches by 17 inches, or perhaps even bigger.  And then there are the smaller items and specialty objects: photographic negatives, slides, and contact sheets.

And so, the first step should begin before you even buy the equipment: take stock of your archives and find out if they have any specific needs.  Consider how much of your collection consists of large items, and how much of it consists of very small objects, transparencies, film and negatives.

If your large items comprise only a small amount of your total items (say, 10-15% or less) and you don’t see yourself acquiring more in the near future, then you might get away with buying a standard-sized scanner and outsourcing the digitization of these large objects to a third party.  Some local print shops, and even national chains like AlphaGraphics and FedEx Office, provide these services for a fee, or can refer you to a vendor.  Additionally, our own facility at Rutgers provides a similar service to the public for a nominal fee, based on availability.

If removing these objects to an outside location is out of the question, if the amount of oversized objects you’re scanning is large, or you plan on digitizing oversized objects regularly, then it may be wiser to invest in a specialty scanner that handles these objects.  You’ll also want to look for transparency and film support in any scanner, if your collection contains these types of objects.  I’ll list a couple of examples in a bit.

The second step: consider the capabilities of what you want to buy

For most applications, you really won’t need to spend a whole lot of money to get a good scanner.  However, getting something excruciatingly cheap can come with a price.  A happy medium needs to be found between these two competing factors.

I’ve developed a set of imaging standards that set minimum resolutions for scanning photographs and documents.  600 dpi is a minimum we commonly aim for, and you’ll find that even the cheapest of modern flatbed scanners can meet this requirement.  Even so, I’m recommending to most buyers that they look into scanning equipment that can optically scan at resolutions of at least 4800 dpi.

Why is this?  Mainly because while 600dpi is just perfect for things that are 4 x 6 inches and larger, you will need to scan at much higher resolutions for things like wallet sized photos, small postcards, and especially those slides and film negatives.  Such small items pack a large amount of detail in a tiny space, and a lot of that gets lost when you just assume everything will be okay when scanned at 600 dpi.

The main caveat I’ve placed in my standards documentation is the 3,000 pixel rule: the idea that in order to get a decent amount of detail out of any object, at least one side of the image must be at least 3,000 pixels long or wide.  For small 35mm slides or even 3 x 5 inch photo prints, 600 dpi just isn’t high enough resolution to meet this goal.  And so, higher resolution settings have to be used to capture the necessary detail.  It’s not uncommon for us at the SCC to scan slides as high as 3200 dpi to get an effective, detailed scan.
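The arithmetic behind the 3,000 pixel rule can be sketched in a few lines of Python.  This is only an illustration, not part of the standards documentation: the function name min_dpi is made up for this example, and the 36 mm figure assumes the long side of a standard 35mm film frame (a mounted slide may show slightly less).

```python
import math

MM_PER_INCH = 25.4
MIN_LONG_SIDE_PX = 3000  # the 3,000 pixel rule described above


def min_dpi(long_side_inches: float, target_px: int = MIN_LONG_SIDE_PX) -> int:
    """Smallest whole dpi at which the long side reaches target_px pixels.

    Hypothetical helper for illustration only.
    """
    if long_side_inches <= 0:
        raise ValueError("long_side_inches must be positive")
    return math.ceil(target_px / long_side_inches)


# Assumption: a 35mm frame is 36 mm on its long side.
slide_long_side_inches = 36 / MM_PER_INCH

print(min_dpi(slide_long_side_inches))  # 35mm slide
print(min_dpi(3.5))                     # wallet-sized print, 2.5 x 3.5 inches
print(min_dpi(6.0))                     # 4 x 6 inch print
```

In practice you would round the result up to the next resolution your scanner software actually offers, which is one reason slides end up being scanned at settings like 3200 dpi.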

The good news is that most scanners are very reasonably priced and yet deliver excellent features and high scanning resolutions.  It’s possible to get a good, current, letter sized scanner with adequate film and transparency capability for under $100.  Unfortunately, larger flatbed scanners (mostly the 11 x 17 inch variety) are more of a niche market, and can be expensive.  Expect to pay between $1,000 and $5,000 for such devices, and solutions for bigger items might cost even more.

One other thing to keep in mind: if your collection consists largely of three dimensional artifacts, or even really large two-dimensional maps (12 by 14 inches or wider) then a flatbed scanner is not what you’re going to need for visual imaging of these kinds of objects.  You will definitely need to invest in a different solution, such as a good digital camera and possibly an appropriate imaging platform, or look into outsourcing this work to a capable third party.  I’ll write up some of these options in an upcoming blog post.

Making your decision

Once you have the needed requirements and criteria down, it’s time to do some shopping.  There are multiple online vendors out there, and they all have a wide array of different scanning equipment you can buy.  Fortunately, most have the capability to narrow down the available selections based on what you’re looking for.  If you have a preference for a specific brand, or want to just limit to larger flatbeds and higher resolutions, most vendors will let you comparison shop based on your choices.

Once you have a few candidate models in mind, it’s a good idea to do a little Googling and ask colleagues for their opinions before you commit to buying.  Make sure the scanner you’re looking to buy has a good track record, and that users aren’t having frequent reliability or compatibility problems.  Some sites, such as imaging-resource.com and TestFreaks, provide detailed write-ups of each model they test, and even provide test scans and comparisons.

What we use

This past summer, the Scholarly Communication Center (the facility where I work) began refurbishing a public computer lab into what we will soon open as a Digital Curation Research Center.  This lab is being outfitted with hardware to tackle a number of different digital curation tasks, and among those pieces of hardware is a set of flatbed scanners for digitizing documents and photos.  After consideration of our needs, purchasing a few models from different vendors, and sending quite a few of them back for being sub-par to our requirements, we settled on a couple of different models.

(Please note: this isn’t an endorsement of any specific scanning vendor.  Your needs may be different, and could require you to purchase something different from the choices made here.)

Standard, letter-size scanner: The majority of our flatbed scanning work will be done on EPSON Perfection V300 flatbed scanners.  They are capable of imaging at 4800 x 9600 dpi, have built-in film and slide scanners, support Mac and Windows systems, and have a unique horizontal hinge that allows the top cover to lift to the side in a way that can support brittle books really well.  All for under $90 apiece.

Plus-size/bulk transparency flatbed scanners (2): The lab has two tabloid-size scanning workstations: one Microtek Scanmaker 1000XL Pro we had purchased for a previous project, and an EPSON 10000XL Photo scanner.  The Microtek scanner is a workhorse and provides excellent imagery, but unfortunately is no longer for sale in the United States now that the company has exited the retail market.  The Epson model, however, is a worthy successor, and provides excellent tabloid support in addition to some batch-slide scanning capabilities.  In some of our slide-scanning projects, we’ve been able to arrange up to 30 slides at a time and have this scanner produce individual, 3200 dpi scans of each frame, completing each set in about an hour.
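For a rough sense of what a batch of slide scans like that produces, here is a back-of-the-envelope sketch in Python.  The helper names are made up for this example, and the numbers rest on assumptions that are not from our lab setup: a full 36 x 24 mm film frame, 24-bit color, and no compression.  Real files will differ depending on crop, bit depth and file format.

```python
MM_PER_INCH = 25.4


def scan_pixels(width_mm: float, height_mm: float, dpi: int) -> tuple[int, int]:
    """Approximate pixel dimensions of an area scanned at the given dpi."""
    return (
        round(width_mm / MM_PER_INCH * dpi),
        round(height_mm / MM_PER_INCH * dpi),
    )


def uncompressed_bytes(width_px: int, height_px: int, bits_per_pixel: int = 24) -> int:
    """Raw image size in bytes, before any file-format overhead or compression."""
    return width_px * height_px * bits_per_pixel // 8


# Assumptions: full 35mm frame (36 x 24 mm), 3200 dpi, 24-bit color.
width_px, height_px = scan_pixels(36, 24, 3200)
per_slide = uncompressed_bytes(width_px, height_px)
batch_of_30 = 30 * per_slide

print(f"{width_px} x {height_px} pixels per frame")
print(f"about {per_slide / 1e6:.0f} MB per slide, uncompressed")
print(f"about {batch_of_30 / 1e9:.2f} GB for a set of 30 slides")
```

Under these assumptions each frame comes out well above 3,000 pixels on its long side, and a set of 30 slides runs to a bit over a gigabyte of raw image data, which is worth keeping in mind when planning storage.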

What about All-in-One printers/scanners/faxes?

The All-In-One solution is a tempting proposition for some very small outfits and home users.  In fact, a lot of printer and scanner manufacturers like HP, Canon, Kodak and EPSON fill their product lineups with these combo devices.  They’re beneficial for users who have occasional light scanning and printing needs, and provide repeat business for the vendors, who can sell these devices at a loss knowing that users will have to come back later for ink and supplies.

These solutions will make sense for home users who have boxes of photos and documents in their attics and closets, and want to preserve these items in a digital form while clearing out some space.  I, in fact, use an HP Photosmart C4000 series printer/scanner combo in my home, and am quite happy with it.  However, I wouldn’t recommend such items for regular business or institutional use.  Bear in mind that just like any other combo device, you may find yourself having to toss a perfectly good scanner if the printer portion happens to malfunction, or vice versa.  In my experience, printers and scanners get a lot more mileage in a business or institutional setting, and these units are just not built to withstand the sheer volumes that might be required of them in that environment.

The pitfalls of large hard drives – and national security
May 20th, 2009 by Isaiah Beard

Well, here’s an example of how putting all your data eggs in one basket can be quite dangerous.  The National Archives and Records Administration has reported the loss of an external hard drive containing a massive amount of data, the information being personal data at best, and items potentially related to national security matters at worst:

The Inspector General of the National Archives and Records Administration (NARA) told congressional committee staffers Tuesday that a hard drive containing over a terabyte of information – the equivalent of millions of books – went missing from the NARA facility in College Park, Md., sometime between October 2008 and March 2009.

The Department of Justice and the Secret Service are conducting an investigation, but it’s so far unclear whether the drive was lost as the result of a crime or an accident.

Of course, the technologist in me finds it really interesting that over 8 years ago, the federal government apparently had access to 1 terabyte hard drives!  Those have only become mainstream technology over the past three years or so.  But I digress…

NARA clearly takes the issue seriously, and has posted a FAQ (pdf) about the disappearance.  The document highlights something else of note – how long the drive was “missing” as opposed to “last seen.”

Preserving digital photos: What not to do
Apr 6th, 2009 by Isaiah Beard


One of the more frequent debates that I see in preservation circles is how best to preserve “born digital” photographs: those photos that never began as physical film, but originated on a digital camera.

This isn’t an easy topic. There is no industry standard for born digital image preservation. Digital cameras of different vintages and configurations will output in one of a handful of differing file formats, and their metadata will often differ as well. And so, preservationists have been largely left to their own devices, fabricating their own methods, preferred formats and storage procedures for handling this type of material.

One controversial method that has been suggested is to forget about digital altogether, and to use a pigment-based inkjet or dye-sub printer to print physical copies of digital photographs and rely on the hard copies as the long-term archive. This is a tempting method for lots of curators who have been trained to trust the physical, and without delving too deep into the specifics this seems at first blush like sound reasoning.

Unfortunately, it can be a very bad idea, and here’s why.

Loss of image fidelity

This is by far the most important reason, and yet not really the most obvious to some. For laypeople, and for the less-experienced in digital formats, creating a print from a digital file seems a lot like doing the same from analog film. However, inkjet and photo printers are not going to give you the same level of quality as a true analog photographic print. And the print, while fine to the naked eye, will suffer a significant degradation compared to the original.

The best way to prove this is to take a digital image, make a print, and then rescan it. Here, for instance, is a born digital image taken from a Canon EOS 30D, shot and preserved in Camera RAW format, and presented here as a 24-bit PNG file:

Primary Image in PNG
(Note: clicking on the above image will take you to the full-resolution photograph, a 16MB file.)

I printed this image on a Kodak Photo Printer, using pigment inks, on 4×6 Kodak photo paper. Then, I rescanned the image at 1200dpi, using the scanner attached to the same photo printer. Here’s the resulting re-scan:

Rescan
(Note: clicking on the above image will take you to the full-resolution re-scanned photograph, also a 16MB file.)

At these reduced resolutions, there doesn’t seem to be much difference. The color appears slightly off, but it isn’t so bad… right? Well, let’s look a little closer at the re-scan:
Rescan closeup

Yikes! Clearly, there’s a significant compromise in image quality here, and this is because photo printers, regardless of how good they are, rely on printing methods that are unlike the traditional photograph, and through which the same level of quality doesn’t translate if you’re doing a bit-per-bit scan. This becomes even more evident when you compare the re-scan with the digital master, at the same scale.

If this argument isn’t compelling enough, there are other reasons for not relying on a hard copy as your preservation master.

Loss of technical metadata

Most modern digital cameras embed technical metadata into their image files, either by using EXIF, or as built-in fields in their own Camera Raw format. This metadata can include information about the camera which took the photo, what settings were used, what lenses, time and date, and even the GPS location of the camera, if properly equipped. It goes without saying that all of this potentially valuable metadata is lost if a hard copy is used as a preservation master, in lieu of the digital.

Limited ability to adjust or enhance the image

Having and preserving the original file created by a digital camera affords a curator, editor or researcher a great deal of leeway in making adjustments to derivative presentation copies. Things like localized color adjustments are very easy to do with the digital master present, particularly if the master is a Camera Raw. On the other hand, your options are very limited if all you have is a print.

The best practice: preserve the digital

The best option for preserving born-digital photos remains keeping them digital. This does have implications for curators wanting to do right by their collections, and it can make the uninitiated very anxious. Capital purchases for technology must be made, and backups, whole new workflows and best practices must be established. Fortunately, the world of digital curation is starting to come into its own, and others have already begun to tread these waters. In future articles, I will outline some best practices and case studies I’ve undertaken and encountered, to help guide those seeking answers to the digital dilemma.


Rights: Creative Commons License