Digital preservation goes rogue: ArchiveTeam scrapes the web to preserve heritage
Jul 2nd, 2012 by Isaiah Beard

Mobile Me is Closed

MobileMe was shut down on July 1 by Apple, as part of its efforts to transition users to other services. Not making the transition, however, were users’ public web sites, shareable Photo Galleries and iDisk, a cloud storage service similar to Dropbox.

Note: The purpose of this post is to document a growing, grass-roots movement to archive the web, in spite of some rather controversial methods practiced by this movement. While I sympathize with the philosophy, I am not affiliated with the group, nor do I condone all of its actions, and this is not something we at Rutgers would do without first clearing permissions and rights to archive any content.

One of the big problems with the web is its inherent lack of permanence. There is no formal archiving structure, and as with anything digital, it is very easy for something that someone considers important to disappear overnight, with little or no notice. Sometimes these deletions happen on a mass scale, affecting millions of websites of varying quality, some of them arguably of significant cultural value.

Now it appears that, for better or for worse, a group of individuals is working to do something about it… with or without our permission.


Google’s Neglected Archive
Oct 8th, 2009 by Isaiah Beard

Controversial as it may be to archivists of both the digital and analog realms, Google is often seen as the ubiquitous oracle of record by the general online population. Websites (including those of libraries) strive to be listed on its search engine; it runs the de facto video archive for the internet at large; and in many cases it has made patents and even scholarly material easier to search and access than their respective online custodians have. Research institutions and businesses now even turn to it to handle their growing mounds of e-mail and electronic documents.

Yet curators and librarians have been very skeptical of Google, its aims, and even its ability to objectively preserve the massive body of digital content it aims to take on. The ongoing Google Books drama is one example of this. But now, others outside the world of preservationists are seriously calling Google’s competency and motives into question.

Wired.com has posted an article about what may well be an example of Google’s mismanagement of important archives. Usenet was probably the Internet’s earliest online reference, social network, and historical archive, consisting of terabytes of text articles, files and writings from users of the Internet (and its predecessors) dating back to 1980. While most of the content can be quite mundane, some of these articles document milestones in online history or contain the writings of influential pioneers of the Internet. Google, through its acquisitions of various entities over the years, became the world’s de facto curator of this content, and according to Wired contributor Kevin Poulsen, Google is failing in this role:

… visiting Google Groups is like touring ancient ruins.

On the surface, it looks as clean and shiny as every other Google service, which makes its rotting interior all the more jarring — like visiting Disneyland and finding broken windows and graffiti on Main Street USA.

Searching within a newsgroup, even one with thousands of posts, produces no results at all. Confining a search to a range of dates also fails silently, bulldozing the most obvious path to exploring an archive.

Want to find Marc Andreessen’s historic March 14, 1993 announcement in alt.hypertext of the Mosaic web browser? “Your search – mosaic – did not match any documents.”

Wired’s piece points to the exact concern that many archivists have about Google’s aims: that the company isn’t motivated to maintain assets it holds if those assets are no longer making it any significant money. With newer, more visually appealing technologies largely supplanting the once-huge popularity of Usenet, its archive is not as hot a commodity as it used to be, and one can speculate that if the archive isn’t generating enough ad revenue, Google isn’t going to care to maintain it or keep it functional.

So, what happens when certain subject matter in Google’s scanned books archive becomes less popular, and thus less visited, within the Internet’s incredibly short attention span?


Rights: Creative Commons License