Tech blogs are sometimes great for showing people handy “hacks” (really, just features) that help them get the most out of the technology they use. But they don’t always get their facts correct.
One glaring example is an article I came across last week explaining how to remove EXIF metadata from photos on your iPhone. This metadata, the author insists, is the enemy, allegedly gunking up your phone with needless data:
Unless you’re a super professional photographer or are wanting to get a bit more in-depth with your photographs/info for a project of some sort, then EXIF data has little use to you besides weighing down your iPhone with unnecessary metadata.
What they’re suggesting is that (1) metadata takes up a lot of space, and (2) it “bogs down” your phone. Both of these things are false.
What is EXIF metadata?
EXIF is a standard for technical metadata that is embedded into most photos. Some of it is fairly routine: information about the editing software used, the make and model of the camera, and editing history. But EXIF also often contains genuinely useful information, such as date and time stamps and location data, especially for photographs taken by smartphones. That data is what your photo software uses to organize your photos by time and place… very useful if you have thousands of photos and want to quickly find a specific one.
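If you’re curious what this metadata actually looks like, here’s a minimal sketch, assuming Python with the Pillow library installed (“photo.jpg” is a hypothetical file name), that reads the EXIF tags straight out of a photo:

```python
# A minimal sketch of inspecting EXIF metadata with Pillow.
# "photo.jpg" is a hypothetical file name.
from PIL import Image
from PIL.ExifTags import TAGS

img = Image.open("photo.jpg")
exif = img.getexif()  # returns an empty mapping if the file has no EXIF

for tag_id, value in exif.items():
    name = TAGS.get(tag_id, tag_id)  # translate numeric tag IDs to readable names
    print(f"{name}: {value}")

# GPS coordinates live in their own sub-block (IFD 0x8825, "GPSInfo")
gps = exif.get_ifd(0x8825)
if gps:
    print("GPS data present:", dict(gps))
```

Running this on a typical smartphone photo prints the camera make and model, the timestamp, and (if location services were on) a GPS block, which is exactly the information your photo apps index.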
The author of the above article recommends that you “strip” your photos of all their metadata to solve a problem that doesn’t really exist. This isn’t such a great idea, because metadata is how your phone (and your tablet, and your computer) organizes your memories. When you say, “Hey Siri/Google/Alexa/Cortana, show me those photos I took last year in Timbuktu,” your assistant needs location and time data matched up with each photo. That’s metadata. Metadata is also how it knows the difference between the photos you took last year in Timbuktu and the photos you took last night at Olive Garden.
Removing all that would be like taking a few thousand physical photos out of their boxes, envelopes, and albums, erasing anything written on the back of them, and scattering them all over the floor, many of them face-down. Then saying, “Quick! Find that photo of Aunt Agnes on the third night of her first honeymoon!”
You’d go nuts trying to find that picture, and you’d probably give up before you stumbled on it. Likewise, if you wipe all the photo metadata on your phone, your smartphone will suddenly act a lot dumber, unable to find a lot of things it used to just magically know.
Does metadata take up a lot of space? No. At work, we have a small repository with 11.46 terabytes of data… enough to fill about 359 iPhone 7 Pluses to the brim. Of all that data, only 0.01% is metadata. And we are VERY detailed about our metadata, far more so than the average smartphone.
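You can run a rough version of this check on your own photo library. The sketch below (again assuming Python with Pillow; “Pictures” is a hypothetical folder name) totals the raw EXIF bytes across a folder of JPEGs and compares them to the overall file sizes:

```python
# A rough back-of-the-envelope check of how much space EXIF actually uses.
# Assumes Python + Pillow; "Pictures" is a hypothetical directory.
import os
from PIL import Image

total_bytes, exif_bytes = 0, 0
for root, _, files in os.walk("Pictures"):
    for name in files:
        if not name.lower().endswith((".jpg", ".jpeg")):
            continue
        path = os.path.join(root, name)
        total_bytes += os.path.getsize(path)
        with Image.open(path) as img:
            # raw EXIF blob as stored in the JPEG, if any
            exif_bytes += len(img.info.get("exif", b""))

if total_bytes:
    print(f"EXIF is {100 * exif_bytes / total_bytes:.3f}% of {total_bytes} bytes")
```

The EXIF block in a typical multi-megabyte photo is measured in kilobytes, so the percentage you get back will be a tiny fraction of a percent.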
Metadata doesn’t “bog down” your phone. It actually makes your phone do a lot of the things you expect it to do.
That said, there ARE some cases where you might want to make a copy of a picture or a video and then wipe its metadata: sharing a picture with someone while remaining anonymous, for example, or posting a video to social media without giving away your location. For reasons like that, yes, this is a useful tool. It’s just not something you want to do to your whole collection of media “just because.”
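If you do need to share a scrubbed copy, here’s a minimal sketch of one way to do it, assuming Python with Pillow and a typical RGB JPEG (“photo.jpg” and “photo_clean.jpg” are hypothetical file names). It copies the pixels into a brand-new image so none of the original metadata comes along, and it leaves the original file untouched:

```python
# A minimal sketch of stripping metadata from a *copy* of a photo before sharing.
# Assumes Python + Pillow and a typical RGB JPEG; file names are hypothetical.
from PIL import Image

with Image.open("photo.jpg") as img:
    clean = Image.new(img.mode, img.size)  # brand-new image, no metadata attached
    clean.paste(img)                       # copy only the pixels
    clean.save("photo_clean.jpg", quality=95)
```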
Never do this: smartphones can be deadly to magnetically stored data in some circumstances.
In my dealings with preserving older, born-digital documents and data, I’ve run into this situation quite often: someone comes into the DCRC with a 3.5″ floppy disk or other magnetic media and asks if we can help them migrate the data to more modern storage, such as a USB flash drive. We maintain a couple of floppy drives for this purpose, so normally we can help. However, we sometimes cringe at how the floppy disks are being held when they’re brought in, or rather, what people commonly hold those old disks against.
What’s the problem? Smartphones, and sometimes tablets or even modern laptops. With mobile devices being nearly ubiquitous in the US and particularly among college students and faculty, it’s a normal occurrence to see them being carried around in one’s hand. It’s also not uncommon to stack a smartphone against some other object a person might be carrying… like a book, or a laptop, or, unfortunately, that floppy disk you might want to recover data from.
German researcher D. Kriesel discovered that certain characters are being altered by Xerox copiers when documents are scanned to PDF. In his example, the meanings of numeric figures changed when the Xerox system replaced the number “6” with the number “8” in multiple locations. The cause appears to be faulty compression settings, which cause similar-looking characters to be matched and reused in an effort to reduce the size of the scanned files.
Over the past week, there has been a great deal of buzz in the IT community about a discovery by a researcher in Germany that certain Xerox Workcentre copy/scan stations are altering the content of documents scanned to PDF. In particular, attention has been focused on the Xerox WorkCentre 7535 and 7556 models. Kriesel found that “patches of the pixel data are randomly replaced in a very subtle and dangerous way. In particular, some numbers appearing in a document may be replaced by other numbers when it is scanned.”
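To see why this kind of pattern matching is dangerous, here’s a toy sketch in Python. It is purely illustrative and is not Xerox’s actual implementation, but it mimics the symbol-substitution idea used by JBIG2-style encoders: a glyph bitmap that looks “close enough” to a previously stored symbol gets replaced by that stored symbol, and with too loose a threshold a “6” and an “8” collapse into the same glyph.

```python
# Illustrative toy of lossy symbol-substitution compression (NOT Xerox's code).
# Each new glyph bitmap is compared against a dictionary of previously seen
# symbols; a "close enough" match reuses the stored symbol instead of the
# real pixels, which is how one digit can silently become another.

def hamming(a: str, b: str) -> int:
    """Count differing pixels between two same-sized 0/1 bitmap strings."""
    return sum(x != y for x, y in zip(a, b))

def encode(glyphs, threshold):
    dictionary = []   # prototype bitmaps kept by the encoder
    output = []       # index into the dictionary for each input glyph
    for bitmap in glyphs:
        match = next((i for i, proto in enumerate(dictionary)
                      if hamming(bitmap, proto) <= threshold), None)
        if match is None:
            dictionary.append(bitmap)
            match = len(dictionary) - 1
        output.append(match)
    return dictionary, output

# Tiny 3x5 "6" and "8" glyphs flattened to strings; they differ by one pixel.
six   = "111100111101111"
eight = "111101111101111"

dictionary, out = encode([six, eight], threshold=2)
decoded = [dictionary[i] for i in out]
print(decoded[1] == eight)  # False: the "8" came back as the stored "6"
```

The point of the toy is the threshold: at low scan resolutions and small font sizes, genuinely different characters fall within the matching tolerance, and the decompressed page looks crisp while quietly containing the wrong digits.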
According to Xerox, a software update is coming to address the issue. From their official statement:
We continue to test various scanning scenarios on our office devices, to ensure we fully understand the breadth of this issue. We’re encouraged by the progress our patch development team is making and will keep you updated on our progress here at the Real Business at Xerox blog.
We’ve been working closely with David Kriesel, the researcher who originally uncovered the scenario, and thank him for his input which we are continuing to investigate. As we’ve discussed with David, the issue is amplified by “stress documents,” which have small fonts, low resolution, low quality and are hard to read. While these are not typical for most scan jobs ultimately, our actions will always be driven by what’s right for our customers.
There are still points of contention, however.
One of the topics I like to bring up in discussions of preserving digital data is the idea of a Digital Dark Age… the notion of a period of our history whose record ends up lost due to a failure to plan for and preserve our early digital content.
The New Scientist, however, recently published an article (Feb 2, 2010) on something a bit more cataclysmic: the concept of Digital Doomsday. From the article:
Suppose, for instance, that the global financial system collapses, or a new virus kills most of the world’s population, or a solar storm destroys the power grid in North America. Or suppose there is a slow decline as soaring energy costs and worsening environmental disasters take their toll. The increasing complexity and interdependency of society is making civilisation ever more vulnerable to such events (New Scientist, 5 April 2008, p 28 and p 32).
Whatever the cause, if the power was cut off to the banks of computers that now store much of humanity’s knowledge, and people stopped looking after them and the buildings housing them, and factories ceased to churn out new chips and drives, how long would all our knowledge survive? How much would the survivors of such a disaster be able to retrieve decades or centuries hence?
The article is a compelling read, and offers an intellectual exercise on how much of our “stuff” would survive such a catastrophe. Ironically, the logic is that the digital content with the most copies in existence may win out. So, while scholarly works, theses, research, and other important scientific data would be at risk, pop music may survive just fine.
Controversial as it may be to archivists of both the digital and analog realms, Google is often seen as the ubiquitous oracle of record by the general online population. Websites (including libraries) strive to be listed in its search engine; it runs the de facto video archive for the internet at large; and in many cases it has made patents and even scholarly material easier to search than their respective online custodians have. Research institutions and businesses now even turn to it to handle their growing mounds of e-mail and electronic documents.
Yet curators and librarians have been very skeptical of Google, its aims, and even its competency to objectively preserve the massive amounts of digital content it takes on. The ongoing Google Books drama is one aspect of this. But now others outside the world of preservationists are seriously calling Google’s competency and motives into question.
Wired.com has posted an article about what may well be an example of Google’s mismanagement of an important archive. Usenet was probably the Internet’s earliest online reference, social network, and historical archive, consisting of terabytes of text articles, files, and writings from users of the internet (and its predecessors) dating back to 1980. While most of the content is quite mundane, some of these articles document milestones in online history or contain the writings of influential pioneers of the Internet. Google, through its acquisitions of various entities over the years, became the world’s de facto curator of this content, and according to Wired contributor Kevin Poulsen, Google is failing in this role:
… visiting Google Groups is like touring ancient ruins.
On the surface, it looks as clean and shiny as every other Google service, which makes its rotting interior all the more jarring — like visiting Disneyland and finding broken windows and graffiti on Main Street USA.
Searching within a newsgroup, even one with thousands of posts, produces no results at all. Confining a search to a range of dates also fails silently, bulldozing the most obvious path to exploring an archive.
Want to find Marc Andreessen’s historic March 14, 1993 announcement in alt.hypertext of the Mosaic web browser? “Your search – mosaic – did not match any documents.”
Wired’s send-up points to the exact concern many archivists have about Google’s aims: that the company has little motivation to maintain the assets it holds once those assets stop making it any significant money. With newer, more visually appealing technologies largely supplanting Usenet’s once-huge popularity, its archive is not as hot a commodity as it used to be, and one can speculate that if it isn’t generating enough ad revenue, Google won’t care to maintain the archive or keep it functional.
So, what happens when certain subject matter in their scanned books archive becomes less-popular – and thus less visited – within the Internet’s incredibly short attention span?