The Archive Problem

December 29, 2011 - What's the solution to keeping data (like photographs) across decades?

My cousin had a very simple solution to the problem of memory cards for her digital cameras filling up - she bought new ones. At the time I thought she was crazy, but she just might have been ahead of her time.

At the time I found out about this, memory cards were about 40 times more expensive per gigabyte than hard drives. These days they are 10 times as expensive. Although I don't expect the two to reach price parity any time soon, that gap will definitely close some more.

This might not be a good thing. The real issue for memory is its longevity, its ability to act as an archival medium. Film negatives, photographs, and paper have very long lives even when treated badly; when stored carefully, all can last centuries. Digital media, on the other hand, have lifetimes measured in years. I don't think this is very well understood in the consumer market.

It is understood in the high-end business market. For example, EMC makes very large storage devices that are made up of ordinary hard drives. These systems are expensive because the drives are wrapped with sophisticated electronics and software designed specifically to protect against data loss. These devices can have thousands of hard drives, and hard drives fail. But when a drive does fail, the system itself notifies EMC, which dispatches a replacement drive for arrival the next day. The failed drive is removed, the new drive installed, and the system rebuilds the data on that drive, in real time. The systems and applications dependent upon that data never know there was a problem.

We consumers can't afford that kind of solution. Just the electric bill would kill us.

Large businesses also use tape for archival storage. Magnetic tape has a very long lifetime. For example, I have some reel-to-reel tapes that my aunt made in the late '50s. They are still playable. Unfortunately, tape systems are expensive and all but gone in the consumer market.

What is the solution for archival storage? Let's examine the viable alternatives - optical discs, hard drives, flash memory cards, and the “cloud.”

Optical Discs (CD, DVD, Blu-Ray)

Optical discs are cheap and easy to burn, with adequate software built into operating systems for the past decade. The discs themselves have no electronics and no moving parts. The cost per gigabyte (GB) for the most expensive discs (Blu-Ray) is about 10¢ when the discs are bought in quantity; CDs and DVDs are even cheaper. Optical discs seem very durable, but the US Department of Defense published a standard saying that all optical media had to be copied every 5 years, meaning that DOD thinks there is a risk of data loss after 5 years. I have recordable optical discs older than that, and some of them have failed. I now make duplicates of my most important data every 5 years.

A development of the past few years is the emergence of “archival quality” discs from manufacturers like JVC Taiyo Yuden and Verbatim. These discs can cost as much as $1/GB but they have a claimed life of up to 70 years. I have no way to know whether this is true (in theory the discs will outlive us all). However, the issue in disc life is the type of dye used in the manufacture of the media. The Japanese developed several dyes that do not fade over time like the original dyes did. Manufacturers also found a way to use gold, highly desirable because it does not oxidize (rust) like other metals. There are now some gold media discs on the market.

The only problem with optical media is capacity. Standard DVDs hold 4.7GB, so one disc won't hold the contents of a full 8GB memory card. Meanwhile, memory cards are getting bigger all the time, especially now that just about all consumer video cameras use memory instead of tape.

Hard Drives

Disks with rotating, magnetic media are electrical and mechanical devices, which means they can fail for reasons other than the medium becoming unreliable. My own practice is to recommend replacement of hard drives after 5 years. This sounds like a big hassle, but drives keep getting cheaper. Most of the world’s data is stored on hard drives. The only problem for consumers like us is that one drive isn’t enough – you need two to make sure that a physical failure of one doesn’t cause loss of the data. Redundancy is essential. The good news here is that the drives are very big, up to 2TB (terabytes) in consumer devices and very cheap at about 14¢/GB.

Hard drives are happiest when spinning. Setting a drive on a shelf for 5 years without powering it up is a recipe for failure. If you depend on hard drives for archival storage, power them up and check them regularly, every few months.
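
For readers comfortable with a little scripting, one way to do that regular check is to record a checksum for every file when it is archived, then recompute the checksums each time the drive is powered up. The Python sketch below is only an illustration of the idea, not a finished backup tool; the function names (file_sha256, build_manifest, verify) are made up for this example.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large videos do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Compare the drive to a saved manifest; return (missing, changed) file lists."""
    current = build_manifest(root)
    missing = sorted(name for name in manifest if name not in current)
    changed = sorted(
        name for name in manifest
        if name in current and current[name] != manifest[name]
    )
    return missing, changed
```

The manifest should be saved somewhere other than the drive being checked. If verify reports anything missing or changed, that is the moment to restore those files from the second drive.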

Flash Memory

Except for expense, flash memory might be a good choice for archival storage. But there is a dirty little secret to flash that most consumers don't know - it has a definite lifetime.

There are two factors. One is how long the overall card lasts. There is a charge on the cards that dissipates over time; once the charge is gone, the card stops working and its data is lost. The estimated lifetime for a memory card is currently about 10 years. The second factor is write cycles. A given flash cell can be written only a limited number of times, after which the cell can be read but not written. This number used to be 10,000 cycles but is now in excess of one million, so for a card intended for archival storage (i.e., read only) it should not be a problem.

You’d probably have to replace the cards every 8 years just to give yourself a buffer against the 10-year death point. Flash will probably get cheaper (especially after Thailand recovers from the floods).

Recently, SanDisk announced a new product called Memory Vault, a flash memory product that claims to hold its data for "up to" 100 years. This claim is obviously based on math rather than empirical data, a point SanDisk makes clearly on its site. There are other caveats; read carefully. The downfall of the solution is capacity, currently 8 or 16GB at a cost of over $5/GB, almost as expensive as DDR3 RAM. What's important here is not the viability of these particular products but the fact that SanDisk recognizes the problem and has leapt into the fray, establishing its position in this market segment. This is nothing short of amazing because the company is acknowledging the more limited lifetime of all its other flash products, the bulk of its product line.

Cloud

The newest storage solution on the block is the “cloud.” All this means is that your data is stored somewhere on the Internet and can be accessed by any device you own with an Internet connection. It is up to the company providing the service to assure the life of your data, although nobody can give you an absolute guarantee (their equipment can fail, too).

The issues with the cloud are expense and longevity – you are relying on the corporation outlasting you and staying in that business. We usually only find such generational longevity in financial institutions, where heavy regulation supposedly assures continuity, and government, which remembers everything. Meanwhile, you will have to maintain monthly payments for the service (despite Apple’s iCloud, it can’t be done for free).

Bottom Line

Someday you want your kids to have your photos, videos, and documents so they can carry on the family history. For the past few generations we have all relied on long-lived paper and film as the transmission medium. How we decide to transmit our considerable digital assets is truly a difficult question. But the one thing you can know for sure is that the lifetime of digital media is much shorter than that of paper and film, with attendant higher risk.

My Recommendations

If money is no object, flash memory cards are a good solution. They simply have to be replaced every 8 years or so with new cards and the data copied from old to new. Tip: write the date of purchase on each card with a Sharpie so you can tell their age.
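
If you would rather track purchase dates in a file than on the card itself, the replacement dates are easy to compute. This is a rough Python sketch using the rules of thumb from this article (8 years for flash, 5 for hard drives and regular optical discs); the names REPLACE_AFTER_YEARS, replace_by, and is_due are invented for the example.

```python
from datetime import date

# Replacement intervals in years, taken from the rules of thumb in this article.
REPLACE_AFTER_YEARS = {"flash": 8, "hard drive": 5, "optical": 5}


def replace_by(purchased, medium):
    """Return the date by which media bought on `purchased` should be replaced."""
    years = REPLACE_AFTER_YEARS[medium]
    try:
        return purchased.replace(year=purchased.year + years)
    except ValueError:
        # Bought on February 29 and the target year is not a leap year.
        return purchased.replace(year=purchased.year + years, day=28)


def is_due(purchased, medium, today=None):
    """True once the replacement date has arrived."""
    today = today or date.today()
    return today >= replace_by(purchased, medium)
```

For example, replace_by(date(2011, 12, 29), "flash") gives December 29, 2019 for a card bought the day this was written.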

For the economy-minded, copying all data to a couple of 2TB hard drives (2 for redundancy) provides room for at least 400,000 photos, 153 hours of uncompressed video, or 500 hours of MP4 video. You'll have to spin those drives regularly to make sure they are still working, something you'll be doing if you are regularly archiving new data to them. The drives should probably be replaced every 5 or so years, but that is still cheap.

Even cheaper is optical media. However, it's a hassle. Even a 25GB Blu-Ray disc is a mere 1.25% of the capacity of a 2TB hard drive. This means you will spend a lot of time organizing the data and keeping an index of the discs so you know what is where. Then you'll have to make new copies every 5 years to guard against disc failure. It is probably a good idea to make two copies of each disc at the outset, perhaps with different brands of media.
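
To put a number on the hassle, here is the disc arithmetic as a small Python sketch. It uses the capacities quoted in this article (4.7GB per DVD, 25GB per Blu-Ray, 2TB counted as 2000GB); the helper names are just for illustration.

```python
import math

DRIVE_GB = 2000  # one 2TB hard drive, counted as 2000GB
DISC_CAPACITY_GB = {"DVD": 4.7, "Blu-Ray": 25}


def discs_needed(data_gb, disc_gb):
    """Number of whole discs required to hold data_gb of data."""
    return math.ceil(data_gb / disc_gb)


def share_of_drive_percent(disc_gb, drive_gb=DRIVE_GB):
    """One disc's capacity as a percentage of the hard drive's capacity."""
    return 100 * disc_gb / drive_gb


blu_ray_count = discs_needed(DRIVE_GB, DISC_CAPACITY_GB["Blu-Ray"])  # 80 discs
dvd_count = discs_needed(DRIVE_GB, DISC_CAPACITY_GB["DVD"])          # 426 discs
dvds_per_8gb_card = discs_needed(8, DISC_CAPACITY_GB["DVD"])         # 2 discs
```

Filling the equivalent of one 2TB drive takes 80 Blu-Ray discs or 426 DVDs, and that is before making the second copy of each.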

In the middle price range is archival-quality optical storage. It has the same problems as regular discs with respect to the hassle factor, but at least the discs will last longer than regular media, perhaps longer than you. Even so, I'd make two copies of everything.
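
For a rough sense of how these options compare on price, the sketch below multiplies out the per-gigabyte figures quoted earlier in this article. It counts media cost only, assumes two copies of everything, and treats the "over $5/GB" Memory Vault price as a lower bound; the names are invented for the example, and real prices will differ.

```python
# Prices in cents per GB, as quoted in this article (late 2011).
PRICE_CENTS_PER_GB = {
    "Blu-Ray discs": 10,
    "hard drives": 14,
    "archival optical discs": 100,  # "as much as $1/GB"
    "Memory Vault flash": 500,      # "over $5/GB", so this is a lower bound
}


def cost_dollars(data_gb, cents_per_gb, copies=2):
    """Media cost in dollars for `copies` full copies of data_gb of data."""
    return data_gb * copies * cents_per_gb / 100


def compare(data_gb, copies=2):
    """Return {medium: cost in dollars} for every medium in the price table."""
    return {
        medium: cost_dollars(data_gb, cents, copies)
        for medium, cents in PRICE_CENTS_PER_GB.items()
    }
```

For 2TB (2000GB) kept in duplicate, that works out to about $400 in Blu-Ray discs, $560 in hard drives, $4,000 in archival discs, and at least $20,000 in Memory Vault flash, not counting the periodic re-copying that each option needs.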

I don't recommend the cloud. You are putting your valuable digital assets in someone else's hands with absolutely no assurances of longevity. And if for any reason you can't pay the bill at some point, poof.

I'm sorry I can't make a specific recommendation. I'd love to be able to tell you that I've been keeping my data on archival-quality DVDs for the last 50 years and that it really works, but nobody can make that claim, yet. You'll simply have to choose the solution that best fits your finances, organizational skills, and commitment to discipline.

Tags: Backup, Hardware
