Crash and Burn | Hard Drive Maintenance

January 1, 2009

What is the most mission-critical and most fragile item in your studio? Not your prized high-end microphone (you could survive without that), and not the lead singer's ego. No, it's your hard drive, and losing it unexpectedly can be lethal to any project. A drive failure can bring your studio operations to a full stop, unless you're prepared.

So get ready for the inevitable. You'll need to know a bit about how hard drives work, what makes them fail, how to reduce their failure rate, and what to do when they finally quit.

Under the Hood

Hard drives are sophisticated, precision machines, and their designs have improved greatly over the years (see Fig. 1). I'll touch on a few aspects of contemporary drive architecture that will help you understand why they sometimes fail and how to avert or deal with the problem. (For more information, read the Wikipedia article at en.wikipedia.org/wiki/Hard_drive.)

A hard drive records data on one to five flat disks called platters. The platters, held in place by a spindle, are driven by a motor and rotate at high speed. Most platters are made of glass or an aluminum alloy and are coated with two parallel layers of a magnetic cobalt-based alloy that are separated by a 3-atom-thick layer of nonmagnetic ruthenium. The two main layers are magnetized in opposite orientations, reinforcing each other.

FIG. 1: Basic parts of a typical hard drive include the case, platter, actuator arm, read/write head, and spindle.

Chuck Dahmer

The drive's microscopic read/write heads — floating nanometers above (but not touching) the rotating platter's surface — detect and alter the magnetic pattern that stores the data. One head, containing both the read and write elements, serves each platter surface. The read/write head is mounted on a light, rigid actuator arm that moves the head to the correct position above the platter. The arm is moved by a voice-coil actuator that, like a loudspeaker, utilizes a coil and a magnetic field. The air close to the platter moves at or close to the platter speed and acts like a bearing, preventing the head from touching the platter. Each platter's magnetic surface is logically divided into many small regions, each of which stores a binary unit of information.

A drive also contains a circuit board and firmware to operate all these parts and to encode and manage the data. Modern drives have an onboard RAM cache that buffers between the speedy computer and the slower hard disk. A sealed case with a filtered vent hole (to equalize air pressure) encloses the entire mechanism. There's much more to the internal workings of a drive, but these parts most directly affect our discussion.

Physical Failure

Hard drives can fail due to either physical problems or logic problems. Although you can't prevent most failures, if you discover a problem in time, you might be able to power down and have a chance to salvage your data (see the online bonus material “When Disaster Strikes” at emusician.com). Here are a few of the physical reasons why drives fail:

Head crash

Head crashes are a common cause of drive failure. They usually occur when the actuator arm swings too close to the surface of the platter and touches it, potentially damaging the platter and the read/write heads. In some cases, the arm just falls out of position and the heads and platter remain intact; in other cases, the platters collide or stick together. Either way, you almost certainly can't fix this yourself.

Heat damage

Heat is another leading cause of drive failure. Heat can burn out a circuit board, and it can cause a platter to expand, which alters the distance between the read/write heads and the platter and can damage the magnetic surface. And that's just for starters!

Motor failure

If the drive motor fails, your drive either will spin at a degraded and unpredictable speed or won't spin at all. You can't prevent this, but you might hear the motor speed waver in time to be able to power down and try to minimize damage.

Other damaged components

A broken read/write arm, scratched platters, and bad drive bearings are deadly. All sorts of things can cause them to occur: impact damage, a head crash, heat, dust inside the case, a defective part, and so on. Baby your drives — don't bounce them around, pad them well if you transport them, and don't drop them.

Water or fire damage

These are obvious drive killers. Enough said.

Logic Failure

Hard drives can also fail because of logic problems, including the following:

Bad drive sectors

Bad sectors are less of a problem with modern drives than with older ones, but when they appear, you should be concerned. If the number of bad sectors has increased even slightly since the previous time you checked, it's time to replace the drive.

Disk fragmentation

When you save or modify a file, the operating system looks for sufficient free disk space to store the data. The same thing happens when the OS and applications overwrite old files and write new ones (this includes temporary files and updaters). If the OS can't find enough contiguous space, it has to break the file into pieces, called fragments, that can fit in the available free blocks of space. Over time, pieces of files and applications are increasingly scattered across the disk; this is called fragmentation (see Fig. 2).

When you delete a file, it isn't erased; it is deleted from the directory, allowing the computer to gradually overwrite those blocks. This can create fragments. And because large audio and video files require big blocks of free space, they become fragmented far more quickly than smaller files.

The directory, stored on the disk, keeps track of the fragments. When the computer opens a file, the directory tells the drive to find and reassemble the parts into a coherent file. This happens on the fly, constantly, at very high speed. The more fragmented a hard disk becomes, the harder the drive has to work to find the scattered pieces. This creates heat and stresses fragile, precision parts. Eventually it can cause the drive to fail. Therefore, defragmenting is an important aspect of drive maintenance.
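To make the idea concrete, here is a toy Python sketch of how deleting one file and then saving a larger one can split the new file into fragments. It is only an illustration under simple assumptions: the "disk" is a short list of blocks, and the hypothetical `allocate` function just fills the first free blocks it finds. Real file systems use more sophisticated allocation strategies.

```python
def allocate(disk, file_id, size):
    """Write `size` blocks of `file_id` into the first free blocks found.

    `disk` is a list in which None marks a free block. Returns the number
    of separate contiguous runs (fragments) the file ended up in.
    """
    fragments = 0
    previous = None  # index of the last block written for this file
    for index, block in enumerate(disk):
        if size == 0:
            break
        if block is None:
            disk[index] = file_id
            size -= 1
            # A new fragment starts whenever this block does not directly
            # follow the previous block written for this file.
            if previous is None or index != previous + 1:
                fragments += 1
            previous = index
    if size:
        raise ValueError("disk full")
    return fragments


def delete(disk, file_id):
    """Free a file's blocks, as removing its directory entry would."""
    for index, block in enumerate(disk):
        if block == file_id:
            disk[index] = None


disk = [None] * 12
allocate(disk, "A", 3)   # blocks 0-2
allocate(disk, "B", 3)   # blocks 3-5
allocate(disk, "C", 3)   # blocks 6-8
delete(disk, "B")        # leaves a 3-block hole in the middle
audio_fragments = allocate(disk, "AUDIO", 5)  # too big for the hole
print(disk)
print("AUDIO fragments:", audio_fragments)
```

In this run the five-block file fills the three-block hole and spills into the free space at the end of the disk, so it lands in two fragments. The larger the file, the less likely it is to fit in any single hole.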

True disk optimization is different from defragmentation. Optimization organizes related files and files that are commonly accessed together into logical groups for faster access. For instance, applications will launch faster if the files they require are located together, so the drive doesn't have to work hard to find them. Here I'm talking about organizing the files on the platters, not within folders or directories on your desktop. All of this takes place behind the scenes. (For information on defragmenting and optimizing software, see the online bonus material “Shattered!”)

Computer viruses

Viruses can wreak havoc on a drive.

Corrupt firmware or bad RAM

Even firmware can get screwed up over time. The RAM cache is usually SDRAM, and like any memory, it can be damaged. You can't do much to prevent this problem from happening.

Warning Signs

If you see any of the following signs of imminent drive failure, stop using the drive immediately:

Your computer takes a long time to boot or hangs completely

Slow boots can be caused by factors other than drive problems, such as a corrupt operating system or having your OS launch a lot of programs at startup. A somewhat sluggish boot disk might just need to be defragmented. But a very long boot time or a failure to boot generally indicates that a drive is encountering read/write failures. Minimizing the number of programs that automatically launch at startup will enable you to more easily notice slow boot times.

A disk utility reveals bad disk sectors

Disk utility software might be able to fix the bad sectors, but as noted earlier, if the problem reappears and increases, your drive is dying.

The drive is hot to the touch

All drives get warm, but if a drive is noticeably hot, it is working way too hard and is about to die.

The “click of death”

Any odd sound made by your drive — such as clicking, knocking, whistling, or grinding — is a bad sign. A clicking sound (the so-called click of death) often indicates a read/write error during a seek. The sound is usually due to a mechanical failure that causes the head actuator to click as the drive attempts to recalibrate. If the drive hasn't crashed, it's about to.

Cyclic redundancy errors

Computers have an error-checking procedure to validate that a file has been copied correctly. A cyclic redundancy error indicates that the computer cannot make an accurate copy. This could indicate a bad disk sector or something worse: damaged read/write heads, a bad RAM cache, or dust that is damaging the platter.
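You can apply the same kind of check yourself after copying an important file. The following Python sketch compares the CRC-32 of an original and its copy using the standard `zlib` module. It illustrates the principle at the application level and is not the check that the operating system or drive performs internally; the function names and the file names in the demo are made up for the example.

```python
import os
import tempfile
import zlib


def crc32_of(path, chunk_size=1 << 20):
    """Return the CRC-32 of a file, reading it one chunk at a time."""
    checksum = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            checksum = zlib.crc32(chunk, checksum)
    return checksum


def copies_match(original, duplicate):
    """True when both files produce the same CRC-32."""
    return crc32_of(original) == crc32_of(duplicate)


# Demo with throwaway files: one faithful copy, one with a damaged byte.
with tempfile.TemporaryDirectory() as folder:
    original = os.path.join(folder, "take1.wav")
    good_copy = os.path.join(folder, "take1-copy.wav")
    bad_copy = os.path.join(folder, "take1-damaged.wav")

    data = bytes(range(256)) * 64
    damaged = bytearray(data)
    damaged[1000] ^= 0xFF  # simulate a single corrupted byte

    for path, payload in ((original, data), (good_copy, data),
                          (bad_copy, bytes(damaged))):
        with open(path, "wb") as handle:
            handle.write(payload)

    good_copy_ok = copies_match(original, good_copy)
    bad_copy_ok = copies_match(original, bad_copy)

print("faithful copy matches:", good_copy_ok)
print("damaged copy matches:", bad_copy_ok)
```

A mismatch tells you only that the two files differ, not why. If it happens repeatedly on the same drive, treat it as one of the warning signs described here.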

Preventive Medicine

Because there is usually no cure for a crashed drive, keeping your data safe is all about prevention to the greatest extent possible and preparation for the inevitable. Here are some tips:

Save often and do backups faithfully

You never know when your drive will fail, so take no chances. Save open files whenever you have a moment. Back up at least daily or be prepared to lose your work. (For more about backup, see the online bonus material “Get Back” and the feature “Better Safe Than Sorry” in the May 2006 issue, available at emusician.com.)
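If you would rather not rely on memory for the daily backup, a small script can do the copying. This Python sketch copies a project folder into a dated folder on another drive. It is a minimal illustration, not a substitute for dedicated backup software, and the `back_up` function and the folder names in the demo are hypothetical.

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path


def back_up(project_dir, backup_root, today=None):
    """Copy a project folder to backup_root/<name>-<YYYY-MM-DD>.

    shutil.copytree raises an error if the target already exists, so an
    earlier backup from the same day is never silently overwritten.
    """
    stamp = (today or date.today()).isoformat()
    target = Path(backup_root) / f"{Path(project_dir).name}-{stamp}"
    shutil.copytree(project_dir, target)
    return target


# Demo in a scratch folder so nothing real is touched.
with tempfile.TemporaryDirectory() as scratch:
    project = Path(scratch) / "song"
    project.mkdir()
    (project / "vocal.wav").write_bytes(b"placeholder audio")

    backups = Path(scratch) / "backups"
    copy = back_up(project, backups, today=date(2009, 1, 1))
    copy_name = copy.name
    copied_ok = (copy / "vocal.wav").read_bytes() == b"placeholder audio"

print(copy_name, copied_ok)
```

In practice you would point `backup_root` at a different physical drive, because a copy on the same drive dies with it.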

Leave free space

As your drive fills up, the OS has a harder time finding large contiguous blocks, so fragmentation increases. In addition, defragmentation software needs free space in which to move file fragments, and optimization requires even more.

Complete defragmentation with the Windows XP Disk Defragmenter requires that at least 15 percent of the drive be free, and that's a good, if generous, guideline. Some utility programs can defragment drives with less free space, but it's wiser not to push your drive to the limit. You can check available space in Windows by right-clicking on a drive and choosing Properties. For Mac OS X, use the Activity Monitor application (in the Utilities folder inside Applications) and choose Disk Usage.
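The 15 percent guideline is easy to check from a script as well. This Python sketch uses the standard `shutil.disk_usage` call to report the free fraction of a volume. The function names are my own, and the 0.15 default simply encodes the guideline above.

```python
import os
import shutil


def free_fraction(path):
    """Return the fraction (0.0 to 1.0) of the volume at `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def has_defrag_headroom(path, minimum=0.15):
    """True when at least `minimum` of the volume is free."""
    return free_fraction(path) >= minimum


# Check the volume that holds the root of the current drive.
root = os.path.abspath(os.sep)
fraction = free_fraction(root)
print(f"{fraction:.0%} free; enough headroom: {has_defrag_headroom(root)}")
```

To check an audio drive instead, pass its mount point or drive letter rather than `root`.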

Keep cool

You know those pictures of happy people using laptop computers in full sun on a hot beach? They won't be so happy when their internal drives fry. Keep drives away from heat sources and make sure that they're well ventilated. If needed, you can buy additional cooling fans (see Fig. 3). You can also monitor the temperature of your hard drive with SMART-savvy software (more on this in a moment).

Feed clean power

Power surges, spikes, and sags endanger your drive's health, so use a quality power conditioner that includes filtering and surge/spike protection. I recommend getting an uninterruptible power supply (UPS) like those in the APC (apcc.com) Back-UPS LS or RS series (see Fig. 4). A UPS gives you temporary power in the event of a power failure, and the Back-UPS LS and RS models feature automatic voltage regulation (AVR), which delivers 120 VAC regardless of the incoming voltage. (For more, see the article “Power Hitters” at home.comcast.net/~soppenheimer/sound/articles/features/power_hitters.html.)

Give it a rest

Some pros recommend keeping a hard drive running all the time because powering up and down is more stressful than continuous operation. Others say that if you're not using the drive, you should turn it off to avoid wear and tear. I generally prefer to leave my computer running when I'm home, but I power down when I'm leaving for several hours.

If you are going out for a few hours and want to leave your computer running, you can give your drives a chance to cool by letting them sleep. Windows users can enable Hard Disk Power Off under Power Management, or they can just hibernate the computer, which will power down the boot disk. Mac users can select System Preferences→Energy Saver, click on Show Details, and then check Put The Hard Disk(s) To Sleep When Possible. If you return from a short break, try to access your drive, and get a spinning beach ball, try unchecking the sleep option.

Let it be

Moving or tilting a hard drive while it is powering up is dangerous to its health.

Perform regular maintenance

Make preventive maintenance a part of your weekly routine. Use disk utility software to detect and fix a variety of disk problems, such as corrupt file directories and bad sectors. Don't wait until your computer slows down and you suspect a problem.

Mac OS X comes with Disk Utility (found in the Utilities folder inside Applications), which enables you to verify and repair the disk and the disk permissions (see Fig. 5). Windows XP includes Chkdsk; to use it, choose Start→Run and then type chkdsk.exe. You can also use third-party Mac and Windows disk utilities, which often have more features (see the online bonus material “Your Utility Belt”).

Get SMART

SMART (Self-Monitoring, Analysis, and Reporting Technology) is built into many modern hard drives. It monitors more than 35 attributes of drive performance, including temperature, calibration, bad sectors, spin-up time, and the distance between the heads and the platter(s). You need SMART-savvy disk utility software (see Fig. 6) to access this information.
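If your SMART utility can export a text report, you can track the numbers yourself. The Python sketch below assumes a report shaped like the attribute table printed by the smartmontools command `smartctl -A`; the sample text is invented for the example, not taken from a real drive. The 50-degree limit is my own placeholder, so check your drive's specifications for its actual operating range.

```python
# Invented sample in the column layout of a `smartctl -A` attribute table.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   180   178   021    Pre-fail  Always       -       5950
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       8
194 Temperature_Celsius     0x0022   112   098   000    Old_age   Always       -       38
"""


def parse_attributes(report):
    """Map attribute names to their raw integer values."""
    attributes = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            name, raw = fields[1], fields[9]
            if raw.isdigit():
                attributes[name] = int(raw)
    return attributes


def warning_signs(previous, current, max_celsius=50):
    """List reasons for concern, comparing two sets of readings.

    max_celsius is an assumed placeholder, not a manufacturer figure.
    """
    signs = []
    before = previous.get("Reallocated_Sector_Ct", 0)
    now = current.get("Reallocated_Sector_Ct", 0)
    if now > before:
        signs.append(f"bad-sector count rose from {before} to {now}")
    temperature = current.get("Temperature_Celsius", 0)
    if temperature > max_celsius:
        signs.append(f"drive is running at {temperature} C")
    return signs


current = parse_attributes(SAMPLE)
last_week = dict(current, Reallocated_Sector_Ct=5)  # pretend earlier reading
signs = warning_signs(last_week, current)
print(current)
print(signs)
```

Saving each week's readings and comparing them with the previous set is what makes a rising bad-sector count visible before the drive fails outright.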

Defragment

I discussed defragmentation and optimization earlier, in the “Logic Failure” section.

Mac OS X: ensure that the OS does its maintenance routines

OS X automatically performs certain background maintenance tasks that can affect your boot drive. By default, these tasks are scheduled to run between 3:15 and 5:30 a.m., and if your computer is shut down or asleep, the maintenance can't be done. In that case, reschedule these tasks or run them manually using a third-party program such as Atomic Bird Macaroni 2.1.1 ($9.99; atomicbird.com/macaroni) or Brian R. Hill's MacJanitor 1.3 (free; personalpages.tds.net/~brian_hill/macjanitor.html). You can also run the maintenance tasks using the Terminal application.

Don't record audio projects to your boot disk

Because audio files are large and we edit them extensively, the drive where you store them can become fragmented relatively quickly. In addition, audio drives in studios work long and hard. If you record to a drive other than your boot drive, it will be easier to mind your audio drive's health, and your boot drive will last longer.

Go on a RAID

If you have a professional project studio and can afford the investment, consider using a RAID 1 disk array for critical audio file storage. (For more on RAID, see the online bonus material “Mirror, Mirror.”)

Tighten it up

If you have an internal drive that's acting a bit strange — say, it's constantly spinning — check to make sure that the drive is fully seated, all contacts are clean, and all connections are tight.

The Rest of the Story

There is much more to learn about hard-drive failure and data safety, so be sure to read the five useful sidebars in the online bonus material. This should give you enough information to begin a drive-maintenance program and to prepare for the evil day that you know is coming. I suggest you start now. Remember, back up first — because nothing in life is certain except death and hard-drive failure.


Former EM editor in chief Steve Oppenheimer bought another backup drive while writing this story.

FIG. 2: This screen shot from Coriolis Systems iDefrag 1.6.4 for Macintosh shows parts of files scattered about. This is called fragmentation.

FIG. 4: The APC Back-UPS LS 700 is a quality uninterruptible power supply with surge, spike, and sag protection; automatic voltage regulation; and an assortment of bells and whistles.

FIG. 5: Apple's Disk Utility comes free with Mac OS X and offers a number of important features, including verification and repair of the disk and the disk permissions.

FIG. 6: Utilities such as Ariolic Software ActiveSMART 2.51 can use SMART technology to report on the condition of a hard drive. Here is a graph representing a drive's temperature history.

