Ugh! I finished work one night and it was time to do my nightly Time Machine backup. I started it up and walked away. When I returned a half hour later, I saw the bad news.
The Mac OS had detected “SMART” errors on my hard disk. It told me in no uncertain terms that I must save the contents of the failed drive to a backup and replace it! It gave me no other options. Moreover, it had helpfully disabled Time Machine, so I couldn’t use it to do the backup!
The story has a happy ending, but the road to recovery is what is interesting. It required three hard disk utility programs on two platforms. The winning combo was:
- Super Duper (Macintosh)
- DiskWarrior (Macintosh)
- Spinrite (DOS; Wayback Machine required)
Here’s the story...
Except for that dire message, there were no other symptoms of failure. No crashes, no hangups. No corrupt files. Certainly no kernel panics. In every way, the behavior was normal.
I thought, sure, it can’t be that serious. I’ll run Disk Utility and fix the problem.
But NOOOOOO! The drive now appeared in the Disk Utility window in red (instead of black) and it gave the same dire message.
This was getting more serious. I quickly hooked up a spare drive and ran Super Duper. Luckily, it was not bothered by whatever had bothered Time Machine and finished the clone backup by the next morning.
Now that I was safe and had programmed Super Duper to do regular backups, I gingerly kept using the “failed” HD, but now watched carefully for symptoms. Meanwhile, I went over to Fry’s and got a 2TB replacement disk (Seagate 2TB, five-year warranty and in their “LP” series).
Since there was a safe backup, I continued working normally, keeping one eye out for symptoms. Strangely, none ever appeared. Completely normal.
Since the Super Duper clone had been created, I booted off the clone and tried running DiskWarrior to attempt to repair the internal. I felt sure that DiskWarrior could cure the problem. It ran for a couple of hours, far longer than the normal 15-30 minutes for a drive of this size. Finally, it finished successfully.
So I rebooted off the internal HD, confident that I would be able to run Time Machine on it again.
NO! Even a successful DiskWarrior session did not fix it.
Still, I got the admonition that the drive MUST be replaced.
I got the hint. When I got the time, I took it over to the repair shop (Mac Pro) and had them swap out the internal for my 2TB replacement.
[When I get a new HD, my SOP is to first hook it up to an old PC and condition it using Spinrite. This is before the HD is even formatted on the target OS. If it’s DOA, this shows up right away and it’s back to Fry’s for a replacement. If successful, it verifies every last sector and deals with any correctable problems it finds. So, this conditioning run was completed in advance. I had already invested a day on the test bench verifying the disk, so it was important to me that they install that drive.]
When I got the iMac home, my very first move was to hook the dead drive up to Spinrite to find out what the SMART system had been squawking about. As expected, the first thing that happened was that Spinrite threw up a scary error screen that said that the HD was in imminent danger of failure (I already knew that!). It cautioned me that, if I proceeded, it may be the last time I could ever read the disk again! Pretty serious stuff.
Since everything was already backed up, I went for it. I put it on Spinrite’s comprehensive mode (creatively called “Level 4”).
After it got going, I browsed over to the SMART screen to finally get some quantitative information. Everything actually looked normal, with the major exception of the “Reallocated Sectors” line. This line is a count that indicates the number of bad sectors that have been swapped out and replaced by spare sectors (I think).
In this case, the count was -35/-35!!! I’ve never seen or even heard of a count going negative. I don't even understand how a count can go negative. What the hell is “-35 sectors” supposed to mean? Is it borrowing some sectors that have not been spared in a desperate attempt to get going? I don’t know.
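One possible reading, and it is only a guess that I have not checked against Spinrite's documentation: SMART attributes are reported as a normalized "health" value, a worst-ever value, and a vendor-set threshold, and an attribute counts as failed once the value drops to or below the threshold. If the screen was showing the margin between value and threshold rather than a raw sector count, a negative pair would make sense. The sketch below only illustrates that arithmetic; the class name and the numbers are hypothetical, picked to reproduce a -35/-35 display.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    """Hypothetical container for one SMART attribute (not a real library type)."""
    name: str
    value: int      # current normalized value (higher is healthier)
    worst: int      # lowest normalized value ever recorded
    threshold: int  # vendor-set failure threshold

    def margins(self):
        """Return (current margin, worst margin) relative to the threshold."""
        return (self.value - self.threshold, self.worst - self.threshold)

    def is_failing(self):
        """An attribute is considered failed when value <= threshold."""
        return self.value <= self.threshold


# Assumed numbers, chosen only so that the margins come out as -35/-35.
attr = SmartAttribute("Reallocated Sectors", value=1, worst=1, threshold=36)
print(attr.margins(), attr.is_failing())
```

Under that assumption, "-35" would mean "35 points below the failure threshold", not "minus 35 sectors".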
The next day (!), I came back and found that the Spinrite run had actually completed successfully. Moreover, it didn’t even find new damaged sectors of its own or even any partially recoverable sectors. Except for that alarming Reallocated Sectors count, it actually looked just like a normal Spinrite run!
[By now, I had put the iMac back into service and successfully cloned the contents of the Super Duper drive back onto the fresh internal. Problem solved.]
I then hooked up my freshly Spinwritten HD to the iMac, this time as an external. Disk Utility still didn’t like it, so I reran DiskWarrior from my new internal.
Normally DiskWarrior can complete a run in under an hour (with either success or failure). This one set an all-time record for me. I started it at about 3:30 in the afternoon and it finally finished at about 9:30 the following morning!! I was amazed that it had made it at all. Among other things, the final DiskWarrior report said:
- 458,940 files had a duplicate ID that was repaired.
- 866,497 files had an ID that was repaired.
- 994 folders could not be found.
- 607 folders will have fewer items.
No wonder it took most of a day. I launched Disk Utility again to see what it thought. It now listed the drive in black instead of blood red. Hooray!
I let Disk Utility have at it. After the day in the Spinrite Emergency Room and most of a second day at the hands of DiskWarrior, it was given a clean bill of health! And after only a couple of minutes. GREEN!!!! The crucial point is that DiskWarrior could not successfully repair the disk without the Wayback Machine visit. This is not the first time that I needed to do a Spinrite run on a Mac disk in order to get the disk into good enough shape so that DiskWarrior could rebuild the directory.
Well, that scary error message that the iMac gave me was correct all along. Yep, SMART had detected some serious problems and I needed to swap it out. That was cool. What’s not so cool is that I don’t know of any software within the little Mac universe that’s capable of displaying quantitative information about the hard disk’s health. The Mac OS doesn’t say a word until you get right to the failure point. In some cases, that's already too late. In a nutshell, that’s the issue.
In this case, it’s likely that the failure was gradual. Any drive’s firmware will invisibly swap out damaged sectors in favor of spared out sectors in order to keep functioning normally. The idea is to not bother the end-user with errors that can be corrected. The problem comes when it runs out of its pool of spared out sectors and it cannot replace a new bad sector with a good one.
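To make the idea concrete, here is a toy model of sector sparing. It is a sketch of the concept only, not how any real drive firmware works; the class `ToyDrive` and its method names are made up for illustration.

```python
class ToyDrive:
    """Toy model of sector sparing; real firmware is far more involved."""

    def __init__(self, spare_sectors):
        self.spares_left = spare_sectors
        self.remap = {}          # bad sector number -> index of the spare used
        self.unrecoverable = []  # bad sectors found after the pool ran dry

    def report_bad_sector(self, sector):
        """Try to swap in a spare. Returns True if the sector is covered."""
        if sector in self.remap:
            return True          # already remapped earlier
        if sector in self.unrecoverable:
            return False         # already known to be lost
        if self.spares_left > 0:
            self.remap[sector] = len(self.remap)
            self.spares_left -= 1
            return True          # the user never notices
        self.unrecoverable.append(sector)
        return False             # this is where the trouble becomes visible

    @property
    def reallocated_count(self):
        return len(self.remap)


drive = ToyDrive(spare_sectors=3)
results = [drive.report_bad_sector(s) for s in [10, 42, 42, 99, 1234]]
print(results, drive.reallocated_count, drive.unrecoverable)
```

With only three spares in the pool, the first three distinct bad sectors are hidden from the user and the fourth one is not.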
I think I could have been alerted if there had been some way of reading the hard disk’s status earlier. Certainly I’d prefer to check it out every few months, but these sealed Macs prohibit that, as far as I know. Some combo of Mac OS system software should do that (e.g., Disk Utility, I'm looking at you).
If I could have seen a reallocated sectors report a few months before the failure point, it would have alerted me that trouble was heading my way. Luckily, I could still read the disk when the Mac OS threw up the alert.
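The kind of early warning I have in mind could be as simple as the sketch below. It assumes I had some way of jotting down the reallocated sectors count every so often (which is exactly what I was missing); the function name, the `max_growth` parameter, and the sample dates and counts are all invented for illustration.

```python
from datetime import date


def reallocation_warnings(samples, max_growth=0):
    """Flag every reading where the count grew by more than max_growth.

    samples: list of (date, reallocated_count) pairs in chronological order.
    Returns a list of (date, growth) pairs for the suspicious readings.
    """
    warnings = []
    for (_, prev_count), (day, count) in zip(samples, samples[1:]):
        growth = count - prev_count
        if growth > max_growth:
            warnings.append((day, growth))
    return warnings


# Hypothetical readings, taken every few months.
history = [
    (date(2010, 1, 1), 0),
    (date(2010, 4, 1), 0),
    (date(2010, 7, 1), 12),
    (date(2010, 10, 1), 85),
]

for day, growth in reallocation_warnings(history):
    print(f"{day}: {growth} newly reallocated sectors since the last check")
```

A count that sits at zero for months and then starts climbing is the signal I would have wanted to see well before the Mac OS said anything.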
The standard rule-of-thumb recommendation is to replace the hard disk every few years even if there are no overt signs of failure. That rule exists partly because we really are “flying blind” when we are unable to monitor the drive’s health over its useful life. So, my bottom line is to follow that advice. But keep copies of Super Duper, Spinrite, and DiskWarrior handy.