What the… ?
Last Sunday I got hit by an unexpected hard drive failure (are hard drive failures ever expected?). The good thing is I had most of the data backed up. The worst part is that it was the system drive, so the PC was effectively shut down.
The drive is a Seagate Barracuda 7200.11, also known as ST3500320AS. I had been using the PC normally the day before; I had even managed to take some screenshots of the new V3 Capital Ship shaders on the EVE test server. The disk was working as it had for the past 4 years: smooth and silent. Unfortunately, the next day my gaming rig greeted me with:
NON SYSTEM DISK OR DISK ERROR
Not good. I rebooted, this time closely watching the BIOS messages:
SATA Port P0: Port reset error!
Okay, so the BIOS can’t access the disk. It must be the cabling! I opened the case and carefully reseated all the plugs on all the hard disks and the DVD-ROM drive.
No go. Still the same error.
Okay, let’s switch the cables. I plugged the affected disk into a different port on the motherboard. The error message changed a little bit:
SATA Port P3: Port reset error!
Right, the disk has failed. Damn! Let’s see what uncle Google has to say about this. I entered the disk model number… and… surprise! It’s a known firmware bug.
What happened?
The disk I have is running firmware version SD15. As it turns out, Seagate had a “black series” of 7200.11 Barracudas, which had a firmware bug. For other owners of the disk the bug usually surfaced much earlier (after one month, or a few months at most). Mine worked for four years, but it eventually got hit by the bug as well.
It is worth noting that these Seagate drives store most of their internal calibration and configuration data on the platters, rather than in NVRAM, so replacing the PCB (which was my immediate idea) wouldn’t work. It seems that the bug is somehow related to this service data: when the drive is powered on, it conducts some tests and then attempts to read the configuration information. And it hangs. Hence the bug is also known as “stuck in BSY” or simply the BUSY bug.
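To make the “stuck in BSY” name a bit more concrete, here is a minimal Python sketch, assuming that BSY refers to the busy bit (bit 7) of the standard ATA status register. The bit masks are the usual ATA ones; the helper names `decode_ata_status` and `is_stuck_busy` and the sample values are made up for illustration, and the sketch does not show how a host would actually read the status byte.

```python
from typing import Iterable, List

# Selected bit masks of the ATA status register (standard ATA definitions).
ATA_STATUS_BITS = {
    0x80: "BSY",   # device is busy; the other bits are not meaningful
    0x40: "DRDY",  # device is ready to accept commands
    0x20: "DF",    # device fault
    0x08: "DRQ",   # device is ready to transfer data
    0x01: "ERR",   # the previous command ended in an error
}


def decode_ata_status(status: int) -> List[str]:
    """Return the names of the known flags set in a status byte."""
    if not 0 <= status <= 0xFF:
        raise ValueError("status must fit in one byte")
    return [name for mask, name in ATA_STATUS_BITS.items() if status & mask]


def is_stuck_busy(samples: Iterable[int]) -> bool:
    """True if BSY is set in every sample (hypothetical repeated reads)."""
    samples = list(samples)
    return bool(samples) and all(s & 0x80 for s in samples)


if __name__ == "__main__":
    print(decode_ata_status(0x80))             # a drive that only reports busy
    print(is_stuck_busy([0x80, 0x80, 0x80]))   # never leaves the busy state
    print(is_stuck_busy([0x80, 0x40]))         # became ready on the second read
```

A healthy drive clears BSY shortly after power-up; a drive hit by this bug never does, which is why the BIOS eventually gives up with a port reset error.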
But can it be fixed?
The answer is yes, it can be fixed. You will need a special serial console cable (a Nokia CA-42 cable can be adapted for this purpose), which allows you to run diagnostic commands on the drive itself. An external USB-to-SATA interface with its own power supply will be handy as well. You also need a Torx T-6 screwdriver, because you will need to separate the PCB from the drive for a while.
The detailed instructions are here: Fixing a Seagate 7200.11 drive.
As expected, the drive hangs with an error message shortly after it spins up. Of course the console is inaccessible.
LED:000000CC FAddr:0024A051
but after running all the commands from the guide I linked above, everything is fine.
The disk has been put back into the PC and is working just like before, with all the data intact.
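If you log the serial console output while working on the drive, the hang can be picked out of the log automatically. The following is only a sketch: the line format is inferred from the single message shown above, the post does not say what the two fields mean, and the names `LedError` and `parse_led_line` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class LedError(NamedTuple):
    led: int    # value after "LED:", parsed as hexadecimal
    faddr: int  # value after "FAddr:", parsed as hexadecimal


# Pattern assumed from the one example line in this post.
_LED_RE = re.compile(r"LED:([0-9A-Fa-f]+)\s+FAddr:([0-9A-Fa-f]+)")


def parse_led_line(line: str) -> Optional[LedError]:
    """Parse a 'LED:... FAddr:...' console line, or return None if absent."""
    match = _LED_RE.search(line)
    if match is None:
        return None
    return LedError(int(match.group(1), 16), int(match.group(2), 16))


if __name__ == "__main__":
    err = parse_led_line("LED:000000CC FAddr:0024A051")
    if err is not None:
        print(f"LED={err.led:#x} FAddr={err.faddr:#x}")
```

Lines that do not match simply return `None`, so the function can be run over a whole captured log.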
Do you still trust Seagate?
Yes, I still do, and I have many reasons for that. First of all, I have had many hard drives from different vendors, and they all break down roughly the same. Each of those vendors had some “black series” of drives which broke down more often than others. I also realize that every hard drive will give up eventually. That’s why you can expect hard drives in server disk arrays to fail. That’s why they are so easy to replace. That’s what RAID disk arrays are for. Hard disks are a sort of long-life consumable.
But the most important reason is a nearly 20-year-old hard drive from my first PC. Guess what? It is still in working order. Although its capacity is orders of magnitude smaller than that of current hard drives, and it is extremely slow and rather noisy, it still works. I need no further proof that Seagate makes decent hard drives 😉