a plea for non-identical mirrors

When I set up my test environment earlier this year, I put two Samsung HD103UJ 1TB drives in the main file server, and configured them as a RAID-1 software mirror through mdadm. Just months later, smartd was spewing out messages about both drives: Device: /dev/sda, 8 Currently unreadable (pending) sectors.

That got me very worried... I replaced one of the drives with a WDC WD1000FYPS-01ZKB0, and rebuilt the mirror. Actually, I built new mdadm mirrors from smaller partitions to lessen the risk of a two-drive badblock scenario. Regardless, my data got moved, so now everything gets stored on two different drives. Different makes, manufacturers, models. Same size though.

The problem is, I can't RMA the drive. It's still in warranty, but when running badblocks -w on the drive, it shows no problems whatsoever. Moreover, the SMART status has gone back to: Current_Pending_Sector 0. So not only is the problem undetectable by badblocks, running badblocks even makes it disappear from the SMART counters where it was visible in the first place.
At first I thought that the problematic sectors had been reallocated, but the SMART info suggests otherwise: Reallocated_Event_Count equals 0. If you want to look at your own drives, grep for _Pending_ and Reallocated_ in the smartctl -a output.
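
If you'd rather script that check than eyeball the grep output, here is a minimal sketch in Python. It assumes the usual ten-column attribute table that smartctl -a prints; the function name watched_attributes and the SAMPLE text are made up for illustration and are not output from my drives.

```python
# Substrings to look for in SMART attribute names, same as the grep above.
WATCHED = ("_Pending_", "Reallocated_")

# Illustrative sample only: hand-written in the shape of a smartctl
# attribute table, not captured from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2100
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
"""


def watched_attributes(smartctl_output: str) -> dict[str, int]:
    """Return {attribute_name: raw_value} for pending/reallocated attributes."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the raw value is the 10th column.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if any(pattern in name for pattern in WATCHED) and raw.isdigit():
            found[name] = int(raw)
    return found


if __name__ == "__main__":
    for name, raw in watched_attributes(SAMPLE).items():
        print(f"{name}: {raw}")
```

To run it against a real drive, feed it the text of smartctl -a /dev/sda (as root) instead of SAMPLE, for example by piping the output into the script and reading sys.stdin.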

My conclusion: it was a very, very bad idea to make a mirror out of two identical drives (and to make matters worse, they were the cheapest ones, too). Please remember: pick different drives to make up a mirror. A mirror image needs to be identical, but having it on identical media is a bad, bad thing.

And to wrap up, my final thoughts: how bad is a non-zero "Current_Pending_Sector" count, really? And why can it go down without reallocation?

Side note: I'm not saying Samsung drives are bad, I have 3 Samsung HD300LJ in a RAID-1 setup that are doing fine after more than 2 years of 24x7 service. But buying their 1TB drives left a sour aftertaste...
