I recently had to rebuild a RAID1 array after a reboot (for some odd reason), and afterwards I was unable to assemble the array. mdadm reported “Device or resource busy” on one of the drives. At first I couldn’t figure out what the issue was: the drive wasn’t mounted, and lsof showed no other processes using it. Eventually I tracked it down to a changed UUID: my fstab was still trying to mount the old mdadm array, and that locked the resource. I confirmed it by running ‘ls’ on /dev/disk/by-uuid/. Updating fstab with the new UUID, rebooting to clean things up, and reassembling the array solved it. Just a useful item to keep in mind.
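If you want to script that check, here is a rough sketch of how the comparison could be done. It assumes fstab entries use the UUID= form in the first field; the function name stale_fstab_uuids is made up for this example, and the two paths are only defaults that can be overridden.

```shell
# stale_fstab_uuids [FSTAB] [UUID_DIR]
# Print every UUID referenced in FSTAB that has no matching entry in UUID_DIR.
# Hypothetical helper for illustration; not part of mdadm or util-linux.
stale_fstab_uuids() {
    fstab=${1:-/etc/fstab}
    uuid_dir=${2:-/dev/disk/by-uuid}

    # Keep only lines whose first field starts with UUID= (comment lines
    # start with # and are skipped), then strip the prefix and any quotes.
    awk '$1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); gsub(/"/, "", $1); print $1 }' "$fstab" |
    while read -r uuid; do
        # A UUID with no entry in UUID_DIR points at a device that is gone
        [ -e "$uuid_dir/$uuid" ] || echo "$uuid"
    done
}
```

Running stale_fstab_uuids with no arguments prints any UUID that fstab still references but the system no longer knows about; no output means every entry matched.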
I was taking a mid-afternoon nap (yes, at 3 am; I work nights) and came back to my PC to see CPU usage on my server hovering around 15%, not at idle like usual. A quick check revealed md0_raid5 and md0_resync running, which is normally not a good sign.
mdadm --detail /dev/md0 showed the following:
Update Time : Sun Oct 5 03:22:19 2008
State : clean, recovering
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Rebuild Status : 85% complete
Uh oh. Why was the array rebuilding itself? All drives were listed as active and working … but did we experience a drive momentarily dropping from the array or a SATA device reset? Was this a sign of impending hardware failure? Tailing /var/log/messages displayed this useful piece of information:
Oct 5 01:06:01 rigel md: data-check of RAID array md0
Ok, so “data-check” doesn’t sound so worrisome. A quick Google search revealed this nice gem:
root@rigel:~# tail /etc/cron.d/mdadm
# By default, run at 01:06 on every Sunday, but do nothing unless the day of
# the month is less than or equal to 7. Thus, only run on the first Sunday of
# each month. crontab(5) sucks, unfortunately, in this regard; therefore this
# hack (see #380425).
6 1 * * 0 root [ -x /usr/share/mdadm/checkarray ] && [ $(date +\%d) -le 7 ] && /usr/share/mdadm/checkarray --cron --all --quiet
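The date test in that cron line can be tried on its own. Below is a small sketch of the same comparison wrapped in a function; the name is_first_week is made up for this example. Note that the backslash in \%d is only needed inside a crontab, where a bare % has special meaning; in a normal shell it is just %d.

```shell
# is_first_week DAY
# Succeed when DAY (day of the month, e.g. the output of `date +%d`)
# is 7 or less. Combined with a Sunday-only cron schedule, this is true
# only on the first Sunday of the month.
# Hypothetical helper for illustration; the cron file inlines the test.
is_first_week() {
    [ "$1" -le 7 ]
}

# Example use with today's date:
#   is_first_week "$(date +%d)" && echo "the check would run this Sunday"
```

Since cron only fires the line on Sundays (the trailing 0 in the schedule), and exactly one Sunday falls within days 1 to 7 of any month, the two conditions together pick out the first Sunday.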
Ah, so this is the first Sunday of the month and the check kicked off at 1:06 AM. You tricksy Ubuntu. Apparently a bug has been filed about the check causing performance issues on some boxes. It’s a good idea to verify data integrity, although a slightly more obvious notice would be nice.
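For next time, a quicker way to tell a scheduled check from a real rebuild may be the md driver's sync_action file in sysfs. This is a minimal sketch, assuming your kernel exposes /sys/block/md0/md/sync_action; the function name describe_sync_action is made up for this example, and the descriptions are my own summaries, not output from any tool.

```shell
# describe_sync_action [FILE]
# Read an md sync_action file and print a one-line explanation.
# FILE defaults to the sysfs path for md0 (an assumption; adjust for
# your array). Hypothetical helper for illustration only.
describe_sync_action() {
    file=${1:-/sys/block/md0/md/sync_action}

    if [ ! -r "$file" ]; then
        echo "unknown: cannot read $file"
        return 1
    fi

    action=$(cat "$file")
    case "$action" in
        idle)    echo "idle: no sync activity" ;;
        check)   echo "check: read-only consistency check (scrub)" ;;
        repair)  echo "repair: consistency check that also fixes mismatches" ;;
        resync)  echo "resync: resynchronising the array" ;;
        recover) echo "recover: rebuilding onto a spare or replacement drive" ;;
        *)       echo "$action: not handled by this sketch" ;;
    esac
}
```

During a run like the one above, I would expect this to report the check state, while a real drive replacement would show up as recover.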