mdadm: Device or resource busy error

I recently needed to rebuild a RAID1 array after a reboot, for some odd reason, and afterwards I was unable to assemble the array.  mdadm reported “Device or resource busy” on one of the drives.  At first I couldn’t figure out what the issue was, as the drive wasn’t mounted and lsof showed no other processes using it.  Eventually I tracked it down to a changed UUID: my fstab was trying to mount the old mdadm array, and that locked the resource.  I confirmed it by running ‘ls’ on /dev/disk/by-uuid/ and comparing the result with fstab.  Updating fstab with the new UUID, rebooting to clean things up and reassembling the array solved it.  Just a useful item to keep in mind.
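The ‘ls’ comparison can be scripted.  This is only a rough sketch: the function name stale_fstab_uuids is made up for illustration, and it assumes fstab refers to filesystems with plain UUID=... entries and that every valid UUID shows up as an entry under /dev/disk/by-uuid/.  It prints every UUID that fstab mentions but the system does not currently know about.

```shell
# Print UUIDs referenced in an fstab-style file that have no matching
# entry in the by-uuid directory (likely stale after an array rebuild).
# $1 = fstab path, $2 = by-uuid directory; both default to the real ones
# and can be pointed at copies for a dry run.
stale_fstab_uuids() {
    fstab="${1:-/etc/fstab}"
    uuid_dir="${2:-/dev/disk/by-uuid}"
    # Drop comment lines, then keep the value of first fields like UUID=xxxx.
    grep -v '^[[:space:]]*#' "$fstab" |
    awk '$1 ~ /^UUID=/ { print substr($1, 6) }' |
    while read -r uuid; do
        # An existing entry means the UUID is still valid; otherwise report it.
        [ -e "$uuid_dir/$uuid" ] || echo "$uuid"
    done
}
```

Running stale_fstab_uuids with no arguments checks the live system.  Any UUID it prints is a candidate for the kind of stale entry described above.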

Random md-device recovery?

I was taking a mid-afternoon nap (yes, at 3 am; I work nights) and I came back to my PC to see CPU usage on my server hovering around 15%, not at idle like usual.  A quick check revealed md0_raid5 and md0_resync running, which is normally not a good sign.

mdadm --detail /dev/md0 showed the following:

    Update Time : Sun Oct  5 03:22:19 2008
          State : clean, recovering
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
         Layout : left-symmetric
     Chunk Size : 64K
 Rebuild Status : 85% complete

Uh oh.  Why was the array rebuilding itself?  All drives were listed as active and working …  but did we experience a drive momentarily dropping from the array or a SATA device reset?  Was this a sign of impending hardware failure?  Tailing /var/log/messages displayed this useful piece of information:

Oct  5 01:06:01 rigel md: data-check of RAID array md0
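The log line is one way to tell a check from a real rebuild.  The md driver also exposes the current operation in sysfs, in the array's md/sync_action file.  Here is a small sketch of reading it; the function name md_sync_action is made up, and the array name md0 is just the one from this post.

```shell
# Report what an md array is currently doing, based on its sync_action file.
# $1 = array name (default md0); $2 = sysfs block root (overridable for a dry run).
md_sync_action() {
    md="${1:-md0}"
    sysroot="${2:-/sys/block}"
    file="$sysroot/$md/md/sync_action"
    if [ ! -r "$file" ]; then
        echo "$md: cannot read $file" >&2
        return 1
    fi
    action=$(cat "$file")
    case "$action" in
        idle)           echo "$md: idle" ;;
        check)          echo "$md: data-check (scrub), not a rebuild" ;;
        resync|recover) echo "$md: $action (array is being resynced or rebuilt)" ;;
        *)              echo "$md: $action" ;;
    esac
}
```

On my box, mdadm --detail described the check as “recovering”, so the sysfs value (or /proc/mdstat) is the less alarming thing to look at.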

Ok, so “data-check” doesn’t sound so worrisome.  A quick Google search revealed this nice gem:

root@rigel:~# tail /etc/cron.d/mdadm
# By default, run at 01:06 on every Sunday, but do nothing unless the day of
# the month is less than or equal to 7. Thus, only run on the first Sunday of
# each month. crontab(5) sucks, unfortunately, in this regard; therefore this
# hack (see #380425).
6 1 * * 0 root [ -x /usr/share/mdadm/checkarray ] && [ $(date +\%d) -le 7 ] && /usr/share/mdadm/checkarray --cron --all --quiet

Ah, so this is the first Sunday of the month and the check kicked off at 1:06 AM.  You tricksy Ubuntu.  Apparently a bug has been filed about the check causing performance issues on some boxes.  Verifying data integrity is a good idea, although a slightly more obvious notice would be nice.
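The “first Sunday” trick in that cron line is easy to try out by hand.  The sketch below restates it as a function; is_first_sunday is a made-up name, and it relies on GNU date's -d option, so it will not work with BSD date as written.  Cron supplies the “is it Sunday” half through the day-of-week field, and the shell test supplies the “day of month is 7 or less” half.

```shell
# Succeed when the given date is the first Sunday of its month.
# $1 = any date string GNU date understands (default: today).
is_first_sunday() {
    when="${1:-today}"
    dow=$(date -d "$when" +%w)   # day of week, 0 = Sunday
    dom=$(date -d "$when" +%d)   # day of month, 01..31
    # Sunday plus a day of month within the first seven days.
    [ "$dow" -eq 0 ] && [ "$dom" -le 7 ]
}

# Example: check the date from the log line above.
if is_first_sunday 2008-10-05; then
    echo "2008-10-05: checkarray would run"
fi
```

Any Sunday later in the month fails the second test, so the job fires every week but does nothing three or four times out of five.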