Monitoring drive health with SMART

So after installing Intrepid on my new Acer Aspire One netbook, I was working with SMART to resolve the Load_Cycle_Count issue (the ‘hard drive killer’ bug).  Once that was done, I figured it would make sense to check the SMART health status of the rest of the drives in my desktops and server.  Good thing I did, too…  Results ranged from a few counts of UDMA CRC Error Count to hundreds of thousands of Hardware ECC Recovered, Raw Read Error Rate, and Seek Error Rate.  The RAID drives in my server also had a few million counts of Seek Error Rate.  However, I ran self-tests and all of them passed, so this is something I am going to keep a close eye on (a GkrellM plugin, perhaps?).
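Keeping a close eye on these counters mostly means noticing when they grow. As a minimal sketch, assuming you have already collected RAW_VALUE readings into a dictionary (for example by reading `smartctl -A` output), the hypothetical helper `counter_increases` below reports which attributes went up between two snapshots. The readings in the example are made up, not my drives' actual values. Raw values for some attributes are vendor-specific, so the trend is often more telling than the absolute number.

```python
def counter_increases(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return {attribute: increase} for each raw counter that grew between snapshots."""
    increases = {}
    for name, new_count in after.items():
        old_count = before.get(name)
        # Attributes missing from the earlier snapshot have no baseline, so skip them.
        if old_count is not None and new_count > old_count:
            increases[name] = new_count - old_count
    return increases


if __name__ == "__main__":
    # Made-up readings for illustration, not values from the drives described above.
    last_week = {"UDMA_CRC_Error_Count": 3, "Seek_Error_Rate": 1_200_000}
    this_week = {"UDMA_CRC_Error_Count": 5, "Seek_Error_Rate": 1_200_000}
    print(counter_increases(last_week, this_week))  # {'UDMA_CRC_Error_Count': 2}
```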

According to a few different sources, a handful of attributes are particularly important to watch, although opinions vary slightly.  You can also get descriptions of each attribute here.  Keep in mind that I’m looking at RAW_VALUE here.  This Linux Journal article helps explain the difference between the VALUE, WORST, THRESH, and RAW_VALUE columns: VALUE and WORST are vendor-normalized scores, and an attribute counts as failing when VALUE drops to THRESH or below.  Those normalized scores can be a little difficult to interpret, so I usually stick with RAW.  Ultimately, the best idea is to keep current backups (a whole book could be written about those alone) and to schedule regular SMART self-tests for all your drives.  I used the following configuration in /etc/smartd.conf:
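To make the VALUE/WORST/THRESH/RAW_VALUE distinction concrete, here is a rough sketch that parses the attribute table printed by `smartctl -A /dev/sda`, assuming smartmontools’ default table layout. It flags any attribute whose normalized VALUE has dropped to its THRESH or below, and any nonzero RAW_VALUE on a short watchlist. The watchlist, the sample output, and the names `parse_attributes` and `find_warnings` are hypothetical, chosen for illustration; they are not the specific attributes the sources above recommend.

```python
import re
from typing import NamedTuple


class Attribute(NamedTuple):
    id: int
    name: str
    value: int   # normalized current score (higher is better)
    worst: int   # lowest normalized score recorded so far
    thresh: int  # vendor threshold; VALUE at or below it counts as failing
    raw: int     # leading integer of RAW_VALUE (meaning is vendor-specific)


# Hypothetical watchlist: commonly cited attributes, assumed for illustration.
WATCHLIST = {"Reallocated_Sector_Ct", "Current_Pending_Sector",
             "Offline_Uncorrectable", "UDMA_CRC_Error_Count"}

# Made-up sample in the default `smartctl -A` table layout.
SAMPLE_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       160356832
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   035   030   000    Old_age   Always       -       35 (Min/Max 20/45)
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       3
"""


def parse_attributes(smartctl_output: str) -> dict[str, Attribute]:
    """Parse attribute rows from `smartctl -A` output, keyed by attribute name."""
    attributes = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID; RAW_VALUE begins at the 10th column.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        raw_digits = re.match(r"\d+", fields[9])
        attributes[fields[1]] = Attribute(
            id=int(fields[0]),
            name=fields[1],
            value=int(fields[3]),
            worst=int(fields[4]),
            thresh=int(fields[5]),
            raw=int(raw_digits.group()) if raw_digits else 0,
        )
    return attributes


def find_warnings(attributes: dict[str, Attribute]) -> list[str]:
    """Report failing normalized values and nonzero raw counts on the watchlist."""
    found = []
    for attr in attributes.values():
        # A THRESH of 0 means the vendor never treats this attribute as failing.
        if attr.thresh > 0 and attr.value <= attr.thresh:
            found.append(f"{attr.name}: VALUE {attr.value} <= THRESH {attr.thresh}")
        elif attr.name in WATCHLIST and attr.raw > 0:
            found.append(f"{attr.name}: RAW_VALUE {attr.raw}")
    return found


if __name__ == "__main__":
    for warning in find_warnings(parse_attributes(SAMPLE_OUTPUT)):
        print(warning)
```

Only the leading integer of RAW_VALUE is kept, so a raw field such as “35 (Min/Max 20/45)” reads as 35.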

/dev/sda -a -o on -S on -s (S/../.././12|L/../../3/01) -m root
/dev/sdb -a -o on -S on -s (S/../.././12|L/../../3/03) -m root

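This schedules a ‘short’ self-test at noon every day on both drives, and a ‘long’ self-test every Wednesday at 1 am on /dev/sda and 3 am on /dev/sdb (the hour field uses a 24-hour clock, so noon is 12).  The other directives: -a turns on smartd’s standard set of checks (health status, attribute changes, the error and self-test logs, and pending-sector counts), -S on enables attribute autosave, and -o on enables automatic offline data collection.  -m root mails any warnings to root.  -M does not send extra reports; it changes how -m delivers its warnings (for example, -M test sends a single test message at startup, and -M daily repeats a warning every day instead of sending it once).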
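The smartd.conf(5) man page describes the -s argument as a regular expression matched against a string of the form T/MM/DD/d/HH: test type, month, day of month, day of week (1 = Monday), and a two-digit hour on a 24-hour clock. The sketch below rebuilds that string for a given hour and checks it against the expression, which makes it easy to see that an hour written with a single digit (1 rather than 01) would never match the two-digit HH field. The whole-string match, the use of Python’s `re` in place of smartd’s own regex engine, and the function names are my assumptions; `smartd -q showtests` prints the upcoming schedule as smartd itself computes it and is the authoritative check.

```python
import re
from datetime import datetime, timedelta


def schedule_string(test_type: str, when: datetime) -> str:
    """Build the T/MM/DD/d/HH string that smartd.conf(5) describes for -s."""
    # d is the day of the week, 1 = Monday ... 7 = Sunday, the same as isoweekday().
    return f"{test_type}/{when:%m}/{when:%d}/{when.isoweekday()}/{when:%H}"


def is_scheduled(regex: str, test_type: str, when: datetime) -> bool:
    """True if the -s expression matches the whole string for this hour."""
    # Python's re stands in for smartd's POSIX extended regex; the two agree
    # for simple patterns like these.
    return re.fullmatch(regex, schedule_string(test_type, when)) is not None


if __name__ == "__main__":
    sda_schedule = r"(S/../.././12|L/../../3/01)"
    monday = datetime(2008, 12, 1)  # 1 Dec 2008 was a Monday
    for hour in range(7 * 24):
        when = monday + timedelta(hours=hour)
        for test_type in ("S", "L"):
            if is_scheduled(sda_schedule, test_type, when):
                print(test_type, when.strftime("%a %Y-%m-%d %H:00"))
```

Run against a full week, this lists one short test at 12:00 each day and a single long test at 01:00 on Wednesday.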

Update (Dec 05): Note: this is important.  Depending on your distribution, you may need to modify your /etc/default/smartmontools file.  On my Intrepid boxes I needed to uncomment the line “start_smartd=yes”.  Without it, smartd did not start automatically at boot, and any attempt to start it manually silently returned to the prompt without an error message.  I had been wondering why my scheduled self-tests were not running…
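If your distribution uses the same mechanism as my Intrepid boxes, it is worth confirming that the start_smartd=yes line is actually uncommented. The sketch below, built around a hypothetical function `smartd_enabled`, checks the text of /etc/default/smartmontools, treating the file as shell-style assignments where the last one wins. It only checks the file; it does not prove smartd is running, so it is still worth confirming the daemon is up (for example with `pgrep smartd`).

```python
from pathlib import Path


def smartd_enabled(defaults_text: str) -> bool:
    """True if the last uncommented start_smartd assignment is set to yes."""
    value = None
    for line in defaults_text.splitlines():
        stripped = line.strip()
        # Commented-out lines such as '#start_smartd=yes' have no effect.
        if not stripped.startswith("start_smartd="):
            continue
        assigned = stripped.split("=", 1)[1]
        # Drop a trailing shell comment, surrounding spaces, and quotes.
        assigned = assigned.split("#", 1)[0].strip().strip("\"'")
        # When the file is sourced, a later assignment overrides an earlier one.
        value = assigned
    return value == "yes"


if __name__ == "__main__":
    defaults = Path("/etc/default/smartmontools")
    if defaults.exists():
        status = "enabled" if smartd_enabled(defaults.read_text()) else "NOT enabled"
        print(f"{defaults}: start_smartd is {status}")
    else:
        print(f"{defaults} not found (the file name may differ on your distribution)")
```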
