HOWTO Monitor your hard disk(s) with smartmontools

From Gentoo Linux Wiki

Introduction

The aim of this HOWTO is to use SMART technology (which practically every modern hard disk supports) to check whether a disk is healthy. SMART-enabled hard disks continuously monitor their own health and alert the user when an anomaly is detected; most of them can also run specific self-tests for deeper analysis.

Warning: Before going on, always back up your important data, regardless of what SMART says! Even though SMART is very reliable, it can be wrong; hard disks also often die unexpectedly, and even if SMART has warned you that something is wrong, you may not have enough time to move your data to a safe place.

Installation Procedure

First of all make sure SMART is enabled in the BIOS. For example, in my BIOS I have this:

Code: BIOS
S.M.A.R.T. for Hard Disk: Enabled

Some BIOSes lack this option and report S.M.A.R.T. as disabled. Don't worry: smartctl can enable it (see below).

Now let's install the smartmontools package:

# emerge -av smartmontools

Finally, check whether your hard disk(s) support SMART:

# smartctl -i /dev/hda

For SATA drives:

# smartctl -i -d ata /dev/sda

To enable SMART on the drive:

# smartctl -s on /dev/sda
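
The check above can also be scripted. The following is only a minimal sketch: smart_support_state is a hypothetical helper name, and it assumes the ATA-style "SMART support is:" lines that smartctl -i prints for most disks; other device types may word this differently.

```shell
# Hypothetical helper: reads `smartctl -i` output on stdin and reports
# whether SMART is available and enabled.
# Assumption: the output contains the ATA-style lines
#   "SMART support is: Available ..." and "SMART support is: Enabled".
smart_support_state() {
    awk '
        /^SMART support is:[[:space:]]*Available/ { avail = 1 }
        /^SMART support is:[[:space:]]*Enabled/   { enabled = 1 }
        END {
            if (enabled)    print "enabled"
            else if (avail) print "available-but-disabled"
            else            print "unsupported-or-unknown"
        }
    '
}
```

With the function loaded in your shell, you could run smartctl -i /dev/hda | smart_support_state; if it prints available-but-disabled, smartctl -s on is the next step.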

Using smartctl

SMART Health Status

Let's check the SMART Health Status:

# smartctl -H /dev/hda

If you read PASSED, the disk is OK. If you read FAILED, back up your data now: the disk has either already failed or is predicting its own failure within the next 24 hours!
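
Besides the printed text, smartctl also reports problems through its exit status, which is a bit mask (see the EXIT STATUS section of the man page). The sketch below decodes that value; describe_smartctl_status is a hypothetical name and the bit descriptions are my paraphrase of the man page, so treat them as a summary rather than the official wording.

```shell
# Hypothetical helper: print the meaning of each bit set in a smartctl
# exit status. Descriptions paraphrase "EXIT STATUS" in man smartctl.
describe_smartctl_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no problems reported"
        return 0
    fi
    bit=0
    for meaning in \
        "command line did not parse" \
        "device open failed" \
        "a SMART command failed or a checksum error was found" \
        "SMART status check returned DISK FAILING" \
        "prefail attributes at or below threshold" \
        "attributes at or below threshold in the past" \
        "device error log contains errors" \
        "self-test log contains errors"
    do
        # Test bit number $bit of the status value.
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $meaning"
        fi
        bit=$((bit + 1))
    done
}
```

A possible use is to run smartctl -H /dev/hda and then describe_smartctl_status $? right after it. Bit 3 is the one that corresponds to a FAILED health status.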

SMART Error Log

Now let's check the SMART Error Log (it's a list of errors detected by SMART during the disk's life):

# smartctl -l error /dev/hda

If you read No Errors Logged, everything is fine. A few errors that are not recent are no great cause for concern. If there are many errors, back up your data as soon as you can.
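
If you want to check the error count from a script, something like the following could work. It is a sketch only: smart_error_count is a hypothetical name, and it assumes the ATA output format, where the log either says "No Errors Logged" or starts with an "ATA Error Count: N" line.

```shell
# Hypothetical helper: reads `smartctl -l error` output on stdin and
# prints the number of logged errors, or "unknown" if neither of the
# expected ATA-style lines is found.
smart_error_count() {
    awk '
        /^No Errors Logged/ { print 0;  found = 1; exit }
        /^ATA Error Count:/ { print $4; found = 1; exit }
        END { if (!found) print "unknown" }
    '
}
```

For example, smartctl -l error /dev/hda | smart_error_count prints 0 for a clean log.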

Reading the SMART Health Status and the SMART Error Log is not enough: you really should do some other specific tests.

SMART Testing

These tests don't interfere with the normal operation of the disk, so you can run them whenever you want. I'll only describe how to launch them and read their reports; to learn more, read the smartctl man page.

First you should know which tests are supported by your drive:

# smartctl -c /dev/hda

This also shows how much time each of them requires.
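
The durations can be pulled out of the -c output with a little awk. This is a rough sketch under one assumption: that your smartctl prints the usual two-line ATA layout, that is a "Short self-test routine" (or Extended, or Conveyance) line followed by a "recommended polling time: ( N) minutes." line. selftest_durations is a hypothetical name.

```shell
# Hypothetical helper: reads `smartctl -c` output on stdin and prints one
# line per self-test type with its recommended polling time in minutes.
# Assumption: the test name and its polling time are on consecutive lines.
selftest_durations() {
    awk '
        /^(Short|Extended|Conveyance) self-test routine/ { name = $1 }
        /recommended polling time:/ {
            minutes = $0
            gsub(/[^0-9]/, "", minutes)   # keep only the digits
            if (name != "" && minutes != "") print name, minutes, "minutes"
            name = ""
        }
    '
}
```

You could run it as smartctl -c /dev/hda | selftest_durations to decide how far apart to schedule the tests.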

Now let's execute the SMART Immediate Offline Test (if supported, of course):

# smartctl -t offline /dev/hda

You only have to wait (smartctl will show you how long). When it finishes, you should check the SMART Error Log again for the report.

Now let's carry out the SMART Short Self Test or the SMART Extended Self Test (again, only if they are supported by your drive). They are similar, but the second is more thorough than the first:

# smartctl -t short /dev/hda
# smartctl -t long /dev/hda

Then check the SMART Self Test Error Log:

# smartctl -l selftest /dev/hda

Now let's execute the SMART Conveyance Self Test:

# smartctl -t conveyance /dev/hda

Then check the SMART Self Test Error Log again:

# smartctl -l selftest /dev/hda
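
Reading the self-test log can be automated too. The sketch below only looks at the most recent entry, which smartctl lists as "# 1", and assumes the ATA log format where a successful run is reported as "Completed without error". latest_selftest_ok is a hypothetical name.

```shell
# Hypothetical helper: reads `smartctl -l selftest` output on stdin and
# succeeds only if the newest entry ("# 1") completed without error.
latest_selftest_ok() {
    grep -E '^# *1 ' | grep -q 'Completed without error'
}
```

For example: smartctl -l selftest /dev/hda | latest_selftest_ok && echo "last self-test was fine". Note that it also fails when no test has been run yet, because there is no "# 1" entry in that case.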

Automatically monitor your drive(s)

If you want to monitor your drive(s) automatically, you have to configure the smartd daemon and have it launched at boot.

Here I'll show you how to:

  • monitor a single drive (/dev/hda)
  • schedule all tests (Offline, Extended and Conveyance) to run in succession every Friday, starting at 11:00, 13:00 and 15:00 respectively
  • execute a script if any error is detected: this script will write a detailed report and then it will shut down the computer

The smartd daemon's configuration file is /etc/smartd.conf (create it if it doesn't exist).


File: /etc/smartd.conf
...
#DEVICESCAN
...
/dev/hda \
-H \
-l error -l selftest \
-s (O/../../5/11|L/../../5/13|C/../../5/15) \
-m ThisIsNotUsed -M exec /usr/local/bin/smartd.sh
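
The -s argument is an extended regular expression. According to the smartd.conf man page, smartd builds a 12-character string of the form T/MM/DD/d/HH (test type, month, day of month, day of week with 1 = Monday, hour) and starts a test when the whole string matches. The following sketch lets you try a pattern before putting it into the file; schedule_matches and current_stamp are hypothetical helper names, not part of smartmontools.

```shell
# Hypothetical helper: succeed if the whole T/MM/DD/d/HH stamp matches
# the extended regular expression, which is how smartd evaluates -s.
schedule_matches() {
    regex=$1    # e.g. '(O/../../5/11|L/../../5/13|C/../../5/15)'
    stamp=$2    # e.g. 'L/03/15/5/13'
    printf '%s\n' "$stamp" | grep -Eqx -- "$regex"
}

# Hypothetical helper: build the stamp for a test type (L, S, C or O)
# at the current time. %u prints the weekday as 1 (Monday) to 7 (Sunday).
current_stamp() {
    printf '%s/%s\n' "$1" "$(date +%m/%d/%u/%H)"
}
```

For instance, schedule_matches '(O/../../5/11|L/../../5/13|C/../../5/15)' 'L/03/15/5/13' succeeds, because day of week 5 (Friday) and hour 13 match the Extended test branch, while the same stamp with day of week 4 does not match.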

This is the content of the script:

File: /usr/local/bin/smartd.sh
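#!/bin/bash
# Run by smartd through "-M exec"; smartd passes the problem description in $SMARTD_MESSAGE.
LOGFILE="/var/log/smartd.log"
# Append a timestamped report to the log, then halt the machine.
echo -e "$(date)\n$SMARTD_MESSAGE\n" >> "$LOGFILE"
shutdown -h now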

Obviously, make the script executable:

# chmod +x /usr/local/bin/smartd.sh

The above is only an example; adapt it to your own configuration and preferences. To learn more, read the man page:

$ man smartd.conf

To test everything you should append -M test to smartd.conf's last line and launch the daemon (note that this will shut down your machine):

# /etc/init.d/smartd start
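
If you would rather not have the machine halt during the test, the script could tell a test run apart from a real problem. This is only a sketch of one possible variant: it assumes that smartd exports SMARTD_FAILTYPE and sets it to EmailTest for -M test (as described in the smartd.conf man page), and should_shutdown and log_event are hypothetical function names.

```shell
# Hypothetical helper: succeed when a real shutdown is wanted, that is
# when smartd did NOT call the script as part of a "-M test" run.
# Assumption: smartd sets SMARTD_FAILTYPE=EmailTest for test runs.
should_shutdown() {
    [ "${SMARTD_FAILTYPE:-}" != "EmailTest" ]
}

# Hypothetical helper: append a timestamped report to the log file
# given as the first argument.
log_event() {
    printf '%s\n%s\n\n' "$(date)" "${SMARTD_MESSAGE:-}" >> "$1"
}
```

Inside /usr/local/bin/smartd.sh these functions could replace the last two lines with log_event "$LOGFILE" followed by should_shutdown && shutdown -h now, so that a test run only writes to the log.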

If something is wrong you can check /var/log/messages:

# tail /var/log/messages

Now remove the -M test option and make smartd launch at boot:

# rc-update add smartd default

Finished!

