HOWTO Monitor your hard disk(s) with smartmontools
From Gentoo Linux Wiki
This article is part of the HOWTO series.
Introduction
The aim of this HOWTO is to use SMART technology (nowadays nearly every hard disk supports it) to check whether your disks are healthy. SMART-enabled hard disks continuously monitor their own health and can alert the user when they detect an anomaly; most of them can also run specific self-tests for a more thorough analysis.
Warning: One important point before going on: always back up your important data, regardless of what SMART says! Although SMART is generally reliable, it can be wrong; also, hard disks often die unexpectedly, and even if SMART warns you that something is wrong, you may not have enough time to put your data in a safe place.
Installation Procedure
First of all, make sure SMART is enabled in the BIOS. For example, my BIOS has this setting:
Code: BIOS
S.M.A.R.T. for Hard Disk: Enabled
Some BIOSes lack this option or report S.M.A.R.T. as disabled; don't worry, smartctl can enable it on the drive itself (see below).
Now let's install the smartmontools package:
# emerge -av smartmontools
Finally, check whether your hard disk(s) support SMART:
# smartctl -i /dev/hda
For SATA drives, you may need to tell smartctl the device type:
# smartctl -i -d ata /dev/sda
To enable SMART on the drive:
# smartctl -s on /dev/sda
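With several drives, checking each one by hand gets tedious. Below is a minimal bash sketch, assuming the ATA wording `SMART support is: Enabled` that `smartctl -i` prints; other device types may word it differently. The function names `smart_enabled` and `enable_smart_where_needed` are hypothetical helpers, not part of smartmontools.

```bash
#!/bin/bash
# smart_enabled DEVICE [EXTRA_SMARTCTL_ARGS...]
# Hypothetical helper: succeeds if `smartctl -i` reports SMART as enabled.
# Assumes the ATA output line "SMART support is: Enabled".
smart_enabled() {
  local dev=$1
  shift
  # Only grep's status matters: smartctl's exit code is a bitmask that can be
  # non-zero even when the identify command itself worked.
  smartctl -i "$@" "$dev" | grep -q '^SMART support is: *Enabled'
}

# Hypothetical helper: turn SMART on for each listed drive where it is off.
# Drives without SMART capability will simply make `smartctl -s on` fail.
enable_smart_where_needed() {
  local dev
  for dev in "$@"; do
    if smart_enabled "$dev"; then
      echo "$dev: SMART already enabled"
    else
      echo "$dev: enabling SMART"
      smartctl -s on "$dev"
    fi
  done
}
```

Usage (as root): `enable_smart_where_needed /dev/sda /dev/sdb`. Extra arguments to `smart_enabled`, such as `-d ata`, are passed through to smartctl.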
Using smartctl
SMART Health Status
Let's check the SMART Health Status:
# smartctl -H /dev/hda
If the result is PASSED, the disk is fine; if it is FAILED, back up your data now: the disk has either already failed or is predicting its own failure within 24 hours!
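Besides the printed result, smartctl reports what it found through its exit status, which is a bitmask documented in the EXIT STATUS section of `man smartctl`; bit 3, for example, means the health check returned "DISK FAILING". A minimal decoder sketch follows; `decode_smartctl_status` is a hypothetical name and the messages paraphrase the man page.

```bash
#!/bin/bash
# decode_smartctl_status EXIT_CODE
# Hypothetical helper: prints one line per bit set in smartctl's exit status.
# Bit meanings paraphrased from the EXIT STATUS section of smartctl(8).
decode_smartctl_status() {
  local code=$1 bit
  local -a msgs=(
    "command line did not parse"
    "device open failed, or device is in a low-power mode"
    "a SMART or ATA command failed, or a SMART data checksum error occurred"
    "SMART status check returned DISK FAILING"
    "prefail attributes are at or below threshold"
    "some attributes were at or below threshold in the past"
    "device error log contains errors"
    "self-test log contains errors"
  )
  for bit in "${!msgs[@]}"; do
    # Test each bit of the exit code in turn
    if (( code & (1 << bit) )); then
      echo "bit $bit: ${msgs[bit]}"
    fi
  done
}
```

Usage: run `smartctl -H /dev/hda` and then `decode_smartctl_status $?`; an exit status of 0 prints nothing.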
SMART Error Log
Now let's check the SMART Error Log (it's a list of errors detected by SMART during the disk's life):
# smartctl -l error /dev/hda
If it says No Errors Logged, everything is fine. A few errors that are not very recent are not much cause for concern, but if there are many errors, back up your data as soon as you can.
Reading the SMART Health Status and the SMART Error Log is not enough, though: you should also run the drive's own tests.
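To watch the error count over time (for example from a cron job), the number can be pulled out of the log. A minimal sketch, assuming the ATA output wording `No Errors Logged` / `ATA Error Count: N`; `smart_error_count` is a hypothetical helper name.

```bash
#!/bin/bash
# smart_error_count DEVICE [EXTRA_SMARTCTL_ARGS...]
# Hypothetical helper: prints the number of errors in the SMART error log,
# or "unknown" (with status 1) if the expected wording is not found.
# Assumes the ATA wording "No Errors Logged" / "ATA Error Count: N".
smart_error_count() {
  local dev=$1 out n
  shift
  out=$(smartctl -l error "$@" "$dev")
  if grep -q 'No Errors Logged' <<<"$out"; then
    echo 0
    return 0
  fi
  # e.g. "ATA Error Count: 12 (device log contains only the most recent five errors)"
  n=$(sed -n 's/^ATA Error Count: *\([0-9][0-9]*\).*/\1/p' <<<"$out")
  if [[ -n $n ]]; then
    echo "$n"
  else
    echo "unknown"
    return 1
  fi
}
```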
SMART Testing
These tests run in the background without interrupting normal use of the disk, so you can run them whenever you like. Here I'll only describe how to launch them and read their reports; if you want to learn more, go here and/or read the man page.
First you should know which tests are supported by your drive:
# smartctl -c /dev/hda
This also tells you how long each of them takes.
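The durations can also be read by a script. A minimal sketch, assuming the layout in which `smartctl -c` prints each test name ("Short self-test routine") on one line and "recommended polling time: ( N) minutes." on the next; other smartctl versions may format this differently. `smart_test_minutes` is a hypothetical helper name.

```bash
#!/bin/bash
# smart_test_minutes DEVICE [EXTRA_SMARTCTL_ARGS...]
# Hypothetical helper: prints "<Test> <minutes>" for each self-test whose
# recommended polling time appears in `smartctl -c` output, e.g. "Extended 48".
smart_test_minutes() {
  local dev=$1
  shift
  smartctl -c "$@" "$dev" | awk '
    # Remember which test the following "polling time" line belongs to
    /self-test routine/ { kind = $1; next }
    /recommended polling time:/ && kind != "" {
      # Extract the number between the parentheses, e.g. "(  48)"
      if (match($0, /\( *[0-9]+\)/)) {
        t = substr($0, RSTART + 1, RLENGTH - 2)
        gsub(/ /, "", t)
        print kind, t
      }
      kind = ""
    }'
}
```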
Now let's execute the SMART Immediate Offline Test (if supported, of course):
# smartctl -t offline /dev/hda
You only have to wait (smartctl will show you how long). When it finishes, you should check the SMART Error Log again for the report.
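The waiting can be scripted too. A minimal sketch, assuming smartctl's "Please wait N seconds|minutes for test to complete." message (seconds for the offline test, minutes for self-tests); the helper names are hypothetical and the extra minute of slack is an arbitrary margin. After the wait it prints the log this article says to check: the error log for the offline test, the self-test log for the others.

```bash
#!/bin/bash
# announced_wait_seconds: reads `smartctl -t ...` output on stdin and prints
# the announced duration in seconds (status 1 if no duration was found).
announced_wait_seconds() {
  local line n unit
  line=$(sed -n 's/.*Please wait \([0-9][0-9]*\) \([a-z]*\) for test to complete.*/\1 \2/p')
  read -r n unit <<<"$line"
  case $unit in
    seconds) echo "$n" ;;
    minutes) echo $(( n * 60 )) ;;
    *) return 1 ;;
  esac
}

# smart_test_and_wait TYPE DEVICE  (hypothetical helper)
# Starts a test, sleeps for the announced time plus one minute of slack,
# then prints the relevant log.
smart_test_and_wait() {
  local type=$1 dev=$2 secs
  secs=$(smartctl -t "$type" "$dev" | announced_wait_seconds) || return 1
  sleep $(( secs + 60 ))
  if [[ $type == offline ]]; then
    smartctl -l error "$dev"
  else
    smartctl -l selftest "$dev"
  fi
}
```

Usage (as root): `smart_test_and_wait offline /dev/hda`.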
Now let's carry out the SMART Short Self Test or the SMART Extended Self Test (again, only if your drive supports them). They are similar, but the second one is more thorough (and much slower) than the first:
# smartctl -t short /dev/hda
# smartctl -t long /dev/hda
Then check the SMART Self Test Error Log:
# smartctl -l selftest /dev/hda
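In the self-test log the most recent test is the entry numbered "# 1". A minimal sketch that checks only that entry, assuming its status column reads "Completed without error" on success; `latest_selftest_ok` is a hypothetical helper name.

```bash
#!/bin/bash
# latest_selftest_ok DEVICE [EXTRA_SMARTCTL_ARGS...]
# Hypothetical helper: prints the most recent self-test log entry and
# succeeds only if it says "Completed without error".
latest_selftest_ok() {
  local dev=$1 first
  shift
  # "# 1" is the newest entry; the pattern does not match "#10" and above
  first=$(smartctl -l selftest "$@" "$dev" | grep -m1 '^# *1 ')
  echo "${first:-no self-test log entries found}"
  [[ $first == *"Completed without error"* ]]
}
```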
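Now let's execute the SMART Conveyance Self Test (again, if supported), which is intended to detect damage incurred while the drive was being transported: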
# smartctl -t conveyance /dev/hda
Then check the SMART Self Test Error Log again:
# smartctl -l selftest /dev/hda
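Not every drive offers the conveyance test. A minimal sketch that checks the capability list from `smartctl -c` first, assuming the wording "Conveyance Self-test supported." (drives without it print "No Conveyance Self-test supported."); both function names are hypothetical.

```bash
#!/bin/bash
# supports_conveyance DEVICE [EXTRA_SMARTCTL_ARGS...]
# Hypothetical helper: succeeds if `smartctl -c` lists the conveyance test.
# The leading-whitespace anchor excludes "No Conveyance Self-test supported."
supports_conveyance() {
  local dev=$1
  shift
  smartctl -c "$@" "$dev" | grep -q '^[[:space:]]*Conveyance Self-test supported'
}

# Hypothetical helper: run the conveyance test only where it is advertised.
maybe_conveyance_test() {
  local dev=$1
  if supports_conveyance "$dev"; then
    smartctl -t conveyance "$dev"
  else
    echo "$dev: conveyance self-test not supported, skipping"
  fi
}
```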
Automatically monitor your drive(s)
To monitor your drive(s) automatically, configure the smartd daemon and have it start at boot.
Here I'll show you how to:
- monitor a single drive (/dev/hda)
- schedule the Offline, Extended and Conveyance tests to run in succession every Friday, at 11:00, 13:00 and 15:00 respectively
- execute a script if any error is detected: this script will write a detailed report and then it will shut down the computer
The smartd configuration file is /etc/smartd.conf (create it if it doesn't exist).
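File: /etc/smartd.conf
...
# DEVICESCAN stays commented out: if it is active, smartd ignores the
# lines that follow it and scans for devices on its own.
#DEVICESCAN
...
# -H: check health status; -l error / -l selftest: report new error-log
# entries and new failed self-tests; -s: test schedule (T/MM/DD/d/HH);
# -M exec: run the script instead of sending mail (-M only works
# together with -m, hence the placeholder address).
/dev/hda \
  -H \
  -l error -l selftest \
  -s (O/../../5/11|L/../../5/13|C/../../5/15) \
  -m ThisIsNotUsed -M exec /usr/local/bin/smartd.sh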
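The -s argument is a regular expression that smartd matches against strings of the form T/MM/DD/d/HH: test type (O offline, S short, L long/extended, C conveyance), two-digit month, two-digit day of month, day of week (1 = Monday through 7 = Sunday) and two-digit hour. A dot matches any character, so O/../../5/11 in the configuration above means an offline test on every Friday at 11:00. A minimal sketch of a helper that assembles such an expression for weekly tests; `smartd_schedule` is a hypothetical name.

```bash
#!/bin/bash
# smartd_schedule DAY_OF_WEEK TYPE:HOUR [TYPE:HOUR...]
# Hypothetical helper: builds a smartd.conf "-s" expression (T/MM/DD/d/HH)
# for tests run weekly on one day.
# DAY_OF_WEEK: 1 (Monday) .. 7 (Sunday); TYPE: O, S, L or C; HOUR: 0..23.
smartd_schedule() {
  local dow=$1 spec type hour
  shift
  local -a parts=()
  for spec in "$@"; do
    type=${spec%%:*}
    hour=${spec#*:}
    # "../.." matches any month and day of month; the hour needs two digits,
    # so pad it (10# stops "08"/"09" from being parsed as octal).
    parts+=("$(printf '%s/../../%s/%02d' "$type" "$dow" "$((10#$hour))")")
  done
  # Join the alternatives with "|" and wrap them in parentheses
  local IFS='|'
  printf '(%s)\n' "${parts[*]}"
}
```

For example, `smartd_schedule 5 O:11 L:13 C:15` prints `(O/../../5/11|L/../../5/13|C/../../5/15)`, the expression used in the configuration above.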
This is the content of the script:
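File: /usr/local/bin/smartd.sh
#!/bin/bash
# Run by smartd (via "-M exec") when it detects a problem; smartd passes
# its warning text in the SMARTD_MESSAGE environment variable.
LOGFILE="/var/log/smartd.log"
# Append a timestamp and smartd's message to the log file
echo -e "$(date)\n$SMARTD_MESSAGE\n" >> "$LOGFILE"
# Halt the machine immediately
shutdown -h now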
Obviously, make the script executable:
# chmod +x /usr/local/bin/smartd.sh
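smartd exports several SMARTD_* environment variables to the script: besides SMARTD_MESSAGE, SMARTD_DEVICE names the device and SMARTD_FAILTYPE the kind of problem, which is EmailTest when the script runs because of -M test. The hypothetical variant below logs those values too and skips the shutdown for the test message, so testing no longer powers off the machine. Check the variable list in `man smartd.conf` for your smartmontools version before relying on it.

```bash
#!/bin/bash
# Hypothetical variant of /usr/local/bin/smartd.sh.
# LOGFILE may be overridden from the environment; default as in the article.
LOGFILE="${LOGFILE:-/var/log/smartd.log}"

handle_smartd_event() {
  # Append a timestamped report built from the variables smartd exports
  {
    date
    echo "device:   ${SMARTD_DEVICE:-unknown}"
    echo "failtype: ${SMARTD_FAILTYPE:-unknown}"
    echo "${SMARTD_MESSAGE:-}"
    echo
  } >> "$LOGFILE"

  # "-M test" runs the script once at startup with SMARTD_FAILTYPE=EmailTest;
  # log that case but do not power off.
  if [[ ${SMARTD_FAILTYPE:-} == EmailTest ]]; then
    return 0
  fi
  shutdown -h now
}

# Only act when started by smartd (which sets SMARTD_FAILTYPE); sourcing the
# file without it just defines the function.
if [[ -n ${SMARTD_FAILTYPE:-} ]]; then
  handle_smartd_event
fi
```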
The script above is only an example; feel free to adapt it to your own configuration and preferences. To learn more, read the man page:
$ man smartd.conf
To test the whole setup, append -M test to the last line of smartd.conf and start the daemon (note that, with the script above, this will shut down your machine):
# /etc/init.d/smartd start
If something is wrong you can check /var/log/messages:
# tail /var/log/messages
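Since /var/log/messages mixes messages from every service, it can help to look only at smartd's lines. A minimal sketch; `smartd_log_lines` is a hypothetical helper, and the default path assumes your system logger writes to /var/log/messages as in this article.

```bash
#!/bin/bash
# smartd_log_lines [LOGFILE] [COUNT]
# Hypothetical helper: shows the last COUNT lines (default 20) mentioning
# smartd in LOGFILE (default /var/log/messages; the actual path depends on
# your system logger).
smartd_log_lines() {
  local file=${1:-/var/log/messages} count=${2:-20}
  grep 'smartd' "$file" | tail -n "$count"
}
```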
Now remove the -M test option and make smartd start at boot:
# rc-update add smartd default
Finished!
Useful links
Original thread:
- [HOWTO] monitorare l'hard disk con SMART - UPDATE! (in Italian; "monitoring the hard disk with SMART")
Others: