Use smartd Smartmontools to prevent data loss

Posted by plattapuss on July 10th, 2008

Are you responsible for one or more servers? Perhaps you have a computer at home that keeps you up at night wondering, "What happens if my hard drive fails?" If this is you, then you need SmartMonTools. It actually comes pre-installed on most flavours of Linux these days but, amazingly enough, it is not set to run automatically.

SmartMonTools monitors hard drives that support Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T.) for warning signs that often appear before a drive fails completely. If properly set up, it will warn you of these potential issues and may save your data. Of course, you should still have a proper backup system in place in case just such a disaster occurs.

I am assuming that SmartMonTools is already installed on your machine; if not, you can get it here: http://smartmontools.sourceforge.net/

The first step is to check whether your hard drives are S.M.A.R.T. enabled. You can do this with the smartctl utility that comes with SmartMonTools. Here is the output I get when I run 'smartctl -d ata -i /dev/sda':

CODE:
  # smartctl -d ata -i /dev/sda
  smartctl version 5.36 [i686-redhat-linux-gnu] Copyright (C) 2002-6 Bruce Allen
  Home page is http://smartmontools.sourceforge.net/

  === START OF INFORMATION SECTION ===
  Device Model:     ST3500630AS
  Serial Number:    3QG02JST
  Firmware Version: 3.AAC
  User Capacity:    500,107,862,016 bytes
  Device is:        Not in smartctl database [for details use: -P showall]
  ATA Version is:   7
  ATA Standard is:  Exact ATA specification draft version not indicated
  Local Time is:    Thu Jul 10 05:18:47 2008 PDT
  SMART support is: Available - device has SMART capability.
  SMART support is: Enabled

Those last two lines are what we are looking for: this drive is SMART capable and SMART is enabled, so we are good to go. A couple of comments about the command I used. The -i flag prints the drive's identity information; if you want more information about your hard drive, try the -a flag, which shows a lot more detail. The '-d ata' flag was required in my case to tell smartctl that it is talking to an ATA drive. You may not need the -d flag.
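If you have several drives, checking each one by hand gets tedious. Below is a minimal sketch that runs 'smartctl -d ata -i' on each drive and looks for the 'SMART support is: Enabled' line shown above. The function names and the drive list are hypothetical; it assumes you run it as root and that your drives need '-d ata' as mine do. For a drive that reports SMART as available but not enabled, smartctl's '-s on' option should switch it on.

```shell
#!/bin/sh
# Sketch: report whether SMART is enabled on each listed drive.
# Assumes: run as root, smartctl on the PATH, drives need '-d ata'.

# Hypothetical helper: reads 'smartctl -i' output on stdin and
# succeeds only if it contains "SMART support is: Enabled".
smart_is_enabled() {
    grep -q '^SMART support is: *Enabled'
}

# Hypothetical helper: print one status line for a single drive.
check_drive() {
    if smartctl -d ata -i "$1" | smart_is_enabled; then
        echo "$1: SMART enabled"
    else
        echo "$1: SMART not reported as enabled (try: smartctl -d ata -s on $1)"
    fi
}

# Hypothetical drive list; edit it to match your machine.
# Only run the checks if smartctl is installed.
if command -v smartctl >/dev/null 2>&1; then
    for dev in /dev/sda /dev/sdb; do
        check_drive "$dev"
    done
fi
```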

The next step is to modify the /etc/smartd.conf file. Open /etc/smartd.conf in your favourite editor. The first thing to do is remove the first line of the file. While that line is present, the file is treated as auto-generated and may be overwritten the next time smartd starts; removing it tells smartd that you have edited the file and that it should not be overwritten. If you don't have a smartd.conf file, you can auto-generate a first version simply by starting and then stopping smartd with '/etc/init.d/smartd start' and then '/etc/init.d/smartd stop'.
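If you would rather script this step, here is a minimal sketch. It assumes GNU sed (for 'sed -i', as found on Red Hat-style systems) and assumes that, in an untouched auto-generated file, line 2 is the '# Remove the line above ...' comment that appears at the top of my edited conf file. It only deletes line 1 while that is still true, and it keeps a backup copy first. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: remove the auto-generation marker (line 1) from smartd.conf.
# Assumes GNU sed and that line 2 of an untouched file is the
# "# Remove the line above ..." comment.

# Hypothetical helper name.
strip_autogen_marker() {
    conf="${1:-/etc/smartd.conf}"
    # Only act while line 2 is still the "Remove the line above" comment,
    # so running this twice does not delete a real line.
    if sed -n '2p' "$conf" | grep -q '^# Remove the line above'; then
        cp -p "$conf" "$conf.orig"   # keep a backup of the original
        sed -i '1d' "$conf"          # delete the marker line
    fi
}

# Usage (as root): strip_autogen_marker /etc/smartd.conf
```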

Modify the conf file so that our drives will be monitored regularly. Here is my conf file:

CODE:
  # Remove the line above if you have edited the file and you do not want
  # it to be overwritten on the next smartd startup.
  <SNIP>
  /dev/sda -d ata -H -m me@mydomain.ca -M test
  /dev/sdb -d ata -H -m me@mydomain.ca -M test
  <SNIP>

First off, you will see that I included the '-d ata' device type flag. The -H flag tells smartd to monitor the health status of the drive. The -m flag tells smartd to email someone, in this case me, about any issues. The '-M test' flag can only be used together with -m; here it tells smartd to send me a test email on startup. I added it because I want to be sure that smartd is really working and can email me.
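With more than a couple of drives it is easy to mistype one of these lines. Here is a small sketch that prints one directive line per device, using the same '-d ata -H -m ... -M test' directives as above, so you can review the output and paste it into /etc/smartd.conf. The function name is hypothetical, and '-d ata' is hard-coded on the assumption that your drives need it too.

```shell
#!/bin/sh
# Sketch: print smartd.conf lines for the given drives.
# Usage: smartd_conf_lines EMAIL DEVICE...   (hypothetical helper name)
smartd_conf_lines() {
    email="$1"
    shift
    for dev in "$@"; do
        # -d ata: device type, -H: health check,
        # -m: mail warnings to $email, -M test: send a test mail on startup
        printf '%s -d ata -H -m %s -M test\n' "$dev" "$email"
    done
}

# Example: review the output, then paste it into /etc/smartd.conf
# smartd_conf_lines me@mydomain.ca /dev/sda /dev/sdb
```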

At the bottom of this post is a partial list of flags that you can use with smartd.

If you try to start smartd right now, you will most likely be disappointed, as nothing will happen. First we need to make smartd see our drives by registering each one with it. We can do this by running a quick command for each drive:

CODE:
  echo /dev/sda -d ata -m me@mydomain.ca -M test | smartd -c - -q onecheck

We are piping a line of directives into smartd. The directives should look familiar, so I won't go over them again. The smartd flags in this example are a little different, so let's go over those now. The -c flag tells smartd which configuration file to use. The single dash after it tells smartd to read its configuration from standard input instead of from a file, so it takes the directives from the pipe. Because the piped line contains no monitoring directive such as -H, smartd assumes '-a', as you can see in the output. The -q flag tells smartd when it should quit; '-q onecheck' tells it to register the drive, check it once, and then exit. This command line serves two purposes: it registers the device, and it verifies that an email can be sent out.
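Since this has to be done once per drive, a loop saves some typing. Here is a minimal sketch that runs the same 'smartd -c - -q onecheck' command for each device you pass in and reports any drive for which smartd exits with a non-zero status. The function name is hypothetical; run it as root.

```shell
#!/bin/sh
# Sketch: run the one-shot smartd check for each drive.
# Usage: register_drives EMAIL DEVICE...   (hypothetical helper name)
register_drives() {
    email="$1"
    shift
    rc=0
    for dev in "$@"; do
        # -c - : read the directives from standard input
        # -q onecheck : check each device once, then exit
        if ! printf '%s -d ata -m %s -M test\n' "$dev" "$email" |
                smartd -c - -q onecheck; then
            echo "smartd reported a problem with $dev" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Example (as root): register_drives me@mydomain.ca /dev/sda /dev/sdb
```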

Here is what I get when I run the command for /dev/sda:

CODE:
  echo /dev/sda -d ata -m me@mydomain.ca -M test | smartd -c - -q onecheck
  smartd version 5.36 [i686-redhat-linux-gnu] Copyright (C) 2002-6 Bruce Allen
  Home page is http://smartmontools.sourceforge.net/

  Opened configuration file <stdin>
  Drive: /dev/sda, implied '-a' Directive on line 1 of file <stdin>
  Configuration file <stdin> parsed.
  Device: /dev/sda, opened
  Device: /dev/sda, not found in smartd database.
  Device: /dev/sda, is SMART capable. Adding to "monitor" list.
  Monitoring 1 ATA and 0 SCSI devices
  Executing test of mail to me@mydomain.ca ...
  Test of mail to me@mydomain.ca: successful
  Started with '-q onecheck' option. All devices sucessfully checked once.
  smartd is exiting (exit status 0)

The important line here is:

CODE:
  Device: /dev/sda, is SMART capable. Adding to "monitor" list.

We have now registered /dev/sda with smartd, and it will monitor this device. In my inbox I received this email:

CODE:
  This email was generated by the smartd daemon running on:

     host name: server.mydomain.ca
    DNS domain: mydomain.ca
    NIS domain: (none)

  The following warning/error was logged by the smartd daemon:

  TEST EMAIL from smartd for device: /dev/sda

  For details see host's SYSLOG (default: /var/log/messages).
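The email points you to the system log for details. Here is a small sketch for pulling out recent smartd messages. It assumes the default log file named in the email, /var/log/messages, and assumes syslog tags smartd's lines as 'smartd[PID]:', the usual syslog format. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: show the most recent smartd lines from the system log.
# Usage: smartd_log [LOGFILE] [COUNT]   (hypothetical helper name)
smartd_log() {
    logfile="${1:-/var/log/messages}"
    count="${2:-20}"
    # Assumes syslog lines look like "... smartd[1234]: message".
    grep 'smartd\[' "$logfile" | tail -n "$count"
}

# Example (as root): smartd_log /var/log/messages 50
```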

Once you have successfully run the registration command for each of your devices, fire up smartd with '/etc/init.d/smartd start'. If all went well, you should have an email like the one above in your inbox for each device you set up in the config file. This tells you that the daemon is running and can send an email when an issue occurs. The last step is to remove the '-M test' flag from each device line in your /etc/smartd.conf file, then restart smartd with '/etc/init.d/smartd restart'.
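Removing '-M test' by hand is easy with two drives; with more, sed helps. Here is a minimal sketch, assuming GNU sed for 'sed -i'. It keeps a backup copy and only removes '-M' when it is followed by 'test', so other -M values and the '-M TYPE' comment line are left alone. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: drop the '-M test' directive from smartd.conf.
# Assumes GNU sed (-i). Hypothetical helper name.
remove_test_mail() {
    conf="${1:-/etc/smartd.conf}"
    cp -p "$conf" "$conf.bak"    # keep a backup first
    # Delete '-M test' together with the whitespace in front of it.
    sed -i 's/[[:space:]][[:space:]]*-M[[:space:]][[:space:]]*test//g' "$conf"
}

# Usage (as root):
# remove_test_mail /etc/smartd.conf && /etc/init.d/smartd restart
```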

Be sure that smartd is enabled in runlevels 3, 4 and 5 with this command:

CODE:
  chkconfig --level 345 smartd on
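To double-check, 'chkconfig --list smartd' prints the service's on/off state for each runlevel (entries such as '3:on'). Here is a small sketch that reads that output and succeeds only if runlevels 3, 4 and 5 are all on. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed only if runlevels 3, 4 and 5 are all "on" in the
# 'chkconfig --list smartd' output read from stdin.
# Hypothetical helper name.
enabled_in_345() {
    out=$(cat)
    for lvl in 3 4 5; do
        case "$out" in
            *"$lvl:on"*) ;;      # this runlevel is enabled
            *) return 1 ;;       # missing or off
        esac
    done
}

# Usage: chkconfig --list smartd | enabled_in_345 && echo "smartd enabled in 3, 4 and 5"
```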

That's it for today. Hopefully it will help you sleep better at night.

CODE:
  # HERE IS A LIST OF DIRECTIVES FOR THIS CONFIGURATION FILE
  #   -d TYPE Set the device type to one of: ata, scsi
  #   -T TYPE set the tolerance to one of: normal, permissive
  #   -o VAL  Enable/disable automatic offline tests (on/off)
  #   -S VAL  Enable/disable attribute autosave (on/off)
  #   -H      Monitor SMART Health Status, report if failed
  #   -l TYPE Monitor SMART log.  Type is one of: error, selftest
  #   -f      Monitor for failure of any 'Usage' Attributes
  #   -m ADD  Send warning email to ADD for -H, -l error, -l selftest, and -f
  #   -M TYPE Modify email warning behavior (see man page)
  #   -p      Report changes in 'Prefailure' Normalized Attributes
  #   -u      Report changes in 'Usage' Normalized Attributes
  #   -t      Equivalent to -p and -u Directives
  #   -r ID   Also report Raw values of Attribute ID with -p, -u or -t
  #   -R ID   Track changes in Attribute ID Raw value with -p, -u or -t
  #   -i ID   Ignore Attribute ID for -f Directive
  #   -I ID   Ignore Attribute ID for -p, -u or -t Directive
  #   -v N,ST Modifies labeling of Attribute N (see man page)
  #   -a      Default: equivalent to -H -f -t -l error -l selftest
  #   -F TYPE Use firmware bug workaround. Type is one of: none, samsung
  #   -P TYPE Drive-specific presets: use, ignore, show, showall
  #    #      Comment: text after a hash sign is ignored
  #    \      Line continuation character
  # Attribute ID is a decimal integer 1 <= ID <= 255
  # All but -d, -m and -M Directives are only implemented for ATA devices
