Nagios checks for failed RAID disks

Armijn Hemel, April 15, 2009


If you have ever had a failing RAID setup, you know that you can get into deep trouble if you don't act fast, which means acting before you lose data. You want to be notified as soon as possible. We have actually been in the situation where we went into the data room to replace a broken disk, only to see the other disk in the system break down as well during power-up, before we had the chance to rebuild the RAID. To get notified sooner when things go wrong, we use Nagios a lot. We added a simple script to our collection of Nagios scripts, which we execute every few minutes, to warn us via mail and Jabber if a RAID member has failed on one of our Linux servers (in case we accidentally miss the high load and the warning mails from the system):

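#!/bin/sh
# Nagios check: exit 2 (CRITICAL) if any md member is marked faulty.
# Failed members are flagged with "(F)" in /proc/mdstat.
RES=$(grep '(F)' /proc/mdstat)
if test -z "$RES"; then
        # Nagios expects one line of output, even when all is well.
        echo "RAID OK"
        res=0
else
        res=2
        echo "RAID failure:"
        echo "$RES"
fi
exit $res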
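
One limitation of matching only on '(F)': once a faulty disk has been removed from the array, the marker disappears from /proc/mdstat, while the array keeps running degraded (shown as an underscore in the status field, for example [U_]). The variant below is a rough sketch that looks for both. The function name check_mdstat and the optional file argument are our own inventions for illustration, and the exit code 3 for "unknown" follows the usual Nagios plugin convention; treat it as a starting point rather than a finished plugin.

```shell
#!/bin/sh
# Sketch: report failed members "(F)" as well as degraded arrays ("[U_]").
# check_mdstat is a hypothetical helper; the file argument exists only so
# the check can be tried against a saved copy of /proc/mdstat.
check_mdstat() {
        file=${1:-/proc/mdstat}
        if [ ! -r "$file" ]; then
                echo "RAID UNKNOWN: cannot read $file"
                return 3
        fi
        # Members marked (F) have been flagged as faulty by the md driver.
        failed=$(grep '(F)' "$file" || true)
        # An underscore in the [UU] status field marks a missing member.
        degraded=$(grep '\[[U_]*_[U_]*]' "$file" || true)
        if [ -n "$failed" ] || [ -n "$degraded" ]; then
                echo "RAID failure:"
                if [ -n "$failed" ]; then echo "$failed"; fi
                if [ -n "$degraded" ]; then echo "$degraded"; fi
                return 2
        fi
        echo "RAID OK"
        return 0
}
```

To use it as a Nagios check, end the script with a call to check_mdstat followed by exit $?, so that the return value of the function becomes the exit status of the plugin.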