Nagios checks for failed RAID disks
Armijn Hemel, April 15, 2009
Tags: administration, linux, nagios, raid
If you have ever had a failing RAID setup, you know that you can get into deep trouble if you don't act fast, which means acting before you lose data. You want to be notified as soon as possible. We have actually been in the situation where we went into the data room to replace a broken disk, only to see the other disk in the system break down as well during power-up, before we had the chance to rebuild the RAID. To get notified sooner when things go wrong, we use Nagios a lot. We added a simple script to our collection of Nagios checks; it runs every few minutes and warns us via mail and Jabber if a disk in a RAID array has failed on one of our Linux servers (in case we accidentally miss the high load and the warning mails from the system). The script looks in /proc/mdstat for devices marked (F), which is how the Linux md driver flags a failed disk, and exits with status 2 (CRITICAL in Nagios terms) if it finds any:
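#!/bin/sh
# Nagios check: report CRITICAL if /proc/mdstat lists a failed (F) device.
RES=$(grep '(F)' /proc/mdstat)
if [ -z "$RES" ]; then
    # No failed devices: exit status 0 means OK to Nagios.
    res=0
else
    # Exit status 2 means CRITICAL to Nagios.
    res=2
    echo "RAID failure:"
    # Quote the variable so each matching line stays on its own line.
    echo "$RES"
fi
exit $res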
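The script above only looks for the (F) marker. As a possible extension, and not something taken from our original setup, the sketch below also treats an underscore in the md status map (for example [U_]) as a failure, since a disk that has already been removed from the array no longer carries an (F) flag. The function name check_mdstat and its file argument are assumptions made for illustration; passing the path makes it possible to try the check on a saved copy of /proc/mdstat.

```sh
#!/bin/sh
# Sketch only: check_mdstat and its file argument are illustrative assumptions.
# Nagios exit codes used here: 0 = OK, 2 = CRITICAL, 3 = UNKNOWN.
check_mdstat() {
    file="${1:-/proc/mdstat}"
    if [ ! -r "$file" ]; then
        echo "RAID UNKNOWN: cannot read $file"
        return 3
    fi
    # Devices the md driver has marked as failed carry an (F) suffix.
    failed=$(grep '(F)' "$file" || true)
    # The status map looks like [UU]; an underscore marks a missing member.
    degraded=$(grep -E '\[[U_]*_[U_]*]' "$file" || true)
    if [ -n "$failed" ] || [ -n "$degraded" ]; then
        echo "RAID CRITICAL:"
        if [ -n "$failed" ]; then echo "$failed"; fi
        if [ -n "$degraded" ]; then echo "$degraded"; fi
        return 2
    fi
    echo "RAID OK"
    return 0
}

# To use it as a Nagios plugin, end the script with:
#   check_mdstat /proc/mdstat; exit $?
```

Returning 3 when the file cannot be read keeps a missing /proc/mdstat from being reported as a healthy array.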
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Netherlands License.










