Monitoring DRBD using Munin
Tuesday, 2009-02-24 23:33, 1235518411 seconds since Unix epoch
As some of you may have noticed, quite a few wasda.nl machines went down today. The cause was a disconnected DRBD device, which went unnoticed for quite some time. It was fixed after some kill-dash-nines, hard resets, blood, sweat and tears.
Because network admins don’t really like zombie-killing parties, a solution had to be found. We’ve had Munin running for quite some time now, and it does a great job of warning the right people when shit is about to hit the fan. So I’ve written a small script that graphs DRBD network and disk usage. It also sends out an error if a DRBD device gets disconnected.
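To give an idea of where such graphs can come from, here is a minimal sketch. It is not the author's script. It assumes the data is read from /proc/drbd in the DRBD 8.x format, where each device has a line such as " 0: cs:Connected ..." followed by counters like ns/nr (KiB sent and received over the network) and dw/dr (KiB written to and read from the local disk). The function names drbd_field and drbd_fetch and the Munin field names sent, received, written and read are hypothetical.

```shell
#!/bin/sh
# Sketch only: read one per-device field from a file in /proc/drbd format.
# $1 = file (normally /proc/drbd), $2 = minor number N of /dev/drbdN,
# $3 = field name, e.g. cs (connection state), ns/nr (network KiB sent/received)
#      or dw/dr (local disk KiB written/read).
drbd_field() {
    awk -v dev="$2:" -v key="$3" '
        $1 == dev { inside = 1 }                           # block of /dev/drbdN starts
        inside && $1 ~ /^[0-9]+:$/ && $1 != dev { exit }   # next device block: stop
        inside {
            for (i = 1; i <= NF; i++) {
                split($i, kv, ":")                         # "ns:120" -> "ns", "120"
                if (kv[1] == key) { print kv[2]; exit }
            }
        }
    ' "$1"
}

# Print Munin fetch output for a hypothetical plugin named drbd_<mode>_<N>.
# $1 = /proc/drbd-format file, $2 = plugin name (normally "$(basename "$0")").
drbd_fetch() {
    name=$2
    n=${name##*_}                  # drbd_net_0 -> 0
    case $name in
        drbd_net_*)
            echo "sent.value $(drbd_field "$1" "$n" ns)"
            echo "received.value $(drbd_field "$1" "$n" nr)" ;;
        drbd_disk_*)
            echo "written.value $(drbd_field "$1" "$n" dw)"
            echo "read.value $(drbd_field "$1" "$n" dr)" ;;
    esac
}
```

In a real plugin this would be called as drbd_fetch /proc/drbd "$(basename "$0")". The counters keep growing from the moment the module is loaded, so in config mode the fields would be declared with type DERIVE (or COUNTER) to let Munin graph them as rates.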
You can grab it here. The configuration is fairly simple: just copy the script into your Munin plugin directory and symlink it using these names:
- drbd_net_N: Network traffic by /dev/drbdN
- drbd_disk_N: Disk usage by /dev/drbdN
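Since the symlink step comes up in the comments below, here is a minimal sketch of it. Munin-node runs every executable it finds in its plugin directory, and a plugin like this one presumably works out what to graph from the name it was invoked under. The function name link_drbd_plugin is hypothetical, and so are the paths in the usage line; /etc/munin/plugins is a common default, but check where your distribution keeps the script and the plugin directory.

```shell
#!/bin/sh
# Sketch: create the per-device symlinks for the DRBD plugin.
# $1 = absolute path of the downloaded script (location is an assumption)
# $2 = directory munin-node loads plugins from (often /etc/munin/plugins)
# remaining arguments = DRBD minor numbers N, one per /dev/drbdN
link_drbd_plugin() {
    src=$1
    dir=$2
    shift 2
    for n in "$@"; do
        # drbd_net_N graphs network traffic of /dev/drbdN
        ln -sf "$src" "$dir/drbd_net_$n"
        # drbd_disk_N graphs disk usage of /dev/drbdN
        ln -sf "$src" "$dir/drbd_disk_$n"
    done
}
```

For /dev/drbd0 and /dev/drbd1 that would be something like link_drbd_plugin /usr/share/munin/plugins/drbd_ /etc/munin/plugins 0 1, followed by a restart of munin-node so it picks up the new plugins. Passing an absolute source path keeps the symlinks valid regardless of where they live.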
One note of caution, though: this plugin uses API and protocol version 86 of DRBD. I haven’t tested it with any other version. The code is simple enough, so you’re welcome to edit it.
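If you want the plugin to complain rather than silently misbehave on another DRBD release, a check along these lines could help. It is a minimal sketch, assuming the first line of /proc/drbd looks like "version: 8.0.14 (api:86/proto:86)" (newer releases print a protocol range, e.g. "proto:86-89"). The function name drbd_check_version is hypothetical, and it only accepts the exact API/protocol combination the post says was tested.

```shell
#!/bin/sh
# Sketch: succeed only when the /proc/drbd-format file reports api:86 and proto:86.
# $1 = file to inspect (normally /proc/drbd)
drbd_check_version() {
    line=$(head -n 1 "$1")
    api=$(printf '%s\n' "$line" | sed -n 's/.*api:\([0-9]*\).*/\1/p')
    proto=$(printf '%s\n' "$line" | sed -n 's/.*proto:\([0-9-]*\).*/\1/p')
    if [ "$api" != "86" ] || [ "$proto" != "86" ]; then
        # Report what was found on stderr so munin-node logs it.
        echo "untested DRBD version: api ${api:-unknown}, proto ${proto:-unknown}" >&2
        return 1
    fi
}
```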
Bert Says:
Very cool :-)
mun Says:
Please explain the symlink process in more detail.
vt Says:
Btw, during sync, the plugin will not work as it only accepts the connected state. To fix:
- if [ "$CONNECTED" != "Connected" ]; then
+ if [ "$CONNECTED" != "Connected" ] && [ "$CONNECTED" != "VerifyT" ]; then
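The patch above only adds VerifyT. In DRBD 8.x a resync is normally reported as SyncSource or SyncTarget, and an online verify as VerifyS or VerifyT, so a case statement is a tidier way to list every state you want to tolerate. This is a sketch; the function name drbd_state_ok and the "strict" switch are hypothetical, the latter reflecting that whether a syncing device counts as healthy is a matter of preference (see the reply below).

```shell
#!/bin/sh
# Sketch: decide whether a DRBD connection state (the cs: field) is acceptable.
# $1 = connection state, $2 = "strict" to accept nothing but Connected.
drbd_state_ok() {
    case $1 in
        Connected)
            return 0 ;;
        SyncSource|SyncTarget|VerifyS|VerifyT)
            # Data still flows to the peer, but the copy is not (yet) identical.
            [ "$2" != "strict" ] ;;
        *)
            # StandAlone, WFConnection and friends: not connected.
            return 1 ;;
    esac
}
```

The existing test then becomes something like: if ! drbd_state_ok "$CONNECTED"; then ... fi.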
jorrizza Says:
Actually, that’s not entirely right. During sync you’ve essentially got a single point of failure (SPOF). I can’t count that as uptime. The storage may be accessible, but if the master fails you’re screwed. It’s a matter of personal preference, I guess.
Roger Pixley Says:
Will the error it generates allow Munin to send an e-mail saying the disks are not connected? How do you put that in the munin.conf file?
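One common way to get such an e-mail, sketched here under assumptions rather than taken from the plugin: the plugin declares a critical threshold for a field in its config output, and the Munin master (munin-limits) notifies the contacts defined in munin.conf with a line such as contact.admin.command mail -s "Munin notification ${var:host}" admin@example.com (contact name and address hypothetical). The plugin side could look like this; the function names and the "connected" field are hypothetical, and the field is 1 when the cs: state is Connected and 0 otherwise.

```shell
#!/bin/sh
# Sketch: Munin config output with a critical threshold for a connection field.
# $1 = DRBD minor number N of /dev/drbdN
drbd_connected_config() {
    echo "graph_title DRBD /dev/drbd$1 connection state"
    echo "graph_category disk"
    echo "connected.label connected"
    # "1:" means any value below 1 is critical, which triggers the notification.
    echo "connected.critical 1:"
}

# Sketch: Munin fetch output. $1 = connection state (the cs: field).
drbd_connected_fetch() {
    if [ "$1" = "Connected" ]; then
        echo "connected.value 1"
    else
        echo "connected.value 0"
    fi
}
```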