During my Puppetcamp Gent talk last week, I explained how to get alerts based on trends from Graphite. A number of people asked me how to do that.
First, let's quickly explain why you might want to do that.
Sometimes you don't care about the current value of a metric. Take a queuing system as an example: there is no problem if messages are added to the queue, not even if there are a lot of messages on the queue. There might, however, be a problem if the number of messages on a queue stays too high over a certain period.
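To make the difference between a spike and a backlog concrete, here is a minimal Ruby sketch of a "stays too high" rule, assuming you already have the samples for the period as an array. It only fires when every sample in the window is above the threshold, so a single burst of messages does not trigger it. The method name `sustained_high?` and the numbers are hypothetical; this is one way to express the idea, not how check_graphite itself decides.

```ruby
# Returns true only when every non-nil sample in the window exceeds the
# threshold; nil samples (gaps in the data) are ignored.
# Hypothetical helper for illustration only.
def sustained_high?(samples, threshold)
  values = samples.compact
  return false if values.empty?
  values.all? { |v| v > threshold }
end

# A short spike: the queue fills up but drains again.
spike = [10, 15, 25_000, 40, 12]
# A backlog: the queue never drains during the window.
backlog = [21_000, 22_500, nil, 23_000, 24_100]

puts sustained_high?(spike, 20_000)   # prints "false"
puts sustained_high?(backlog, 20_000) # prints "true"
```

A stricter "all samples" rule like this ignores short bursts entirely; averaging over the window, as in the NRPE check further on, is more forgiving of a single low sample.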
In this example I'm monitoring the queue length of a HornetQ setup, which is exposed via JMX.
On the server running HornetQ I have an exported resource that tells the JMXTrans server to send the MessageCount to Graphite (you could also do this using collectd plugins):
# Exported resource: collected on the JMXTrans server, which then polls
# HornetQ over JMX and ships the queue attributes to Graphite
@@jmxtrans::graphite {"MessageCountMonitor-${::fqdn}":
  jmxhost      => hiera('hornetqserver'),
  jmxport      => '5446',
  objtype      => 'org.hornetq:type=Queue,*',
  attributes   => '"MessageCount","MessagesAdded","ConsumerCount"',
  resultalias  => 'hornetq',
  typenames    => 'name',
  graphitehost => hiera('graphite'),
  graphiteport => '2003',
}
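For reference, Graphite's plaintext protocol on port 2003 takes one line per datapoint: `<metric path> <value> <unix timestamp>`. The Ruby sketch below only illustrates that wire format, since jmxtrans does the sending for you. The metric path follows the one used in the NRPE check later in this post; the host name `mq01.example.com` and the `send_to_carbon` helper are hypothetical.

```ruby
require 'socket'

# Carbon plaintext protocol: "<metric path> <value> <unix timestamp>\n"
def carbon_line(path, value, time)
  "#{path} #{value} #{time.to_i}\n"
end

# Hypothetical helper: push one line to a Carbon plaintext listener.
# Not called below, so the example runs without a Graphite server.
def send_to_carbon(host, line, port = 2003)
  TCPSocket.open(host, port) { |sock| sock.write(line) }
end

# Path layout as used in the NRPE check; the host part is a made-up FQDN
# (mq01.example.com) with its dots replaced by underscores.
path = 'servers.mq01_example_com_5446.hornetq.docstore_private_trigger_notification.MessageCount'
print carbon_line(path, 1234, Time.at(1_400_000_000))
```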
This gives me a predictable URL, computable from the host's FQDN, on which I can get the Graphite view of the queue.
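Because the metric path only depends on the FQDN, the JMX port and the queue name, the render URL can be derived mechanically. A minimal Ruby sketch, mirroring the `gsub(/\./,'_')` used in the NRPE template below; `render_url` and the host and queue names in the usage line are hypothetical.

```ruby
require 'uri'

# Build the Graphite render URL for a host's queue MessageCount.
# Graphite uses dots as path separators, so dots in the FQDN become underscores.
def render_url(graphite_host, fqdn, queue, jmxport: 5446, minutes: 30)
  target = "servers.#{fqdn.gsub('.', '_')}_#{jmxport}.hornetq.#{queue}.MessageCount"
  query = URI.encode_www_form(target: target, from: "-#{minutes}minutes", rawData: 'true')
  "http://#{graphite_host}/render?#{query}"
end

puts render_url('graphite.example.com', 'mq01.example.com',
                'docstore_private_trigger_notification')
```

`URI.encode_www_form` takes care of escaping, which matters if a queue name ever contains characters that are not URL-safe.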
The next step is to configure a Nagios check that verifies this data. For that I use the check_graphite plugin from Datacratic, which works with an NRPE config template like this:
### File managed with puppet ###
### Served by: '<%= scope.lookupvar('::servername') %>'
### Module: '<%= scope.to_hash['module_name'] %>'
### Template source: '<%= template_source %>'
command[check_hornetq]=/usr/lib64/nagios/plugins/check_graphite -u "http://<%= graphitehost%>/render?target=servers.<%= scope.lookupvar('::fqdn').gsub(/\./,'_')%>_5446.hornetq.docstore_private_trigger_notification.MessageCount&from=-30minutes&rawData=true" -w 2000 -c 20000
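The check asks Graphite for the last 30 minutes of raw data, where each series comes back as `target,start,end,step|v1,v2,None,...`. The Ruby sketch below shows one way such a response could be reduced to a Nagios status, assuming the datapoints are averaged and compared against the `-w 2000 -c 20000` thresholds; check_graphite's own aggregation may differ, and `parse_raw` and `evaluate` are hypothetical names.

```ruby
# Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Parse one rawData line "target,start,end,step|v1,v2,None,..."
# into floats, dropping missing (None) datapoints.
def parse_raw(line)
  _header, data = line.strip.split('|', 2)
  return [] if data.nil?
  data.split(',').reject { |v| v == 'None' }.map(&:to_f)
end

# Average the window and alert when it is above the warning/critical thresholds.
def evaluate(values, warn, crit)
  return [UNKNOWN, 'UNKNOWN: no datapoints'] if values.empty?
  avg = values.sum / values.size
  if avg > crit
    [CRITICAL, "CRITICAL: average #{avg.round}"]
  elsif avg > warn
    [WARNING, "WARNING: average #{avg.round}"]
  else
    [OK, "OK: average #{avg.round}"]
  end
end

raw = 'servers.mq01_example_com_5446.hornetq.q1.MessageCount,1400000000,1400000180,60|1500.0,3500.0,None'
status, message = evaluate(parse_raw(raw), 2000, 20_000)
puts message # prints "WARNING: average 2500"
# A real plugin would finish with `exit status` so NRPE picks up the state.
```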
I define this check on the host where HornetQ is running, so that in Icinga/Nagios the alert maps to that host rather than showing up as an error on an unrelated host.