hashnao's Blog: Key Value Store - monitor cassandra and multinode cluster

As installing cassandra and creating multinode cluster, I'm introducing of how to monitor cassandra and multinode cluster with own nagios-plugin.

Monitor cassandra node(check_by_ssh+cassandra-cli)

There's several ways to monitor cassandra node with Nagios or Icinga such as, JMX or check_jmx. Though they are fairly effective way to monitor cassandra, they need to take some time to prepare. I am afraid that using check_by_ssh and cassandra-cli is more simple than those ones and no need to install any libraries except for cassandra itself.

commands.cfg

define command{
        command_name    check_by_ssh
        command_line    $USER1$/check_by_ssh -l nagios -i /home/nagios/.ssh/id_rsa -H $HOSTADDRESS$ -t $ARG1$ -C '$ARG2$'

services.cfg

define service{
      use                     generic-service
      host_name               cassandra
      service_description     Cassandra Node
      check_command           check_by_ssh!22!60!"/usr/local/apache-cassandra/bin/cassandra-cli -h localhost --jmxport 9160 -f /tmp/cassandra_load.txt"

setup the file to load statements
setup the statement file in the cassandra node to be monitored.
"show cluster name;" shows its cluster name.

# cat > /tmp/cassandra_load.txt << EOF
show cluster name;
EOF

plugin status when cassandra is running(service status is OK)

# su - nagios
$ check_by_ssh -l nagios -i /home/nagios/.ssh/id_rsa -H 192.168.213.91 -p 22 -t 10 -C "/usr/local/apache-cassandra/bin/cassandra-cli -h 192.168.213.91 --jmxport 9160 -f /tmp/load.txt"
Connected to: "Test Cluster" on 192.168.213.91/9160
Test Cluster

plugin status when cassandra is stopped(service status is CRITICAL)

# su - nagios
$ check_by_ssh -l nagios -i /home/nagios/.ssh/id_rsa -H 192.168.213.91 -l root -p 22 -t 10 -C "/usr/local/apache-cassandra/bin/cassandra-cli -h 192.168.213.91 --jmxport 9160 -f /tmp/load.txt"
Remote command execution failed: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused

Monitor multinode cluster(check_cassandra_cluster.sh)

The plugin has been released at Nagios Exchange and see the detail there, please.

overview
check if the number of live nodes which belong to multinode cluster is less than the specified number.
it is enable to specify the threshold with option -w <warning> and -c <critical>.
get the number of live nodes, their status, and performance data.
software requirements
cassandra(using nodetool command)

command help

# check_cassandra_cluster.sh -h
Usage: ./check_cassandra_cluster.sh -H <host> -P <port> -w <warning> -c <critical>

 -H <host> IP address or hostname of the cassandra node to connect, localhost by default.
 -P <port> JMX port, 7199 by default.
 -w <warning> alert warning state, if the number of live nodes is less than <warning>.
 -c <critical> alert critical state, if the number of live nodes is less than <critical>.
 -h show command option
 -V show command version

when service status is OK

# check_cassandra_cluster.sh -H 192.168.213.91 -P 7199 -w 1 -c 0
OK - Live Node:2 - 192.168.213.92:Up,Normal,65.2KB,86.95% 192.168.213.91:Up,Normal,73.76KB,13.05% | Load_192.168.213.92=65.2KB Owns_192.168.213.92=86.95% Load_192.168.213.91=60.14KB Owns_192.168.213.91=13.05%

when service status is WARNING

# check_cassandra_cluster.sh -H 192.168.213.91 -P 7199 -w 2 -c 0
WARNING - Live Node:2 - 192.168.213.92:Up,Normal,65.2KB,86.95% 192.168.213.91:Up,Normal,73.76KB,13.05% | Load_192.168.213.92=65.2KB Owns_192.168.213.92=86.95% Load_192.168.213.91=60.14KB Owns_192.168.213.91=13.05%

when status is CRITICAL

# check_cassandra_cluster.sh -H 192.168.213.91 -P 7199 -w 3 -c 2
CRITICAL - Live Node:2 - 192.168.213.92:Up,Normal,65.2KB,86.95% 192.168.213.91:Up,Normal,73.76KB,13.05% | Load_192.168.213.92=65.2KB Owns_192.168.213.92=86.95% Load_192.168.213.91=60.14KB Owns_192.168.213.91=13.05%

when the threshold of warning is less than the one of critical

# check_cassandra_cluster.sh -H 192.168.213.91 -P 7199 -w 3 -c 4
-w <warning> 3 must be less than -c <critical> 4.

hashnao's Blog

ページ

Friday, May 4, 2012

Key Value Store - monitor cassandra and multinode cluster

Monitor cassandra node(check_by_ssh+cassandra-cli)

Monitor multinode cluster(check_cassandra_cluster.sh)

No comments:

Post a Comment

iJAWS@Doorkeeper

AWS Certified SysOps Administrator - Associate Level

AWS Certified Solutions Architect Associate Level