Showing posts with label 'multinode cluster'. Show all posts

Friday, May 4, 2012

Key Value Store - monitor cassandra and multinode cluster

Having installed cassandra and created a multinode cluster, I will now introduce how to monitor cassandra and the multinode cluster with my own nagios plugin.

Monitor cassandra node(check_by_ssh+cassandra-cli)

There are several ways to monitor a cassandra node with Nagios or Icinga, such as JMX or check_jmx. Though they are fairly effective ways to monitor cassandra, they take some time to prepare. Using check_by_ssh and cassandra-cli is simpler than those and requires no additional libraries besides cassandra itself.
  • commands.cfg
define command{
        command_name    check_by_ssh
        command_line    $USER1$/check_by_ssh -l nagios -i /home/nagios/.ssh/id_rsa -H $HOSTADDRESS$ -p $ARG1$ -t $ARG2$ -C '$ARG3$'
        }

  • services.cfg
define service{
      use                     generic-service
      host_name               cassandra
      service_description     Cassandra Node
      check_command           check_by_ssh!22!60!/usr/local/apache-cassandra/bin/cassandra-cli -h localhost --jmxport 9160 -f /tmp/cassandra_load.txt
      }
  • set up the statement file
    Create the statement file on the cassandra node to be monitored.
    "show cluster name;" displays the cluster name.
# cat > /tmp/cassandra_load.txt << EOF
show cluster name;
EOF
  • plugin status when cassandra is running (service status is OK)
# su - nagios
$ check_by_ssh -l nagios -i /home/nagios/.ssh/id_rsa -H 192.168.213.91 -p 22 -t 10 -C "/usr/local/apache-cassandra/bin/cassandra-cli -h 192.168.213.91 --jmxport 9160 -f /tmp/cassandra_load.txt"
Connected to: "Test Cluster" on 192.168.213.91/9160
Test Cluster
  • plugin status when cassandra is stopped (service status is CRITICAL)
# su - nagios
$ check_by_ssh -l nagios -i /home/nagios/.ssh/id_rsa -H 192.168.213.91 -p 22 -t 10 -C "/usr/local/apache-cassandra/bin/cassandra-cli -h 192.168.213.91 --jmxport 9160 -f /tmp/cassandra_load.txt"
Remote command execution failed: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
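
The two outputs above differ in a way that is easy to test for: a successful run starts with a "Connected to:" line, while a failed run does not. As a rough sketch only, the hypothetical helper below (classify_cli_output is my own name, not part of Nagios or cassandra) reads captured cassandra-cli output on stdin and turns it into a one-line plugin message plus a Nagios exit code. It assumes the success banner always has the form shown above.

```shell
#!/bin/sh
# Hypothetical helper: classify captured cassandra-cli output for Nagios.
# Reads the output on stdin, prints one status line, and returns the
# Nagios exit code (0 = OK, 2 = CRITICAL).
classify_cli_output() {
    output=$(cat)
    # Assumption: a successful run prints a line starting with "Connected to: ".
    if printf '%s\n' "$output" | grep -q '^Connected to: '; then
        # The last line is the answer to "show cluster name;".
        name=$(printf '%s\n' "$output" | tail -n 1)
        echo "OK - cluster name: $name"
        return 0
    fi
    # Report only the first line so the plugin output stays on one line.
    echo "CRITICAL - $(printf '%s\n' "$output" | head -n 1)"
    return 2
}
```

For example, piping the output of the cassandra-cli command above into classify_cli_output on the monitored node would give a message in the usual "OK - ..." or "CRITICAL - ..." form.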

Monitor multinode cluster(check_cassandra_cluster.sh)

 The plugin has been released on Nagios Exchange; please see the details there.
  • overview
    Checks whether the number of live nodes in the multinode cluster has fallen below the specified thresholds.
    The thresholds are specified with the options -w <warning> and -c <critical>.
    Reports the number of live nodes, their status, and performance data.
  • software requirements
    cassandra(using nodetool command)
  • command help
# check_cassandra_cluster.sh -h
Usage: ./check_cassandra_cluster.sh -H <host> -P <port> -w <warning> -c <critical>

 -H <host> IP address or hostname of the cassandra node to connect, localhost by default.
 -P <port> JMX port, 7199 by default.
 -w <warning> alert warning state, if the number of live nodes is less than <warning>.
 -c <critical> alert critical state, if the number of live nodes is less than <critical>.
 -h show command option
 -V show command version 
  •  when service status is OK
# check_cassandra_cluster.sh -H 192.168.213.91 -P 7199 -w 1 -c 0
OK - Live Node:2 - 192.168.213.92:Up,Normal,65.2KB,86.95% 192.168.213.91:Up,Normal,73.76KB,13.05% | Load_192.168.213.92=65.2KB Owns_192.168.213.92=86.95% Load_192.168.213.91=60.14KB Owns_192.168.213.91=13.05%
  •  when service status is WARNING
# check_cassandra_cluster.sh -H 192.168.213.91 -P 7199 -w 2 -c 0
WARNING - Live Node:2 - 192.168.213.92:Up,Normal,65.2KB,86.95% 192.168.213.91:Up,Normal,73.76KB,13.05% | Load_192.168.213.92=65.2KB Owns_192.168.213.92=86.95% Load_192.168.213.91=60.14KB Owns_192.168.213.91=13.05% 
  •  when service status is CRITICAL
# check_cassandra_cluster.sh -H 192.168.213.91 -P 7199 -w 3 -c 2
CRITICAL - Live Node:2 - 192.168.213.92:Up,Normal,65.2KB,86.95% 192.168.213.91:Up,Normal,73.76KB,13.05% | Load_192.168.213.92=65.2KB Owns_192.168.213.92=86.95% Load_192.168.213.91=60.14KB Owns_192.168.213.91=13.05%
  •  when the warning threshold is lower than the critical threshold (rejected as invalid)
# check_cassandra_cluster.sh -H 192.168.213.91 -P 7199 -w 3 -c 4
-w <warning> 3 must be less than -c <critical> 4.
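
To make the threshold behaviour concrete, here is a minimal sketch of how such a check could be written. It is not the released plugin: ring_state is a hypothetical function, it reads `nodetool ring` output on stdin, and it omits the per-node status and performance data. The inclusive comparison (live nodes less than or equal to the threshold) is an assumption inferred from the sample runs above, where two live nodes give WARNING with -w 2 and CRITICAL with -c 2. Returning UNKNOWN (3) for invalid thresholds is also my own choice.

```shell
#!/bin/sh
# Hypothetical sketch, not the released plugin: derive a Nagios state from
# `nodetool ring` output supplied on stdin.
# Usage: nodetool -h <host> -p <port> ring | ring_state <warning> <critical>
ring_state() {
    warning=$1
    critical=$2
    # Assumption: thresholds are non-negative integers and the warning
    # threshold is the higher one, because it should trigger first.
    if [ "$warning" -le "$critical" ]; then
        echo "UNKNOWN - warning ($warning) must be greater than critical ($critical)"
        return 3
    fi
    # Count rows whose first field is an IPv4 address and whose
    # Status column (4th field) is "Up".
    live=$(awk '$1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ && $4 == "Up" { n++ } END { print n + 0 }')
    # Assumption: inclusive comparison, inferred from the sample runs.
    if [ "$live" -le "$critical" ]; then
        echo "CRITICAL - Live Node:$live"
        return 2
    elif [ "$live" -le "$warning" ]; then
        echo "WARNING - Live Node:$live"
        return 1
    fi
    echo "OK - Live Node:$live"
    return 0
}
```

Reading the ring output from stdin keeps the function easy to try against saved output without a running cluster.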

Key Value Store - create cassandra multinode cluster

Having introduced how to install cassandra before, I will now explain how to create a cassandra multinode cluster.

Create Multinode cluster

  • cassandra nodes
_images/cassandra_node.png
  • Configuring the multinode cluster: 1st node (kvs01)
Because the default cassandra.yaml is set up for a single node, the configuration must be changed to create a multinode cluster.
# cd /usr/local/apache-cassandra/conf/
# vi cassandra.yaml
auto_bootstrap : false
- seeds: "192.168.213.91"
listen_address: 192.168.213.91
rpc_address: 192.168.213.91
  • diff between the original cassandra.yaml and the modified one
# diff -u cassandra.yaml.bak cassandra.yaml
--- cassandra.yaml.bak 2012-02-22 23:21:44.000000000 +0900
+++ cassandra.yaml     2012-05-04 07:51:31.000000000 +0900
@@ -8,6 +8,7 @@
 # The name of the cluster. This is mainly used to prevent machines in
 # one logical cluster from joining another.
 cluster_name: 'Test Cluster'
+auto_bootstrap : false

 # You should always specify InitialToken when setting up a production
 # cluster for the first time, and often when adding capacity later.
@@ -95,7 +96,7 @@
       parameters:
           # seeds is actually a comma-delimited list of addresses.
           # Ex: "<ip1>,<ip2>,<ip3>"
-          - seeds: "127.0.0.1"
+          - seeds: "192.168.213.91"

 # emergency pressure valve: each time heap usage after a full (CMS)
 # garbage collection is above this fraction of the max, Cassandra will
@@ -178,7 +179,7 @@
 # address associated with the hostname (it might not be).
 #
 # Setting this to 0.0.0.0 is always wrong.
-listen_address: localhost
+listen_address: 192.168.213.91

 # Address to broadcast to other Cassandra nodes
 # Leaving this blank will set it to the same value as listen_address
@@ -190,7 +191,7 @@
 #
 # Leaving this blank has the same effect it does for ListenAddress,
 # (i.e. it will be based on the configured hostname of the node).
-rpc_address: localhost
+rpc_address: 192.168.213.91
 # port for Thrift to listen for clients on
 rpc_port: 9160
  • restart daemon
# pgrep -f cassandra | xargs kill -9
# /usr/local/apache-cassandra/bin/cassandra
  • Configuring the multinode cluster: other nodes (kvs02, kvs03)
Replace listen_address and rpc_address with the address of each server.
There is no need to enable auto_bootstrap because it is enabled by default in cassandra-1.x; the diff below sets it to true explicitly, which has the same effect.
# cd /usr/local/apache-cassandra/conf/
# vi cassandra.yaml
- seeds: "192.168.213.91"
listen_address: 192.168.213.92
rpc_address: 192.168.213.92
  • diff between the original cassandra.yaml and the modified one
# diff -u cassandra.yaml.bak cassandra.yaml
--- cassandra.yaml.bak 2012-03-23 04:00:43.000000000 +0900
+++ cassandra.yaml     2012-05-04 08:44:14.000000000 +0900
@@ -8,6 +8,7 @@
 # The name of the cluster. This is mainly used to prevent machines in
 # one logical cluster from joining another.
 cluster_name: 'Test Cluster'
+auto_bootstrap: true

 # You should always specify InitialToken when setting up a production
 # cluster for the first time, and often when adding capacity later.
@@ -95,7 +96,7 @@
       parameters:
           # seeds is actually a comma-delimited list of addresses.
           # Ex: "<ip1>,<ip2>,<ip3>"
-          - seeds: "localhost"
+          - seeds: "192.168.213.91"

 # emergency pressure valve: each time heap usage after a full (CMS)
 # garbage collection is above this fraction of the max, Cassandra will
@@ -178,7 +179,7 @@
 # address associated with the hostname (it might not be).
 #
 # Setting this to 0.0.0.0 is always wrong.
-listen_address: localhost
+listen_address: 192.168.213.92

 # Address to broadcast to other Cassandra nodes
 # Leaving this blank will set it to the same value as listen_address
@@ -190,7 +191,7 @@
 #
 # Leaving this blank has the same effect it does for ListenAddress,
 # (i.e. it will be based on the configured hostname of the node).
-rpc_address: localhost
+rpc_address: 192.168.213.92
 # port for Thrift to listen for clients on
 rpc_port: 9160
  • restart daemon
# pgrep -f cassandra | xargs kill -9
# /usr/local/apache-cassandra/bin/cassandra
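
The cassandra.yaml edits above are the same on every node apart from the node's own address, so they can be scripted instead of made by hand in vi. The following is only a sketch under some assumptions: configure_node is a hypothetical helper name, the three settings are assumed to appear uncommented in the file exactly as in the diffs above, and auto_bootstrap is left alone.

```shell
#!/bin/sh
# Hypothetical helper: point cassandra.yaml at a seed node and bind it to
# this node's address. The original file is kept as <file>.bak.
# Usage: configure_node <yaml-file> <seed-ip> <node-ip>
configure_node() {
    yaml=$1
    seed=$2
    node=$3
    cp "$yaml" "$yaml.bak" || return 1
    # Rewrite only the three settings; every other line passes through
    # untouched. Commented-out lines do not match the patterns.
    sed -e "s/^\( *- seeds: \).*/\1\"$seed\"/" \
        -e "s/^listen_address: .*/listen_address: $node/" \
        -e "s/^rpc_address: .*/rpc_address: $node/" \
        "$yaml.bak" > "$yaml"
}
```

On kvs02, for instance, this could be called as configure_node /usr/local/apache-cassandra/conf/cassandra.yaml 192.168.213.91 192.168.213.92, followed by diff -u cassandra.yaml.bak cassandra.yaml to review the result before restarting the daemon.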
  • Verify ring status
# nodetool -h localhost ring
Address         DC          Rack        Status State   Load            Owns    Token
                                                                               100438156989107092060814573762535799562
192.168.213.92  datacenter1 rack1       Up     Normal  53.6 KB         93.47%  89332387546649365392870509741689618961
192.168.213.93  datacenter1 rack1       Up     Normal  49.19 KB        3.26%   94885272267878228726842541752112709261
192.168.213.91  datacenter1 rack1       Up     Normal  55.71 KB        3.26%   100438156989107092060814573762535
 Next, I will introduce the monitoring phase.
