When you run a RabbitMQ server in a production environment, it is essential to monitor it to make sure that it is up and running, and that all of its messages are being processed.
If you are already using Nagios for your enterprise monitoring, you can monitor RabbitMQ using plugins.
nagios-plugins-rabbitmq is a Nagios plugin package that currently has 6 checks to monitor various aspects of RabbitMQ server.
This tutorial explains how to install, configure and monitor RabbitMQ Server using check_rabbitmq plugin.
1. Download check_rabbitmq Nagios Plugin
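Download the Nagios RabbitMQ plugin from its GitHub repository (jamesc/nagios-plugins-rabbitmq). Or, you can use wget to download it directly to your server as shown below. The -O option saves the archive under the file name that the unzip command expects.
cd ~
wget --no-check-certificate -O nagios-plugins-rabbitmq-master.zip https://github.com/jamesc/nagios-plugins-rabbitmq/archive/master.zip
unzip nagios-plugins-rabbitmq-master.zip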
After you unzip the download, it will create the nagios-plugins-rabbitmq-master directory. Rename this directory to nagios-plugins-rabbitmq (i.e., remove the “-master” from the directory name).
mv nagios-plugins-rabbitmq-master nagios-plugins-rabbitmq
2. Install Plugin in Libexec directory
Move this “nagios-plugins-rabbitmq” directory to the Nagios libexec directory, where all the plugins are located. If you’ve installed Nagios from source, the location of the libexec directory is /usr/local/nagios/libexec as shown below.
mv nagios-plugins-rabbitmq /usr/local/nagios/libexec
Also, make sure this plugin directory is owned by nagios user and group as shown below.
cd /usr/local/nagios/libexec/
chown -R nagios:nagios nagios-plugins-rabbitmq/
At this stage, if you test the nagios plugin by executing check_rabbitmq_server, you might get “Can’t locate Nagios/Plugin.pm in @INC” error message as shown below.
# cd /usr/local/nagios/libexec/nagios-plugins-rabbitmq/scripts
# ./check_rabbitmq_server
Can't locate Nagios/Plugin.pm in @INC (@INC contains: /usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.7/i386-linux-thread-multi
3. Install Nagios::Plugin Perl Module
This RabbitMQ Server Nagios plugin requires the “Nagios::Plugin” Perl package, which is a set of Perl modules used for writing Nagios plugins in Perl.
Install the Nagios::Plugin Perl module as shown below. You can install it either from source or from the cpan shell.
cd /usr/save
wget http://search.cpan.org/CPAN/authors/id/T/TO/TONVOON/Nagios-Plugin-0.36.tar.gz
tar xvfz Nagios-Plugin-0.36.tar.gz
cd Nagios-Plugin-0.36
perl Makefile.PL
make
make test
make install
4. Additional Perl Module Dependencies
In my case, I also needed the following Perl modules for the check_rabbitmq_server plugin to work properly. Install them from the cpan shell as shown below:
cpan> install Math::Calc::Units
cpan> install Config::Tiny
cpan> install JSON
After all the dependencies are installed, check_rabbitmq_server will no longer give any Perl module error. Instead, you’ll see the following missing argument usage error message:
# ./check_rabbitmq_server
Usage: check_rabbitmq_server [options] -H hostname
Missing argument: hostname
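If you would rather confirm the Perl dependencies before running the plugin, a small helper can ask Perl to load each module in turn. This is only a sketch: the function names have_perl_module and report_perl_modules are made up for this illustration, and the module list in the usage comment is simply the set of modules named in the steps above.

```shell
# Return 0 if Perl can load the named module, non-zero otherwise.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Print "found" or "missing" for every module passed as an argument.
report_perl_modules() {
    for module in "$@"; do
        if have_perl_module "$module"; then
            echo "found:   $module"
        else
            echo "missing: $module"
        fi
    done
}

# Usage (module names taken from the steps above):
# report_perl_modules Nagios::Plugin Math::Calc::Units Config::Tiny JSON
```

Any module reported as missing can then be installed from the cpan shell in the same way as the others.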
5. Basic check_rabbitmq Usage
The following will connect to the RabbitMQ server that is running on dev-db server on port 15672, and return an OK message, with “Memory”, “Process”, “FD” and “Sockets” information of the connected RabbitMQ server as shown below.
# ./check_rabbitmq_server -H "dev-db" --port=15672
RABBITMQ_SERVER OK - Memory OK (1.16%) Process OK (0.02%) FD OK (2.93%) Sockets OK (0.12%) | Memory=1.16%;80;90 Process=0.02%;80;90 FD=2.93%;80;90 Sockets=0.12%;80;90
When the RabbitMQ Server is not up and running, you’ll get the following error message.
# ./check_rabbitmq_server -H "dev-db" --port=15672
RABBITMQ_SERVER CRITICAL - Received 500 Can't connect to dev-db:15672 (connect: Connection refused) for path: nodes/rabbit@dev-db
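Nagios does not read the OK or CRITICAL word in the output. It uses the exit code of the plugin, which by the standard plugin convention is 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. The following sketch translates an exit code into its status name, which is handy when testing from the command line. The function name nagios_status_name is made up for this illustration.

```shell
# Translate a Nagios plugin exit code into its status name.
nagios_status_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "INVALID" ;;   # not one of the four standard plugin exit codes
    esac
}

# Usage, run from the plugin's scripts directory:
# ./check_rabbitmq_server -H "dev-db" --port=15672
# rc=$?
# echo "Exit status: $rc ($(nagios_status_name "$rc"))"
```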
6. Specify Username and Password
This plugin uses the RabbitMQ HTTP API, which is provided by the RabbitMQ management plugin.
In the previous example, the check_rabbitmq_server command uses the default username and password combination, i.e., guest/guest. If you get the following “Access refused: nodes/rabbit@” error message, you’ll know that the default username/password combination (i.e., guest/guest) is invalid.
# ./check_rabbitmq_server -H dev-db
RABBITMQ_SERVER UNKNOWN - Access refused: nodes/rabbit@dev-db
If you’ve changed the username and password for the RabbitMQ Management plugin, specify them using the -u and -p parameters as shown below.
The following example connects to the RabbitMQ server that is running on dev-db server on port 15672, using the username “guest”, and password “MySecretPassword”.
# ./check_rabbitmq_server -H "dev-db" --port=15672 -u "guest" -p "MySecretPassword"
RABBITMQ_SERVER OK - Memory OK (1.16%) Process OK (0.02%) FD OK (2.34%) Sockets OK (0.12%) | Memory=1.16%;80;90 Process=0.02%;80;90 FD=2.34%;80;90 Sockets=0.12%;80;90
Note: In all the following examples, I did not pass the username/password, as the default username and password are assumed. If you’ve changed them on your RabbitMQ server, make sure you pass -u and -p in all the following examples.
7. check_rabbitmq_overview Usage Example
Using the check_rabbitmq_overview command, you can monitor the following combination of values for critical and warning levels:
1) Total number of messages in the queue
2) Total number of messages that are ready
3) Total number of messages that are not acknowledged yet
If you don’t pass any critical or warning level, you’ll get an OK message, with the total messages, messages_ready and messages_unacknowledged as shown below.
# ./check_rabbitmq_overview -H "dev-db" --port=15672
RABBITMQ_OVERVIEW OK - messages OK (229) messages_ready OK (229) messages_unacknowledged OK (0) | messages=229;; messages_ready=229;; messages_unacknowledged=0;;
This is an example of using check_rabbitmq_overview with critical and warning levels. Since the number of ready messages (229) in the following example is larger than its critical limit of 10 (the second value in “1000,10,10”), it returns the CRITICAL message.
# ./check_rabbitmq_overview -H "dev-db" --port=15672 -c 1000,10,10 -w 15,15,15
RABBITMQ_OVERVIEW CRITICAL - messages_ready CRITICAL (229), messages WARNING (229), messages_unacknowledged OK (0) | messages=229;15;1000 messages_ready=229;15;10 messages_unacknowledged=0;15;10
This is another example of using check_rabbitmq_overview with critical and warning levels. Since the total number of messages in the following example is larger than the warning limit of “15,15,15” but stays below the critical limits, it returns the WARNING message.
# ./check_rabbitmq_overview -H "dev-db" --port=15672 -c 1000,500,500 -w 15,15,15
RABBITMQ_OVERVIEW WARNING - messages WARNING (229) messages_ready WARNING (229), messages_unacknowledged OK (0) | messages=229;15;1000 messages_ready=229;15;500 messages_unacknowledged=0;15;500
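To see how the three comma-separated values line up with the three counters, here is a simplified re-creation of the comparison for the command that used -c 1000,10,10 -w 15,15,15. It is not the plugin's own code. The function names classify and evaluate_overview are made up for this illustration, and it assumes plain upper limits only, not the full Nagios range syntax.

```shell
# Classify one value against a warning limit and a critical limit.
# Simplification: only plain upper limits are handled.
classify() {
    value=$1 warn=$2 crit=$3
    if [ "$value" -gt "$crit" ]; then
        echo "CRITICAL"
    elif [ "$value" -gt "$warn" ]; then
        echo "WARNING"
    else
        echo "OK"
    fi
}

# Evaluate the three counters against comma-separated -w and -c lists.
# Position 1 is messages, 2 is messages_ready, 3 is messages_unacknowledged.
evaluate_overview() {
    values=$1 warns=$2 crits=$3
    i=1
    for name in messages messages_ready messages_unacknowledged; do
        v=$(echo "$values" | cut -d, -f"$i")
        w=$(echo "$warns" | cut -d, -f"$i")
        c=$(echo "$crits" | cut -d, -f"$i")
        echo "$name $(classify "$v" "$w" "$c")"
        i=$((i + 1))
    done
}

# Counter values 229,229,0 with -w 15,15,15 and -c 1000,10,10
evaluate_overview "229,229,0" "15,15,15" "1000,10,10"
# prints:
# messages WARNING
# messages_ready CRITICAL
# messages_unacknowledged OK
```

The overall status reported to Nagios is the worst of the individual results, which is why that command ends up CRITICAL even though only messages_ready crossed its critical limit.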
8. check_rabbitmq_objects Usage Example
The following example will return the total number of objects for vhosts, exchanges, bindings, queues and channels as shown below.
# ./check_rabbitmq_objects -H "dev-db" --port=15672
RABBITMQ_OBJECTS OK - Gathered Object Counts | vhost=0;; exchange=8;; binding=3;; queue=1;; channel=0;;
Just like the previous example, you can also pass warning and critical limits based on when you want to send the warning and critical alert for the total number of objects mentioned above.
9. check_rabbitmq_aliveness Usage Example
This will return whether the vhost defined in the RabbitMQ Server is alive or not. The following example returns an OK message for the aliveness check.
# ./check_rabbitmq_aliveness -H "dev-db" --port=15672
RABBITMQ_ALIVENESS OK - vhost: /
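The RabbitMQ management HTTP API has an aliveness-test endpoint that takes the vhost name as part of the path, so you can cross-check the result with curl. Whether the plugin calls exactly this endpoint is an assumption here, and newer RabbitMQ releases may no longer provide it. The helper name aliveness_url is made up for this illustration. The only subtle part is that the default vhost “/” must be percent-encoded as %2F.

```shell
# Build the aliveness-test URL for a host, port and vhost.
aliveness_url() {
    host=$1 port=$2 vhost=$3
    # Percent-encode "/" so that the default vhost "/" becomes "%2F".
    encoded=$(printf '%s' "$vhost" | sed 's,/,%2F,g')
    echo "http://${host}:${port}/api/aliveness-test/${encoded}"
}

# Usage (assumes the default guest/guest credentials are still valid):
# curl -s -u guest:guest "$(aliveness_url dev-db 15672 /)"
```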
There is also a check_rabbitmq_watermark script that comes with this package, which displays mem_alarm and disk_free_alarm as shown below.
# ./check_rabbitmq_watermark -H "dev-db" --port=15672
RABBITMQ_WATERMARK CRITICAL - mem_alarm disk_free_alarm
10. check_rabbitmq_queue Usage Example
This is helpful when you have several queues defined in your RabbitMQ instance, and you want to monitor a particular queue.
For example, the following monitors the “DEV.Error.Read” queue specifically, and returns the messages, messages_ready, messages_unacknowledged, and consumers values as shown below.
# ./check_rabbitmq_queue -H "dev-db" --port=15672 --queue="DEV.Error.Read"
RABBITMQ_QUEUE OK - messages OK (0) messages_ready OK (0) messages_unacknowledged OK (0) consumers OK (0) | messages=0;; messages_ready=0;; messages_unacknowledged=0;; consumers=0;;
Just like the previous examples, you can also set warning and critical limits using -w and -c as shown below. In the following example, it gives a CRITICAL message, as the total number of messages (229) exceeded the critical limit of 200.
# ./check_rabbitmq_queue -H "dev-db" --port=15672 --queue="DEV.Status.Read" -w 100 -c 200
RABBITMQ_QUEUE CRITICAL - messages CRITICAL (229), messages_ready OK (229) messages_unacknowledged OK (0) consumers OK (0) | messages=229;100;200 messages_ready=229;; messages_unacknowledged=0;; consumers=0;;
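Everything after the “|” character in the output is performance data in the label=value;warn;crit form. If you want to pull a single value out of it in your own scripts, a helper like the following can do it. This is a sketch, and the function name perf_value is made up for this illustration. The sample line is copied from the output above.

```shell
# Print the value of one performance-data label from a plugin output line.
perf_value() {
    label=$1 output=$2
    # Keep only the text after the first "|", put each token on its own line,
    # then split every token into the label and the rest at the "=" sign.
    echo "${output#*|}" | tr ' ' '\n' | while IFS='=' read -r key rest; do
        if [ "$key" = "$label" ]; then
            # Drop the ";warn;crit" part and keep only the value.
            echo "${rest%%;*}"
        fi
    done
}

sample='RABBITMQ_QUEUE CRITICAL - messages CRITICAL (229), messages_ready OK (229) messages_unacknowledged OK (0) consumers OK (0) | messages=229;100;200 messages_ready=229;; messages_unacknowledged=0;; consumers=0;;'
perf_value messages "$sample"
# prints: 229
```

Values that carry a unit, such as Memory=1.16% in the check_rabbitmq_server output, are returned with the unit attached.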
11. Add check_rabbitmq_* Command Definitions
Add command definitions for all the above check_rabbitmq_* commands to the commands.cfg file. This will set up the check_rabbitmq_* command definitions that you can use in your Nagios service definitions.
The following example shows the check_rabbitmq_server command definition added to Nagios.
# vi /usr/local/nagios/etc/objects/commands.cfg
define command {
    command_name check_rabbitmq_server
    command_line $USER1$/nagios-plugins-rabbitmq/scripts/check_rabbitmq_server -H $ARG1$ --port=$ARG2$ -u $ARG3$ -p $ARG4$
}
As you see in all the examples above, we were using the hostname (instead of the IP address). On my instance, instead of using “$HOSTADDRESS$” to get the IP address, I’m passing the hostname as an argument to the command itself. If $HOSTADDRESS$ works for you, change $ARG1$ to $HOSTADDRESS$, and adjust the other ARG numbers accordingly.
12. Create Nagios Service Definition for RabbitMQ Server
Once you’ve tested the Nagios RabbitMQ plugin from the command line, create a service definition like the following and place it under the /usr/local/nagios/etc/servers directory. In the following example, RabbitMQ is running on the server called “dev-db”.
# cat dev-db-server.cfg
define service {
    use generic-service
    host_name dev-db
    service_description RabbitMQ
    contacts prodalert
    check_command check_rabbitmq_server!dev-db!15672!guest!MySecretPassword
}
Restart Nagios after the above change. After this, anytime the RabbitMQ Server goes down, Nagios will send an alert to the contacts defined in the “prodalert” object.
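Before restarting, it is worth letting Nagios verify the configuration with its -v option, so that a typo in the new command or service definition does not prevent Nagios from starting. The sketch below only runs the restart command when that check passes. The binary and nagios.cfg paths are assumptions based on a source install under /usr/local/nagios, the restart command depends on your init system, and the function name verify_then_restart is made up for this illustration.

```shell
# Assumed paths for a source install; adjust them for your system.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"

# Run the given restart command only if the configuration check passes.
verify_then_restart() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null; then
        "$@"
    else
        echo "Configuration check failed; Nagios was not restarted" >&2
        return 1
    fi
}

# Usage (the restart command itself is an assumption):
# verify_then_restart service nagios restart
```

If the check fails, run the same nagios -v command by hand without the output redirection to see which definition it complains about.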