
Friday, 27 March 2015

High-availability options for MySQL

The technologies for building highly-available (HA) MySQL solutions are in constant evolution and cover very different needs and use cases. To help people choose the best HA solution for their needs, Jay Janssen and I decided to publish, on a regular basis (hopefully, this is the first), an update on the most common technologies and their state, with a focus on what type of workloads suit them best. We restricted ourselves to open source solutions that provide automatic failover. Of course, don’t simply compare the number of positive and negative items; they don’t all carry the same weight. Should you pick any of these technologies, heavy testing is mandatory: HA never extends beyond the scenarios that have been tested.

Percona XtraDB Cluster (PXC)

Percona XtraDB Cluster (PXC) is a version of Percona Server implementing the Galera replication protocol from Codership.
Positive points:
  • Almost synchronous replication, very small lag if any
  • Automatic failover
  • At its best with small transactions
  • All nodes are writable
  • Very small read-after-write lag, usually no need to care about it
  • Scales reads very well and, to some extent, writes
  • New nodes are provisioned automatically through State Snapshot Transfer (SST)
  • Multi-threaded apply, greater write capacity than regular replication
  • Can do geographical disaster recovery (Geo DR)
  • More resilient to unresponsive nodes (swapping)
  • Can resolve split-brain situations by itself
Negative points:
  • Still under development, some rough edges
  • Large transactions like multi-statement transactions or large write operations cause issues and are usually not a good fit
  • For quorum reasons, 3 nodes are needed, but one can be a lightweight arbitrator
  • SST can be heavy over a WAN
  • Commits are affected by the network latency, which especially impacts Geo DR
  • To achieve HA, a load balancer, like haproxy, is needed (a health-check sketch follows this list)
  • Failover time is determined by the load balancer check frequency
  • Performance is affected by the weakest/busiest node
  • Foreign keys are potential issues
  • MyISAM should be avoided
  • Can be mixed with regular async replication as master or slave, but slaves are not easy to reconfigure after an SST on their master
  • Requires careful setup of the host, swapping can lead to node expulsion from the cluster
  • No manual failover mode
  • Debugging some Galera protocol issues isn’t trivial
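Since PXC nodes are reached through a load balancer, the health check usually verifies that a node is fully synced before it receives traffic. Below is a minimal sketch of such a check in Python, assuming the pymysql driver and the standard Galera status variables (wsrep_cluster_status, wsrep_local_state_comment); the host and credentials are placeholders.

# Minimal PXC node health-check sketch (assumes the pymysql driver).
# A load balancer such as haproxy would normally call something like this
# through an HTTP check service; this is only an outline, not a complete tool.
import sys
import pymysql

def node_is_healthy(host, user, password):
    conn = pymysql.connect(host=host, user=user, password=password)
    try:
        with conn.cursor() as cur:
            # The node must be part of the primary component...
            cur.execute("SHOW STATUS LIKE 'wsrep_cluster_status'")
            cluster_status = cur.fetchone()[1]
            # ...and fully synced before it should receive queries.
            cur.execute("SHOW STATUS LIKE 'wsrep_local_state_comment'")
            local_state = cur.fetchone()[1]
        return cluster_status == 'Primary' and local_state == 'Synced'
    finally:
        conn.close()

if __name__ == '__main__':
    healthy = node_is_healthy('127.0.0.1', 'clustercheck', 'secret')
    sys.exit(0 if healthy else 1)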

Percona replication manager (PRM)

Percona replication manager (PRM) uses the Linux-HA Pacemaker resource manager to manage MySQL and replication, and provides high availability. Information about PRM can be found here; the official page on the Percona web site is in the making.
Positive points:
  • Nothing specific regarding the workload
  • Unlimited number of slaves
  • Slaves can have different roles
  • Typically VIP-based access, typically 1 writer VIP and many reader VIPs
  • Also works without VIPs (see the fake_mysql_novip agent)
  • Detects if a slave lags too much and removes its reader VIPs
  • All nodes are monitored
  • The best slave is picked as the new master after failover
  • Geographical disaster recovery is possible with the lightweight booth protocol
  • Can be operated in manual failover mode
  • Graceful failover is quick, under 2s in normal conditions
  • Ungraceful failover under 30s
  • Distributed operation with Pacemaker, no single point of failure
  • Built-in Pacemaker logic, STONITH, etc. Very rich and flexible.
Negative points:
  • Still under development, some rough edges
  • Transactions may be lost if the master crashes (async replication)
  • For quorum reasons, 3 nodes are needed, but one can be a lightweight arbitrator
  • Only one node is writable
  • Read after write may not be consistent (replication lag)
  • Only scales reads
  • Requires careful setup of the host, swapping can lead to node expulsion from the cluster
  • Data inconsistency can happen if the master crashes (fix coming)
  • Pacemaker is complex, logs are difficult to read and understand

MySQL master HA (MHA)

Like PRM above, MySQL master HA (MHA) provides high availability through replication. The approach is different: instead of relying on an HA framework like Pacemaker, it uses Perl scripts. Information about MHA can be found here.
Positive points:
  • Mature
  • Nothing specific regarding the workload
  • No latency effects on writes
  • Can have many slaves, and slaves can have different roles
  • Very good binlog/relay-log handling
  • Works pretty hard to minimise data loss
  • Can be operated in manual failover mode
  • Graceful failover is quick, under 5s in normal conditions
  • If the master crashes, slaves will be consistent
  • The logic is fairly easy to understand
Negative points:
  • Transactions may be lost if the master crashes (async replication)
  • Only one node is writable
  • Read after write may not be consistent (replication lag)
  • Only scales reads
  • Monitoring and logic are centralized, a single point of failure, and a network partition can cause a split-brain
  • Custom fencing devices, custom VIP scripts, no reuse of other projects' tools
  • Most of the deployments are using manual failover (at least at Percona)
  • Requires privileged ssh access to read relay logs, which can be a security concern
  • No monitoring of the slaves to invalidate one that lags too much or whose replication is broken; this needs to be done by an external tool like HAProxy (a lag-check sketch follows this list)
  • Requires careful setup of the host, swapping can lead to node expulsion from the cluster
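Since MHA leaves slave-lag monitoring to an external tool, the check is typically a small script that the load balancer calls. The sketch below is one illustrative way to do it in Python, assuming the pymysql driver; the threshold, host and credentials are placeholders.

# Sketch: invalidate a slave that lags too much or whose replication is broken.
# Assumes the pymysql driver; an external tool like HAProxy would call this,
# since MHA itself does not monitor slave lag.
import sys
import pymysql
import pymysql.cursors

MAX_LAG_SECONDS = 30  # illustrative threshold, tune per environment

def slave_is_usable(host, user, password):
    conn = pymysql.connect(host=host, user=user, password=password,
                           cursorclass=pymysql.cursors.DictCursor)
    try:
        with conn.cursor() as cur:
            cur.execute('SHOW SLAVE STATUS')
            status = cur.fetchone()
        if not status:
            return False  # not configured as a slave at all
        if status['Slave_IO_Running'] != 'Yes' or status['Slave_SQL_Running'] != 'Yes':
            return False  # replication is broken
        lag = status['Seconds_Behind_Master']
        return lag is not None and lag <= MAX_LAG_SECONDS
    finally:
        conn.close()

if __name__ == '__main__':
    sys.exit(0 if slave_is_usable('127.0.0.1', 'monitor', 'secret') else 1)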

NDB Cluster

NDB cluster is the most high-end form of high-availability configuration for MySQL. It is a complete shared nothing architecture where the storage engine is distributed over multiple servers (data nodes). Probably the best starting point with NDB is the official document, here.
Positive points:
  • Mature
  • Synchronous replication
  • Very good at small transactions
  • Very good at high concurrency (many client threads)
  • Huge transaction capacity, more than 1M trx/s is not uncommon
  • Failover can be ~1s
  • No single point of failure
  • Geographical disaster recovery capacity built in
  • Strong at async replication, applying in batches gives multi-threaded apply at the data node level
  • Can scale reads and writes, the framework implements sharding by hashes
Negative points:
  • Not a drop-in replacement for InnoDB, you need to tune the schema and the queries
  • Not a general-purpose database, some loads like reporting are just bad
  • Only the READ COMMITTED isolation level is available
  • Hardware heavy, needs 4 servers minimum for full HA
  • Memory (RAM) hungry, even with disk-based tables
  • Complex to operate, lots of parameters to adjust
  • Needs a load balancer for failover
  • Very new foreign key support, field reports on it are scarce

Shared storage/DRBD

Achieving high availability using a shared storage medium is an old and well-known method. It is used by nearly all the major databases. The shared storage can be a DAS connected to two servers, a LUN on a SAN accessible from two servers, or a DRBD partition replicated synchronously over the network. DRBD is by far the most common shared storage device used in the MySQL world.
Positive points:
  • Mature
  • Synchronous replication (DRBD)
  • Automatic failover is easy to implement
  • VIP-based access
Negative points:
  • Write capacity is impacted by network latency for DRBD
  • SANs are expensive
  • Only for InnoDB
  • Standby node, a big server doing nothing
  • Needs a warmup period after failover to be fully operational
  • Disk corruption can spread

Saturday, 10 January 2015

[How-to] Creating Highly Available Message Queues using RabbitMQ


High availability has become a key requirement at every layer of a modern technology stack, and message broker software has become a significant component of most stacks. In this article, we will discuss how to create highly available message queues using RabbitMQ.
RabbitMQ is an open source message broker software (also called message-oriented middleware) that implements the Advanced Message Queuing Protocol (AMQP). RabbitMQ server is written in the Erlang programming language.

The RabbitMQ Cluster

Clustering connects multiple nodes to form a single logical broker. Virtual hosts, exchanges, users and permissions are mirrored across all nodes in a cluster. A client connecting to any node can see all the queues in a cluster.
RabbitMQ tolerates the failure of individual nodes. Nodes can be stopped and started in a cluster.
Clustering enables high availability of queues and increases the throughput.
A node can be a disc node or a RAM node. RAM nodes keep the message state in memory, with the exception of queue contents, which can reside on disk if the queue is persistent or too big to fit into memory.
RAM nodes perform better than disc nodes because they don’t have to write to disk as much. But it is always recommended to have disc nodes for persistent queues.
We’ll discuss how to create and convert RAM and Disk nodes later in the post.

Prerequisites:

  1. Network connection between nodes must be reliable.
  2. All nodes must run the same version of Erlang and RabbitMQ.
  3. All TCP ports should be open between nodes.
We have used CentOS for the demo. Installation steps may vary for Ubuntu and OpenSuse. In this demo, we have launched two m1.small servers in AWS for master and slave nodes.

1. Install Rabbitmq

Install RabbitMQ on the master and slave nodes.
$ yum install rabbitmq-server.noarch

2. Start Rabbitmq

/etc/init.d/rabbitmq-server start

3. Create the Cluster

Stop RabbitMQ on the master and slave nodes. Ensure the service is stopped properly.
/etc/init.d/rabbitmq-server stop
Copy the Erlang cookie file below from the master to all nodes. This cookie file needs to be the same across all nodes.
$ sudo cat /var/lib/rabbitmq/.erlang.cookie
Make sure you start all nodes after copying the cookie file from the master.
Start RabbitMQ in master and all nodes.
$ /etc/init.d/rabbitmq-server start
Then run the following commands on each node except the master, to join it to the cluster:
$ rabbitmqctl stop_app
$ rabbitmqctl reset
$ rabbitmqctl join_cluster rabbit@master
$ rabbitmqctl start_app
Replace master with the hostname of the master node. You can add as many slave nodes as needed to the cluster by repeating these steps on each of them.
Check the cluster status from any node in the cluster:
$ rabbitmqctl cluster_status
By default, the cluster stores messages on the disk. You can also choose to store Queues in Memory.
You can have a node as a RAM node while attaching it to the cluster:
$ rabbitmqctl stop_app
$ rabbitmqctl join_cluster --ram rabbit@slave1
It is recommended to have at least one disc node in the cluster so that messages are stored on persistent disk and message loss can be avoided in case of a disaster.
The performance of RAM nodes is a little better than that of disc nodes and gives you better throughput.

4. Set the HA Policy

The following command will sync all the queues across all nodes:
$ rabbitmqctl set_policy ha-all "" '{"ha-mode":"all","ha-sync-mode":"automatic"}'
For example, a policy where queues whose names begin with "ha." are mirrored to all nodes in the cluster:
$ rabbitmqctl set_policy ha-all "^ha\." '{"ha-mode":"all"}'
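To confirm that a policy has actually been applied to your queues, you can inspect them through the RabbitMQ management plugin's HTTP API, if the plugin is enabled. The sketch below is an illustration only: it assumes the management plugin listens on port 15672 with the default guest/guest credentials and that the requests library is installed; field names such as slave_nodes correspond to mirrored queues of that era and may differ in your version.

# Sketch: list queues and the nodes mirroring them via the management HTTP API.
# Assumes the rabbitmq_management plugin is enabled on port 15672; adjust the
# host and credentials to your setup.
import requests

def list_mirrored_queues(host='localhost', user='guest', password='guest'):
    resp = requests.get('http://%s:15672/api/queues' % host, auth=(user, password))
    resp.raise_for_status()
    for queue in resp.json():
        # 'node' is the queue master; 'slave_nodes' lists the mirrors (if any).
        print("%s master=%s mirrors=%s" % (queue['name'],
                                           queue.get('node'),
                                           queue.get('slave_nodes', [])))

if __name__ == '__main__':
    list_mirrored_queues()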

5. Test the Queue mirror

We are going to run a sample Python program to create a sample queue. You need the packages below installed on the machine from which you want to run the program.
Install python-pip
$ yum install python-pip.noarch
Install Pika
$ sudo pip install pika==0.9.8
Create a send.py file and copy in the content below. You need to replace “localhost” with the name/IP of a master/slave node.
#!/usr/bin/env python
import pika

# Connect to the RabbitMQ node; replace 'localhost' with the name/IP of a cluster node
connection = pika.BlockingConnection(pika.ConnectionParameters(
              'localhost'))
channel = connection.channel()

# Declare the queue and publish a test message to it
channel.queue_declare(queue='hello')
channel.basic_publish(exchange='', routing_key='hello', body='Hello World!')
print " [x] Sent 'Hello World!'"
connection.close()
Run the python script using the command:
$ python send.py
This will create a Queue (hello) with a message on the RabbitMQ cluster.
Check if the message is available across all nodes.
$ sudo rabbitmqctl list_queues
Now, create a file named receive.py and copy the content below.
#!/usr/bin/env python
import pika
connection = pika.BlockingConnection(pika.ConnectionParameters(
              'localhost'))
channel = connection.channel()
channel.queue_declare(queue='hello')
def callback(ch, method, properties, body):
   print " [x] Received %r" % (body,)
channel.basic_consume(callback,
                     queue='hello',
                     no_ack=True)
print ' [*] Waiting for messages. To exit press CTRL+C'
channel.start_consuming()
Run the script and check the Queue in either the slave or master:
$ sudo rabbitmqctl list_queues

6. Setup Load Balancer

Now, we have multiple MQs running in a cluster. All are in sync and have the same queues.
How do we point our application to the cluster? We can’t point our application to a single node: if that node fails, we need a mechanism to automatically fail over to the other nodes in the cluster. There are multiple ways to achieve this, but we prefer to use a load balancer.
There are two advantages in using the load balancer:
  1. High availability
  2. Better network throughput because the load is evenly distributed across nodes.
Create a load balancer in front of the cluster and map the backend MQ instances. You can choose HAProxy, Apache, Nginx, or any hardware load balancer you use in your organization.
If the servers are running in AWS inside a VPC, then choose an internal load balancer. Update the application to point to the load balancer endpoint.
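If you would rather not run a dedicated load balancer, the client itself can try each cluster node in turn. The snippet below is a minimal sketch using pika, as in the earlier examples; the host names mq-node1 and mq-node2 are placeholders for your own nodes (or for the load balancer endpoint once it is in place).

# Sketch: fall back to another cluster node if the first one is unreachable.
# Node names are placeholders; replace them with your RabbitMQ hosts.
import pika
import pika.exceptions

NODES = ['mq-node1', 'mq-node2']

def connect(nodes):
    for host in nodes:
        try:
            return pika.BlockingConnection(pika.ConnectionParameters(host))
        except pika.exceptions.AMQPConnectionError:
            continue  # this node is down, try the next one
    raise RuntimeError('No RabbitMQ node reachable: %s' % nodes)

connection = connect(NODES)
channel = connection.channel()
channel.queue_declare(queue='hello')
channel.basic_publish(exchange='', routing_key='hello', body='Hello HA!')
connection.close()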

Thursday, 20 November 2014

Setting up Entity Framework for Production Use

Abstract: Over the last couple of years, Entity Framework has steadily become the de facto data access story from Microsoft. With Entity Framework Code First, it became even easier to get started with an application. While EF lets you get off the ground and running fast, its deployment story (both first time and subsequent upgrades) has been a little sketchy. With the 4.2 release and EF Migrations, the upgrade story has been simplified a little, but the first-time deployment story still has a couple of points to keep in mind. Today we will explore these points and see how we can deploy a database that is accessed by EF (Code First). We will keep Migrations for another day.

Over the last couple of years, Entity Framework (EF) has matured steadily to become the de facto Data Access story from Microsoft. We all love the speed with which we can get off the ground with our prototypes. However, deploying the database to a typical shared hosting solution has a few challenges. Today we will take a simple application and see what it takes to deploy the database on AppHarbor. Use of AppHarbor is a matter of convenience. We could use any hosting provider; most of them have similar restrictions with respect to database creation and deletion permissions.

The Sample Application

Let’s say we have a very simple application with a single table that maps to an entity called BlogPost. For additional tables, we will use the ASP.NET Internet Template that includes the Authentication and Authorization entities. The Default Providers will create the required tables for these as well.
Before the first run, let’s add the BlogPost entity and add a default set of views using the default scaffold tooling.

The BlogPost Entity and Scaffolding Setup

- Add a new Class – BlogPost, in the Models folder. It has three properties, Id (of type int), Title (of type string) and PostDetails (of type string).
blog-post-entity
- We save and build our project at this point
- Next we right click on the Controller folder in Solution Explorer and select ‘Add New Controller’ to bring up the New Controller Wizard.
- We setup the Controller as follows:
  • Name: BlogPostController
  • Scaffolding Options: MVC Controller with read/write actions and views, using Entity Framework
  • Model Class: Select the entity created above e.g. BlogPost
  • Data Context Class: Press down arrow to select ‘Add New’ option. This will show another popup where we select the default Context Class name provided. As shown below it should be something like [Solution Name].Models.[ProjectName]Context
blog-post-controller
- Click on ‘Add’ to continue.
- The Scaffolding infrastructure will try to connect to the default .\SQLEXPRESS database. If it fails to find the DB, it will give an error as follows
possible-error
- In case of an error, click OK to dismiss the dialog and open the Web.config file. Update the server name in the ‘Data Source’ to point to the correct SQL Server (below we see the SQL Express instance is called SQLEXPRESSR2 instead of the traditional SQLEXPRESS). Change the ‘Initial Catalog’ name to something like EfDeploySample, instead of the default string.
connection-string-in-web-config
  • Now is a good time to ensure there is only one connection string being used. If you use the ‘Internet’ template, a connection string with the name ‘DefaultConnection’ is added, and the ‘Authentication’/’Authorization’/’Personalization’ providers use this ‘DefaultConnection’ connection string. Delete that connection from the <connectionStrings> section and replace references to it with the above connection string wherever required.
web-config-change-default-connection-string
- Repeat the Add Controller step above. If successful, the following will be added to the solution
  • A new folder BlogPost will be added in the ‘Views’ folder, with the default views for Create, Delete, Details, Edit and Index.
  • The BlogPostController is added to the Controllers folder
  • The EfDeploySampleContext is added to the Models folder
updated-solution-structure
- To make use of the Authentication framework that came along, we will add the [Authorize] attribute on top of the BlogPostController, so that actions on it cannot be invoked unless a user is Logged in. With that, our application is set to go.
- Run the Application. Use the ‘Register’ link on the top to register a username. Then navigate to /BlogPost URL to get to the Index page of the Blog Posts. Add a sample Post. With our application up and running, it’s time to attend to the Database Migration piece. If we look at our database in Management Studio, it looks as follows:

database-structure-in-management-studio
- Pretty neat, considering the fact that we didn’t write a single line of SQL to generate those tables.

EF Database Initialization Strategies

EntityFramework comes with a set of default initialization strategies. Without any configuration, it is set up as ‘Create if schema does not exist’. You can explicitly specify it in the DB Context class by overriding the OnModelCreating method as follows:
on-model-creating
In this strategy, EF creates the Database if it doesn’t find it. Once created, it will not update the schema unless you drop and recreate again (Yes, Migrations will update without a DB drop, but for now we are not looking at Migrations).
To test the default strategy, let’s drop our database and run the application. As we try to navigate to the BlogPost index, we will get redirected to the Login page. Once we register, if we go back and check, the new database will have been created.
Next, log in and navigate to BlogPost. Once we are at the Index page, if we refresh the database schema we will see that the BlogPost table has been re-created.

Strategy to keep up with rapid changes during prototyping

So far so good; we are fine with our automatic schema setup. However, in the prototyping stage we will be adding, modifying and deleting schema elements. In such a case, CreateDatabaseIfNotExists becomes a hindrance, because if you don’t drop the database manually, it will throw an InvalidOperationException complaining that the backend has changed.
exception-on-model-change
Thus, to keep up with rapid changes in the schema during development, we change the strategy to DropCreateDatabaseIfModelChanges
drop-and-create-strategy
Now when EF detects a change in Model, it will drop the DB and create the schema again. Remember NO data will be saved.

Connection String and Schema Names

If we look at the constructor of the MVC Tooling generated EfDeploySampleContext, we will notice a string parameter that corresponds to the name of the connection string in the web.config.
ef-deploy-sample-context-constructor
As a part of a clean setup, it is recommended that the connection string be passed into the DB context via the constructor.
The next important thing is to pass the schema name. We often take access to ‘dbo’ for granted, but some hosting providers do not provide us with dbo access. If we come across such a scenario, we have to do additional mapping between the entities and the tables.
To handle these two configurations, we update our EfDeploySampleContext constructor by passing the connectionName and schemaName parameters. We then pass the connection name to the base DBContext and use the schema name to map our entities to the tables in DB explicitly.
updated-ef-deploy-sample-context
As highlighted above, the default constructor has been modified to take in the two parameters and a guard clause ensures that if the parameters are not present, the instantiation throws an exception.
To match the above changes, we have to update our controllers so that we can pass these new values. To keep the schema name configurable, we will add a key in the appSettings section of our Web.config and pass that value into the DbContext’s constructor.
Our DbContext initialization code in the controllers would then look as follows
changes-to-db-context-instantiation

Going Live

With all the above changes, we are almost set to go live. Only two things remain:
1. Set the DropCreateDatabaseIfModelChanges strategy back to null. This way Entity Framework will not try to make any changes to the DB
set-initializer-strategy-to-null
2. Next use Management Studio Tasks to extract the DB Schema as a SQL script.
a. Launch the Generate Scripts Wizard
launch-generate-script-wizard
b. Choose Objects and Select all Tables/objects. Do not select the ‘Script entire database and all database objects’ option. It tries to create a new database. This is not what we want. Most hosting providers will not give us ‘create’ or ‘drop’ database rights.
generate-script-selected-tables
c. Save the Script
generate-script-save-files

Deploying to AppHarbor

Details of how to deploy code on AppHarbor have been explained in this article. We will look at the SQL Server setup more closely here.
- In AppHarbor, you have to create an application and select SQL Server as an AddOn. Most hosting providers will give you a connection string or server-name, user-name and password combination. AppHarbor provides you with the same.
  • Once the Add-On is installed you have access to one Database that you can access using the ‘Go To SQL Server’ link.
appharbor-go-to-sql-server
- The configuration page looks as follows
appharbor-database-config
- In the ‘ConnectionString alias’ setting, edit the alias and provide the same name as the connection string in our web.config i.e. EfDeploySampleContext
  • This actually creates a web.config transform that replaces the Web.config’s connection string setting with this database’s connection string, when the code is deployed on AppHarbor
  • If you are not deploying to AppHarbor and do not have a transform defined, make sure you update your connection string manually before deploy.
- Connect to the server using the above connection information using the SQL Server Management Studio.
- Open the SQL script (remove the ‘USE [dbname]’ statement if required) and execute it on the server. That’s it. Done.
- Deploy the code, and if AppHarbor does not throw an error, your application is now live. You are good till the next upgrade cycle comes along. We will look at EF Migrations in an upcoming article to handle that scenario.

Conclusion

To conclude, we saw how, with a little bit of self-enforced process, we can easily migrate our EF prototypes to a production environment. However, the real challenges in a production deployment are the updates, revisions and further schema changes. We will look into EF Migrations in the future, which will help us resolve that scenario as well.
The code repository is at https://github.com/dotnetcurry/EFDeploySample and you can also download the Zip file.

Thursday, 10 July 2014

Cassandra Monitoring


Guide to Cassandra Thread Pools


This guide provides a description of the different thread pools and how to monitor them. Includes what to alert on, common issues and solutions.

Concepts

Cassandra is based on a Staged Event-Driven Architecture (SEDA). This separates different tasks into stages that are connected by a messaging service. Each type of task is grouped into a stage with a queue and thread pool (a ScheduledThreadPoolExecutor, more specifically, for the Java folks). Some stages skip the messaging service and queue tasks immediately on a different stage if it exists on the same node. Each of these queues can back up if execution at a stage is being overrun, which is a common indication of an issue or performance bottleneck. To demonstrate, take for example a read request:
blog

Note: For the sake of this example the data exists on a different node than the one receiving the initial request, but in some scenarios this could all be on a single node. There are also multiple paths a read may take. This is just for demonstration purposes.
  1. NODE1> The request is queued from node1 on the ReadStage of node2 with the task holding a reference to a callback
  2. NODE2> The ReadStage will pull the task off the queue once one of its 32 threads become available
  3. NODE2> The task will execute and compute the resulting Row, which will be sent off to the RequestResponse stage of node1
  4. NODE1> One of the four threads in the RequestResponseStage will process the task, and pass the resulting Row to the callback referenced from step 1 which will return to the client.
  5. NODE1> It will also possibly kick off an async read repair task on node1 to be processed in the ReadRepairStage once one of its four threads become available
The ReadRepairStage, and how long it takes to process its work, is not in the feedback loop from the client.  This means that if reads occur at a higher rate than read repairs, the queue will grow until it is full without the client application ever noticing.  At the point of the queue being full (the limit varies between stages, more below) the act of enqueueing a task will block.
In some stages the processing of these tasks is not critical.  These tasks are marked as “DROPPABLE”, meaning the first thing the stage does when executing one is to check whether it has exceeded a timeout measured from when it was created.  If the timeout has passed, the task is thrown away instead of processed.

Accessing metrics

TPStats

For manual debugging this is the output given by
nodetool tpstats

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0         113702         0                 0
RequestResponseStage              0         0              0         0                 0
MutationStage                     0         0         164503         0                 0
ReadRepairStage                   0         0              0         0                 0
ReplicateOnWriteStage             0         0              0         0                 0
GossipStage                       0         0              0         0                 0
AntiEntropyStage                  0         0              0         0                 0
MigrationStage                    0         0              0         0                 0
MemoryMeter                       0         0             35         0                 0
MemtablePostFlusher               0         0           1427         0                 0
FlushWriter                       0         0             44         0                 0
MiscStage                         0         0              0         0                 0
PendingRangeCalculator            0         0              1         0                 0
commitlog_archiver                0         0              0         0                 0
InternalResponseStage             0         0              0         0                 0
HintedHandoff                     0         0              0         0                 0

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
PAGED_RANGE                  0
BINARY                       0
READ                         0
MUTATION                     0
_TRACE                       0
REQUEST_RESPONSE             0
COUNTER_MUTATION             0
The descriptions of the pool values (Active, Pending, etc.) are given in the table below. The second table lists dropped tasks by message type; for more information on that, read below.

JMX

These are all available via JMX as well in:
org.apache.cassandra.request:type=*
and
org.apache.cassandra.internal:type=*
with the attributes (relative to tpstats) being:
MBean Attribute | tpstats name | Description
ActiveCount | Active | Number of tasks pulled off the queue with a Thread currently processing
PendingTasks | Pending | Number of tasks in queue waiting for a thread
CompletedTasks | Completed | Number of tasks completed
CurrentlyBlockedTasks | Blocked | When a pool reaches its max thread count (configurable or set per stage, more below) it will begin queuing until the max size is reached.  When this is reached it will block until there is room in the queue.
TotalBlockedTasks | All time blocked | Total number of tasks that have been blocked

Metrics Reporters

Cassandra has pluggable metrics reporting capabilities; you can read more about them here.

Droppable Messages

The second table in nodetool tpstats displays a list of messages that were DROPPABLE. These ran after a given timeout set per message type, so they were thrown away. In JMX these are accessible via org.apache.cassandra.net:MessagingService or org.apache.cassandra.metrics:DroppedMessage. The following table provides a little information on each type of message.
Message Type | Stage | Notes
BINARY | n/a | This is deprecated and no longer has any use
_TRACE | n/a (special) | Used for recording traces (nodetool settraceprobability). Has a special executor (1 thread, 1000 queue depth) that throws away messages on insertion instead of within the execute
MUTATION | MutationStage | If a write message is processed after its timeout (write_request_timeout_in_ms) it either sent a failure to the client or it met its requested consistency level and will rely on hinted handoff and read repairs to do the mutation if it succeeded
COUNTER_MUTATION | MutationStage | If a write message is processed after its timeout (write_request_timeout_in_ms) it either sent a failure to the client or it met its requested consistency level and will rely on hinted handoff and read repairs to do the mutation if it succeeded
READ_REPAIR | MutationStage | Times out after write_request_timeout_in_ms
READ | ReadStage | Times out after read_request_timeout_in_ms. No point in servicing reads after that point since an error would already have been returned to the client
RANGE_SLICE | ReadStage | Times out after range_request_timeout_in_ms
PAGED_RANGE | ReadStage | Times out after request_timeout_in_ms
REQUEST_RESPONSE | RequestResponseStage | Times out after request_timeout_in_ms. The response was completed and sent back, but not before the timeout

Stage Details

Below is a list of the majority of exposed stages as of 2.0.7, but they tend to be a little volatile, so don’t be surprised if some do not exist. The heading of each section correlates with the name of the stage from nodetool tpstats. The notes on the stages mostly refer to 2.x, with some references to things in 1.2; 1.1 and previous versions are not considered at all in the making of this document.
Alerts, given as boolean expressions, are intended as a starting point and should be tailored to specific use cases and environments in case of many false positives.
When a value is listed as “number of processors”, it is retrieved from the system property “cassandra.available_processors”, which can be overridden by adding -Dcassandra.available_processors to JVM_OPTS; it falls back to the default of Runtime.availableProcessors().
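As a starting point for the alert expressions above, the Pending and Blocked columns can be scraped straight from nodetool tpstats. The following is a rough sketch rather than a production monitoring tool: it shells out to nodetool, assumes the column layout shown earlier in this post, and uses the generic pending > 15 / blocked > 0 thresholds, which should be tuned per environment.

# Rough sketch: flag thread pools whose Pending or Blocked counts look unhealthy,
# using the generic thresholds from this post (tune per cluster and per stage).
import subprocess

PENDING_THRESHOLD = 15

def check_tpstats():
    output = subprocess.check_output(['nodetool', 'tpstats']).decode()
    alerts = []
    for line in output.splitlines():
        fields = line.split()
        # Pool rows look like: Name Active Pending Completed Blocked All-time-blocked
        if len(fields) == 6 and fields[1].isdigit():
            name = fields[0]
            pending, blocked = int(fields[2]), int(fields[4])
            if pending > PENDING_THRESHOLD or blocked > 0:
                alerts.append('%s: pending=%d blocked=%d' % (name, pending, blocked))
    return alerts

if __name__ == '__main__':
    for alert in check_tpstats():
        print(alert)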


ReadStage

Performing a local read. Also includes deserializing data from the row cache.  If there are pending tasks here, this can cause increased read latency.  This can spike due to disk problems, poor tuning, or overloading your cluster.  In many cases (not disk failure) this is resolved by adding nodes or tuning the system.
JMX beans:org.apache.cassandra.request.ReadStage
org.apache.cassandra.metrics.ThreadPools.request.ReadStage
Number of threads:concurrent_reads (default: 32)
Max pending tasks:2^31 - 1
Alerts:pending > 15 || blocked > 0

RequestResponseStage

When a response to a request is received this is the stage used to execute any callbacks that were created with the original request
JMX beans:org.apache.cassandra.request.RequestResponseStage
org.apache.cassandra.metrics.ThreadPools.request.RequestResponseStage
Number of threads:number of processors
Max pending tasks:2^31 - 1
Alerts:pending > 15 || blocked > 0

MutationStage

Performing a local mutation, including:
  • insert/updates
  • Schema merges
  • commit log replays
  • hints in progress
Similar to ReadStage, an increase in pending tasks here can be caused by disk issues, overloading a system, or poor tuning. If messages are backed up in this stage, you can add nodes, tune hardware and configuration, or update the data model and use case.
JMX beans:org.apache.cassandra.request.MutationStage
org.apache.cassandra.metrics.ThreadPools.request.MutationStage
Number of threads:concurrent_writes (default: 32)
Max pending tasks:2^31 - 1
Alerts:pending > 15 || blocked > 0

ReadRepairStage

Performing read repairs. The chance of them occurring is configurable per column family with read_repair_chance. They are more likely to back up if using CL.ONE for reads (and, to a lesser extent, other non-CL.ALL consistency levels) and using multiple data centers. The repair is kicked off asynchronously outside of the query's feedback loop, as demonstrated in the diagram above. Note that this is not very likely to be a problem since it does not happen on all queries and is fast provided there is good connectivity between replicas. The repair being droppable also means that after write_request_timeout_in_ms it will be thrown away, which further mitigates this. If pending grows, attempt to lower the rate for high-read CFs:
ALTER TABLE column_family WITH read_repair_chance = 0.01;
JMX beans:org.apache.cassandra.request.ReadRepairStage
org.apache.cassandra.metrics.ThreadPools.request.ReadRepairStage
Number of threads:number of processors
Max pending tasks:2^31 - 1
Alerts:pending > 15 || blocked > 0

ReplicateOnWriteStage

This stage became CounterMutation in 2.1, and counters changed dramatically after 2.0, so post-2.0 you should consider it obsolete. It performs counter writes on non-coordinator nodes and replicates after a local write. This includes a read, so it can be pretty expensive. It will back up if the rate of writes exceeds the rate at which the mutations can occur, which is particularly possible with CL.ONE and high counter-increment workloads.
JMX beans:org.apache.cassandra.request.ReplicateOnWriteStage
org.apache.cassandra.metrics.ThreadPools.request.ReplicateOnWriteStage
Number of threads:concurrent_replicates (default: 32)
Max pending tasks:1024 x number of processors
Alerts:pending > 15 || blocked > 0

GossipStage

Post 2.0.3 there should no longer be issues with pending tasks. Instead, monitor the logs for the message:
Gossip stage has {} pending tasks; skipping status check (no nodes will be marked down)
Before that change, in particular with older versions of 1.2, a lot of nodes (100+) combined with vnodes could cause a lot of CPU-intensive work that made the stage get behind.
It has also been known to be caused by out-of-sync schemas. Check that NTP is working correctly and attempt nodetool resetlocalschema or, more drastically, deleting the system column family folder.
JMX beans:org.apache.cassandra.internal.GossipStage
org.apache.cassandra.metrics.ThreadPools.internal.GossipStage
Number of threads:1
Max pending tasks:2^31 - 1
Alerts:pending > 15 || blocked > 0

AntiEntropyStage

Repairing consistency. Handles repair messages such as Merkle tree transfers (from validation compaction) and streaming.
JMX beans:org.apache.cassandra.internal.AntiEntropyStage
org.apache.cassandra.metrics.ThreadPools.internal.AntiEntropyStage
Number of threads:1
Max pending tasks:2^31 - 1
Alerts:pending > 15 || blocked > 0

MigrationStage

Making schema changes
JMX beans:org.apache.cassandra.internal.MigrationStage
org.apache.cassandra.metrics.ThreadPools.internal.MigrationStage
Number of threads:1
Max pending tasks:2^31 - 1
Alerts:pending > 15 || blocked > 0

MemtablePostFlusher

Operations after flushing the memtable: discarding commit log files once all the data in them is in SSTables, and flushing non-CF-backed secondary indexes.
JMX beans:org.apache.cassandra.internal.MemtablePostFlusher
org.apache.cassandra.metrics.ThreadPools.internal.MemtablePostFlusher
Number of threads:1
Max pending tasks:2^31 - 1
Alerts:pending > 15 || blocked > 0

FlushWriter

Sorts and writes memtables to disk. The vast majority of the time, this backing up is from overrunning the disk's capability. The sorting can cause issues as well, however; in the case of sorting being the problem, it is usually accompanied by high load but a small amount of actual flushes (seen in cfstats). It can come from huge rows with large column names, i.e. something inserting many large values into a CQL collection. If overrunning disk capabilities, it is recommended to add nodes or tune the configuration.
JMX beans:org.apache.cassandra.internal.FlushWriter
org.apache.cassandra.metrics.ThreadPools.internal.FlushWriter
Number of threads:memtable_flush_writers (1 per data directory)
Max pending tasks:memtable_flush_queue_size (default: 4)
Alerts:pending > 15 || blocked > 0

MiscStage

Snapshotting, and replicating data after a node removal has completed.
JMX beans:org.apache.cassandra.internal.MiscStage
org.apache.cassandra.metrics.ThreadPools.internal.MiscStage
Number of threads:1
Max pending tasks:2^31 - 1
Alerts:pending > 15 || blocked > 0

InternalResponseStage

Responding to non-client initiated messages, including bootstrapping and schema checking
JMX beans:org.apache.cassandra.internal.InternalResponseStage
org.apache.cassandra.metrics.ThreadPools.internal.InternalResponseStage
Number of threads:number of processors
Max pending tasks:2^31 - 1
Alerts:pending > 15 || blocked > 0

HintedHandoff

Sending missed mutations to other nodes. Usually a symptom of a problem elsewhere, so don't treat it as the root issue. Can use nodetool disablehandoff to prevent further saving. Can also use JMX (or nodetool truncatehints) to clear the handoffs for specific endpoints, or all of them, via the org.apache.cassandra.db:HintedHandoffManager operations. This must be followed up with repairs!
JMX beans:org.apache.cassandra.internal.HintedHandoff
org.apache.cassandra.metrics.ThreadPools.internal.HintedHandoff
Number of threads:max_hints_delivery_threads (default: 1)
Max pending tasks:2^31 - 1
Alerts:pending > 15 || blocked > 0

MemoryMeter

Measures the memory usage and live ratio of a memtable. Pending should not grow beyond the number of memtables (the stage is front-ended by a set to prevent duplicates). This can take minutes for large memtables.
JMX beans:org.apache.cassandra.internal.MemoryMeter
org.apache.cassandra.metrics.ThreadPools.internal.MemoryMeter
Number of threads:1
Max pending tasks:2^31 - 1
Alerts:pending > column_family_count || blocked > 0

PendingRangeCalculator

Calculates the token ranges based on bootstrapping and leaving nodes. Instead of blocking, this stage discards tasks when one is already in progress, so there is no value in monitoring it.
JMX beans:org.apache.cassandra.internal.PendingRangeCalculator
org.apache.cassandra.metrics.ThreadPools.internal.PendingRangeCalculator
Number of threads:1
Max pending tasks:1

commitlog_archiver

(Renamed CommitLogArchiver in 2.1.) Executes a command to copy (or a user-defined command to archive) commit log files for recovery. Read more at the DataStax Dev Blog.
JMX beans:org.apache.cassandra.internal.commitlog_archiver
org.apache.cassandra.metrics.ThreadPools.internal.commitlog_archiver
Number of threads:1
Max pending tasks:2^31 - 1
Alerts:pending > 15 || blocked > 0

AntiEntropySessions

Number of active repairs that are in progress. Will not show up until after a repair has been run so any monitoring utilities should be able to handle it not existing.
JMX beans:org.apache.cassandra.internal.AntiEntropySessions
org.apache.cassandra.metrics.ThreadPools.internal.AntiEntropySessions
Number of threads:4
Max pending tasks:2^31 - 1
Alerts:pending > 1 || blocked > 0
