Friday 20 June 2014

CQL programming series - WHERE clause

 Breaking Down the CQL Where Clause blog posting was created by Ike Walker. To view more postings by Ike, check out the Mechanics of Flite blog here.

When explaining the differences between CQL and SQL I often start by rattling off a list of things
you can’t do in CQL. The list usually starts something like this:
  1. No joins
  2. No GROUP BY
  3. No arbitrary WHERE clauses
  4. No arbitrary ORDER BY clauses
The first 2 items on the list are fairly obvious. Joins and GROUP BY are not supported at all.
But what about the WHERE clause? SQL supports a whole bunch of operators and predicates in
the WHERE clause, and allows filtering on non-indexed columns. The CQL WHERE clause only works
for indexed columns, specifically columns in the primary key (both the partition key and the clustering
columns), and columns that have secondary indexes on them. Multiple predicates can only be
connected with AND, not OR, only a few operators are supported, and all of the operators have
restrictions. The Datastax CQL documentation describes what the WHERE clause supports.
Here’s what it says:
I mostly agree with that language, but I think this line is wrong:
I’d say that the = and IN operators are restricted, for example you can only use IN on the last column in the partition key. More on that later.
From my experience, here’s the botttom line of which operators are supported for which columns in the CQL WHERE clause:
  1. Partition key columns support the = operator
  2. The last column in the partition key supports the IN operator
  3. Clustering columns support the =, >, >=, <, and <= operators
  4. Secondary index columns support the = operator
Read on for examples.

1. Partition key columns support the = operator

I’ll borrow a table from Patrick McFadin’s post about time series data modeling for my examples:
Since this table has a composite partition key on (weatherstation_id,date) I need to include both of those columns if I want to use the = operator.
So this query is allowed:
But these queries are not allowed:

 

2. The last column in the partition key supports the IN operator

For single column partition keys, the IN operator is allowed without restriction. I’ll use a table with a single column partition key to illustrate that:
For composite partition keys, I have to use the = operator on the first N-1 columns of the partition key in order to use the IN operator on the last column. So this query is allowed:
But these queries are not allowed:

 

3. Clustering columns support the =, >, >=, <, and <= operators

The clustering columns support lots of operators, with the caveats mentioned in the documentation snippet I posted earlier.
The simplest case is to use the = operator for the partion key and the clustering columns, like this:
I can also use the = or IN operator on the partition key and do a range scan on the clustering columns:
Regardless of the operator, if I try to filter on the clustering columns only, I’ll get an error if I don’t use the ALLOW FILTERING command. For example:
If I add the ALLOW FILTERING command then I can use any operator on the clustering columns, but the query may be very slow and could eventually time out.
So these queries are all allowed:
But sometimes they may time out, like this:

 

4. Secondary index columns support the = operator

For the secondary index examples I’ll add a secondary index to the un-indexed temperature column:
Now I can filter on the temperature column with the = operator:
I can also use a secondary index in conjunction with the = operator on the partion key and any supported operator on the clustering columns. For example:

No comments:

Post a Comment

Angular Tutorial (Update to Angular 7)

As Angular 7 has just been released a few days ago. This tutorial is updated to show you how to create an Angular 7 project and the new fe...