Rage posts about MongoDB are quite popular these days. Most of them are about poor performance on specific data sets, reliability and sharding issues. Some of those blog posts might be right, other are just saying that the most popular NoSQL solution didn't fit their needs.
This article is not one of those. While most of the posts focus on operations part, benchmarks and performance characteristics, I want to talk a little bit about MongoDB query interfaces. That's right - programming interfaces, specifically about node.js native driver but those are nearly identical across different platform drivers and Mongo-shell.
Disclaimer: I try hard not to hate on MongoDB. In fact I work with MongoDB every work day as part of my full-time job. I also take part in the development ofMinimongo, pure-JavaScript clone of MongoDB API to work with in-memory caches. There is no reason for me to mock Mongo other than warning everyone about its sharp edges. Most of these gotchas are found by David Glasser. This article assumes you are familiar with MongoDB's API.
1. Keys order in a hash object
Let's say you want to store a simple object literal:> db.books.insert({ title: "Woe from Wit", meta: { author: "A. Griboyedov", year: 1823 } });
> db.books.find({ meta: { year: 1823, author: "A. Griboyedov" } });
< No results returned
> db.books.find({ meta: { author: "A. Griboyedov", year: 1823 } });
< { _id: ..., title: "Woe from Wit", meta: { ... } }
The gotcha: the order of keys matters in MongoDB, i.e.
{ a: 1, b: 2 }
does not match { b: 2, a: 1 }
.Why does it happen: MongoDB uses a binary data format called BSON. In BSON, the order of keys always matters. Notice, in JSON an object is an unordered set of key/value pairs.
What about JavaScript? ECMA-262 left it as 'undefined'. In some browsers (usually old ones) the order of pairs is not preserved meaning they can be anything. Thankfully most modern browsers' JavaScript engines preserve the order (sometimes even in arrays), so we can actually control it from node.js code.
Read more about it at John Resig's blog.
The answer to this is to either always specify pairs in the canonical form (keys are sorted lexicographically) or just to be consistent across your code base.
Another workaround would be to use a different selector, specifying certain key-paths rather than comparing to an object literal:
> db.books.find({ 'meta.year': 1823, 'meta.author': 'A. Griboyedov' });
The gotcha: this behavior can be dangerous whenever you want to build a multi-key index.
> db.books.ensureIndex({ title: 1, 'meta.year': -1 });
title
would be higher than the priority ofmeta.year
field. This is important to the way MongoDB will lay out your data: Read more in docs.2. undefined, null and undefined
Anyone remembers those times when the behavior ofundefined
, null
and the relation was confusing? In JavaScript world those are two different values and they are not the same in a strict comparison undefined !== null
. However, they are equal in a non-strict comparison undefined == null
. Some people are very careful with them, others use them interchangeably. But the point is: you have two different but similar values in JavaScript.MongoDB brings it to the next level. The BSON spec defines
undefined
as "deprecated".Node.js node-native-driver for MongoDB doesn't implement it at all.
In the current version (2.4.8) the behavior shows that
null
and undefined
are treated as the same value.> db.things.insert({ a: null, b: 1 });
> db.things.insert({ b: 2 }); // the 'a' is undefined implicitly
> db.things.find({ a: null });
< { a: null, b: 1 }
< { b: 2 }
undefined
is just converted to null
by node driver but is restricted in mongo-shell.In the following code we will get the same result printed twice: all 3 objects.
// from node.js code with mongo/node-native-driver
db.things.insert({ a: null, b: 1 });
db.things.insert({ b: 2 });
db.things.insert({ a: undefined, b: 3 });
console.log(db.things.find({ a: null }).fetch())
console.log(db.things.find({ a: undefined }).fetch())
null
but we get all three objects as well.// from mongo-shell
> db.things.find({a: undefined});
< error: { "$err" : "can't have undefined in a query expression", "code" : 13629 }
> db.things.find({a: null});
< { "a" : null, "b" : 1, "_id" : "wMWNPm7zrYXTNJpiA" }
< { "b" : 2, "_id" : "RjrYvmZF5EukhpuAY" }
< { "a" : null, "b" : 3, "_id" : "kethQ2khbyfFjJ7Sa" }
undefined
tonull
but left the implicit one as is (which is expected really).The cool stuff happens when we insert an explicit
undefined
from mongo-shell:// from mongo-shell
> db.things.insert({ a: undefined, b: 4 });
> db.things.find({ a: null })
< { "a" : null, "b" : 1, "_id" : "wMWNPm7zrYXTNJpiA" }
< { "b" : 2, "_id" : "RjrYvmZF5EukhpuAY" }
< { "a" : null, "b" : 3, "_id" : "kethQ2khbyfFjJ7Sa" }
b=4
. Shouldn't undefined
match null
? Let's look at the new object:> db.things.find({ b: 4 });
< { "_id" : ObjectId("52ca134f3e47d3d91146f2b5"), "a" : null, "b" : 4 }
a
field is holding something looking like null
but doesn't match the null
from our selector.The gotcha: there are more than 2 values looking like
null
in MongoDB: null
,undefined
and undefined
inserted from mongo-shell that looks like null
in the shell but in reality matches the deprecated undefined
in BSON (type number six). The last one doesn't match null
from the selectors, first two match bothundefined
and null
. The absence of value also matches both.Read the original GitHub issue.
3. Soft limits, hard limits and no limits
Let's say you have a feed of items and you allow user to specify the number items to return. You would return the result of a query looking like this:db.items.find({ ... }).limit(N);
N
is supplied by user. Of course we want to be careful and restrict user up to 50 items, otherwise anyone in the Internet would be able to load our application server and the database simply by supplying a very large N
:function getItems (N) {
if (N > 50)
N = 50;
return db.items.find({}).sort({ year: 1 }).limit(N);
}
The gotcha: if user supplies
0
(zero) as a number of items he wants to get the MongoDB would take it as "give me everything".It is well documented but not obvious right away: zero means "no limit" to MongoDB. My guess is some code just treats all falsy values the same way:
undefined
, null
, 0
, absence of value - everything means "no limit".That's OK, we can treat
0
as a special case:function getItems (N) {
if (N > 50 || !N) // check if N is falsy ("no limit")
N = 50;
return db.items.find({}).sort({ year: 1 }).limit(N);
}
In reality something like
db.items.find().limit(-1000000000000)
can return a bazillion of items. It is hard to find the documentation about it but several month ago I have seen the description of this behavior in node.js driver's docs, it talked about "soft" and "hard" limits. I have no idea what does it mean.So the final version of our server-side method would look like this:
function getItems (N) {
if (N < 0) N = -N;
if (N > 50 || !N) // check if N is falsy ("no limit")
N = 50;
return db.items.find({}).sort({ year: 1 }).limit(N);
}
4. Special treatment for arrays
A lot of people don't know this "feature" but arrays are treated specially.> db.c.insert({ a: [{x: 2}, {x: 3}], _id: "aaa"})
> db.c.find({'a.x': { $gt: 1 }})
< { "_id" : "aaa", "a" : [ { "x" : 2 }, { "x" : 3 } ] }
> db.c.find({'a.x': { $gt: 2 }})
< { "_id" : "aaa", "a" : [ { "x" : 2 }, { "x" : 3 } ] }
> db.c.find({'a.x': { $gt: 3 }})
< Nothing found
Notably, it doesn't work for nested arrays:
> db.x.insert({ _id: "bbb", b: [ [{x: 0}, {x: -1}], {x: 1} ] })
> db.x.find({ 'b.x': 1 })
< { "_id" : "bbb", "b" : [ [ { "x" : 0 }, { "x" : -1 } ], { "x" : 1 } ] }
> db.x.find({ 'b.x': 0 })
< Nothing found
> db.x.find({ 'b.x': -1 })
< Nothing found
> db.z.insert({a:[[{b:1,c:2},{b:2,c:4}],{b:3,c:5},[{b:4, c:9}]]})
> db.z.find({}, {'a.b': 1})
< { "_id" : ObjectId("52ca24073e47d3d91146f2b7"), "a" : [ [ { "b" : 1 }, { "b" : 2 } ], { "b" : 3 }, [ { "b" : 4 } ] ] }
> db.z.insert({a: [[{x: "00"}, {x: "01"}], [{x: "10"}, {x: "11"}]], _id: "zzz"})
> db.z.find({'a.x': '00'})
< Nothing found
> db.z.find({'a.x': '01'})
< Nothing found
> db.z.find({'a.x': '10'})
< Nothing found
> db.z.find({'a.x': '11'})
< Nothing found
> db.z.find({'a.0.0.x': '00'})
< { "_id" : "zzz", "a" : [ [ { "x" : "00" }, { "x" : "01" } ], [ { "x" : "10" }, { "x" : "11" } ] ] }
> db.z.find({'a.0.0.x': '01'})
< Nothing found
> db.z.find({'a.0.x': '00'})
< { "_id" : "zzz", "a" : [ [ { "x" : "00" }, { "x" : "01" } ], [ { "x" : "10" }, { "x" : "11" } ] ] }
> db.z.find({'a.0.x': '01'})
< { "_id" : "zzz", "a" : [ [ { "x" : "00" }, { "x" : "01" } ], [ { "x" : "10" }, { "x" : "11" } ] ] }
> db.z.find({'a.0.x': '10'})
< Nothing found
> db.z.find({'a.0.x': '11'})
< Nothing found
> db.z.find({'a.1.x': '00'})
< Nothing found
> db.z.find({'a.1.x': '01'})
< Nothing found
> db.z.find({'a.1.x': '10'})
< { "_id" : "zzz", "a" : [ [ { "x" : "00" }, { "x" : "01" } ], [ { "x" : "10" }, { "x" : "11" } ] ] }
> db.z.find({'a.1.x': '11'})
< { "_id" : "zzz", "a" : [ [ { "x" : "00" }, { "x" : "01" } ], [ { "x" : "10" }, { "x" : "11" } ] ] }
> db.p.insert({a: [0], _id: "xxx"})
> db.p.find({'a': 0})
< { "_id" : "xxx", "a" : [ 0 ] }
> db.q.insert({a: [[0]], _id: "yyy"})
> db.q.find({a: 0})
< Nothing found
> db.q.find({'a.0': 0})
< Nothing found
> db.q.find({'a.0.0': 0})
< { "_id" : "yyy", "a" : [ [ 0 ] ] }
{ 'a.0.x': Y }
meaning the field x
of the first element of field a
must be Y
) may become very confusing as it depends on your data.5. $near geo-location operator
This one is simple. You have a collection of documents with a location field. Location field represents a geo-location. The trick is in two different types of locations MongoDB can index, each type has a slightly different API and a slightly different behavior.The first one looks like this:
db.c.find({
location: {
$near: [12.3, 32.1],
$maxDistance: 777
}
});
db.c.find({
location: {
$near: {
$geometry: {
type: "Point",
coordinates: [ 12.3, 32.1 ]
},
$maxDistance: 777
}
}
});
$maxDistance
is the sibling element of $near
in case of plain pairs and is a child in case of Geo-JSON.But there is more! Sometimes you can get the same point twice in the result set! To understand this we need to recall the previous gotcha about nested arrays. Consider this code:
> db.c.insert({ location: [[1, 2], [1, 0]] }); // inserting an array of two points
> db.c.ensureIndex({ location: "2d" });
> db.c.find({ location: { $near: [0, 0], $maxDistance: 500 } });
< { "_id" : ObjectId("52ca30ec3e47d3d91146f2b8"), "location" : [ [ 1, 2 ], [ 1, 0 ] ] }
< { "_id" : ObjectId("52ca30ec3e47d3d91146f2b8"), "location" : [ [ 1, 2 ], [ 1, 0 ] ] }
All these gotchas remind me the days when I first started coding in JavaScript. There are several corner cases, some of them work inconsistently across browsers, some of the features you never want to use, somewhere you want to be extra careful. All of those are well known in JavaScript land, but not so well in MongoDB land.
Almost every weird behavior listed here was found in the process of simulating MongoDB in the project called Minimongo, mostly by David Glasser.
This article will be updated as new weirdnesses come to mind.
Update of 1 April 2014: I talked about some of these issues and some new gotchas on SF Meteor Devshop, the recording of the talk is below. "Don't get bitten by Mungos (or Mongos)":
No comments:
Post a Comment