MongoDB Sharding

Messed around today with MongoDB sharding on version 1.2.2. It was pretty easy to setup. All I had to do was:

  1. Download MongoDB
  2. Do this setup
Here's the question that prompted me to try it: does MongoDB only fetch the necessary subset of chunks when doing range queries? The short answer is yes, it does. It was natural to assume it would, but I wanted to see it in action.

To test this, I created the test.people collection from step 2 above, then ran this with the mongo client:

for (i=0; i < 3000000; i++) { test.people.insert({name: i}) }

When that finished, I had three chunks.

> printShardingStatus(db.getSisterDB("config"))

--- Sharding Status ---
sharding version: { "_id" : ObjectId("4b7df95cc03e000000005d6b"), "version" : 2 }
shards:
{ "_id" : ObjectId("4b7df969c03e000000005d6c"), "host" : "localhost:10000" }
{ "_id" : ObjectId("4b7df96ec03e000000005d6d"), "host" : "localhost:10001" }
databases:
{ "name" : "admin", "partitioned" : false, "primary" : "localhost:20000", "_id" : ObjectId("4b7df969b90f0000000056aa") }
my chunks
{ "name" : "test", "partitioned" : true, "primary" : "localhost:10001", "sharded" : { "test.people" : { "key" :
{ "name" : 1 }, "unique" : true } }, "_id" : ObjectId("4b7df982b90f0000000056ab") }
my chunks
test.people { "name" : { $minKey : 1 } } -->> { "name" : 0 } on : localhost:10001 { "t" : 126654
7599000, "i" : 4 }
test.people { "name" : 0 } -->> { "name" : 2595572 } on : localhost:10001 { "t" : 1266547599000,
"i" : 2 }
test.people { "name" : 2595572 } -->> { "name" : { $maxKey : 1 } } on : localhost:10000 { "t" :
1266547599000, "i" : 5 }

You'll see that three chunks exist above. Two are on one shard, one is on the other. The important point to notice is that one is for [0, 2595572) and another for [2595572, maxkey). I'm not sure why [minkey, 0) and [0, 2595572) wasn't just [minkey, 2595572), but that's something for another day. For the purposes of my range test, this suffices.

I then tried operations such as:

> db.people.find({ name: { $gt: 1, $lt: 3 } } )
{ "_id" : ObjectId("4b7df9c85a4800000000485d"), "name" : 2 }

> db.people.find({ name: { $gt: 2595573, $lt: 2595575 } } )
{ "_id" : ObjectId("4b7dfb8f903c000000003c92"), "name" : 2595574 }

> db.people.find({ name: { $gt: 2595570, $lt: 2595575 } } )
{ "_id" : ObjectId("4b7dfb8d5a4800000027e34f"), "name" : 2595572 }
{ "_id" : ObjectId("4b7dfb8f903c000000003c91"), "name" : 2595573 }
{ "_id" : ObjectId("4b7dfb8f903c000000003c92"), "name" : 2595574 }
{ "_id" : ObjectId("4b7dfb8d5a4800000027e34e"), "name" : 2595571 }

I watched the mongod output on these finds. The first two queries only hit one shard. The last query hit both shards. So MongoDB does in fact only query the necessary chunks even when doing range queries.