Wednesday, July 20, 2011

What Is Sharding?

Sharding is the method MongoDB uses to split a large collection across several servers (called a cluster). While sharding has roots in relational database partitioning, it is (like most aspects of MongoDB) very different.

The biggest difference between any partitioning schemes you've probably used and MongoDB is that MongoDB does almost everything automatically. Once you tell MongoDB to distribute data, it will take care of keeping your data balanced between servers. You have to tell MongoDB to add new servers to the cluster, but once you do, MongoDB takes care of making sure that they get an even amount of the data, too.

Sharding is designed to fulfill three simple goals:


Make the cluster 「invisible.」
We want an application to have no idea that what it's talking to is anything other than a single, vanilla mongod. To accomplish this, MongoDB comes with a special routing process called mongos. mongos sits in front of your cluster and looks like an ordinary mongod server to anything that connects to it. It forwards requests to the correct server or servers in the cluster, then assembles their responses and sends them back to the client. This makes it so that, in general, a client does not need to know that they're talking to a cluster rather than a single server. There are a couple of exceptions to this abstraction when the nature of a cluster forces it.


Make the cluster always available for reads and writes.
A cluster can't guarantee it'll always be available (what if the power goes out everywhere?), but within reasonable parameters, there should never be a time when users can't read or write data. The cluster should allow as many nodes as possible to fail before its functionality noticeably degrades. MongoDB ensures maximum uptime in a couple different ways. Every part of a cluster can and should have at least some redundant processes running on other machines (optimally in other data centers) so that if one process/machine/data center goes down, the other ones can immediately (and automatically) pick up the slack and keep going. There is also the question of what to do when data is being migrated from one machine to another, which is actually a very interesting and difficult problem: how do you provide continuous and consistent access to data while it's in transit?


Let the cluster grow easily
As your system needs more space or resources, you should be able to add them. MongoDB allows you to add as much capacity as you need as you need it.

These goals have some consequences: a cluster should be easy to use (as easy to use as a single node) and easy to administrate (otherwise adding a new shard would not be easy). MongoDB lets your application grow—easily, robustly, and naturally—as far as it needs to.

Source of Information : OReilly Scaling MongoDB 2011
What Is Sharding?SocialTwist Tell-a-Friend
Digg Google Bookmarks reddit Mixx StumbleUpon Technorati Yahoo! Buzz DesignFloat Delicious BlinkList Furl

0 comments: on "What Is Sharding?"

Post a Comment