NoSQL, NoProblem (Not Really… but it’s still awesome)

If you do tech and you don’t know what NoSQL refers to, then you just aren’t trying.

HBase, Cassandra, MongoDB, Voldemort, Membase, SimpleDB, CouchDB … and those are just the ones I can name off the top of my head and felt like typing. Put simply, there are enough options to keep you busy for some time just trying to understand the differences between them.

Facebook’s announcement of an update to Messages brought NoSQL and the importance of those differences back to the top of my mind. Having followed the Facebook infrastructure a little, I was certain that they would either pimp out MySQL somehow, if they wanted to stay old-school, or extend Cassandra a bit to go new-school (it was created there after all, although maybe never deployed as widely as some other things). So I was a bit surprised to read that they instead went with HBase. In the end, it certainly is the right choice for their needs, as Kannan explains, but that’s not really my point here.

My point is that a lot of people think NoSQL is the savior, the path to infinitely scalable key-based storage. While I’ll be the first to tell you that the NoSQL options out there can solve a lot of problems, and are a lot of fun to use, I’ll also urge restraint and patience. SQL-based databases have had many years to mature, and the result is that there are very standard ways to deploy and manage them in all but the most extreme situations. NoSQL isn’t there yet.

Over the past 18 months or so, I’ve used HBase, Cassandra, and Voldemort in production and near-production systems, and have just recently started experimenting with Membase. As I mentioned at the beginning of this post, the differences between these can sometimes be subtle, but they can also make or break your project. Cassandra and Voldemort each use a Dynamo-esque eventual consistency model, but implement it just differently enough to not be all that comparable (newest wins vs. vector clocks). On top of that, they offer dramatically different data structure support… Voldemort being simple key-value, and Cassandra offering BigTable-like column storage (along with the powerful, but initially very confusing, super column families). HBase offers column-based storage similar to Cassandra’s (but without super column families), yet has a completely different storage architecture that sits on top of the Hadoop Distributed File System.
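To make the “newest wins vs. vector clocks” distinction concrete, here is a minimal Python sketch of the two reconciliation strategies. It is not Cassandra’s or Voldemort’s actual code; the names (`Version`, `last_write_wins`, `descends`, `reconcile`) and the shopping-cart values are hypothetical. Under newest wins, the replica version with the highest timestamp survives and the others are discarded. With vector clocks, versions where neither descends from the other are detected as concurrent and both are kept, so something (usually the application) can merge them.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical types for illustration; real systems track more metadata than this.
VectorClock = Dict[str, int]  # node id -> count of writes coordinated by that node


@dataclass
class Version:
    value: str
    timestamp: int      # write time, used by newest-wins reconciliation
    clock: VectorClock  # causal history, used by vector-clock reconciliation


def last_write_wins(versions: List[Version]) -> Version:
    """Newest wins: keep the highest timestamp and silently drop the rest."""
    return max(versions, key=lambda v: v.timestamp)


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if clock `a` has seen every write that clock `b` has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def reconcile(versions: List[Version]) -> List[Version]:
    """Vector clocks: drop any version that another version strictly descends
    from; whatever remains are concurrent siblings left for the client to merge."""
    survivors: List[Version] = []
    for v in versions:
        dominated = any(
            descends(o.clock, v.clock) and not descends(v.clock, o.clock)
            for o in versions
        )
        duplicate = any(
            descends(s.clock, v.clock) and descends(v.clock, s.clock)
            for s in survivors
        )
        if not dominated and not duplicate:
            survivors.append(v)
    return survivors


if __name__ == "__main__":
    base = Version("cart=[book]", timestamp=100, clock={"A": 1})
    # Two clients update the cart concurrently through different nodes.
    via_a = Version("cart=[book, pen]", timestamp=105, clock={"A": 2})
    via_b = Version("cart=[book, mug]", timestamp=103, clock={"A": 1, "B": 1})
    replicas = [base, via_a, via_b]

    print(last_write_wins(replicas).value)        # cart=[book, pen]
    print([v.value for v in reconcile(replicas)])  # ['cart=[book, pen]', 'cart=[book, mug]']
```

Neither approach is wrong: newest wins keeps reads simple at the cost of possibly losing a concurrent write (the mug above), while vector clocks preserve it at the cost of pushing merge logic into your application. That is exactly the kind of difference that hides behind two systems both being called “eventually consistent.”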

The result of those subtle differences is that, unlike in the SQL world, you can’t architect a NoSQL-based system without having extreme confidence in the specific platform you’ve chosen. There is no reasonable migration path from Cassandra to Membase, for example. For you smart asses out there who are mapping column families to JSON objects in your head … by reasonable I mean comparable to switching between RDBMSs, which look by and large identical to the applications that use them. Point being, for the most part, you are going all in.
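For the “mapping column families to JSON objects” crowd, a minimal Python sketch of that mapping shows why it is mechanically easy but still not a reasonable migration. It assumes a hypothetical inbox row modeled as a Cassandra-style super column family (super column, then columns, then values) and a plain key-value store that only gets and puts whole values; `SuperColumnRow`, `to_kv_value`, and `update_column` are made-up names for illustration, not part of any client library.

```python
import json
from typing import Dict

# Hypothetical Cassandra-style super column family row for one user's inbox:
# super column name -> {column name -> value}
SuperColumnRow = Dict[str, Dict[str, str]]


def to_kv_value(row: SuperColumnRow) -> str:
    """Serialize the whole row as a single JSON value for a plain key-value store."""
    return json.dumps(row, sort_keys=True)


def update_column(stored: str, super_col: str, col: str, value: str) -> str:
    """Changing one column means reading, modifying, and rewriting the entire value,
    where a column store could write just that one column."""
    row = json.loads(stored)
    row.setdefault(super_col, {})[col] = value
    return json.dumps(row, sort_keys=True)


if __name__ == "__main__":
    inbox: SuperColumnRow = {
        "thread:42": {"subject": "lunch?", "last_sender": "alice"},
        "thread:57": {"subject": "release notes", "last_sender": "bob"},
    }
    stored = to_kv_value(inbox)  # would live under some hypothetical key, e.g. "inbox:user123"
    stored = update_column(stored, "thread:42", "last_sender", "carol")
    print(json.loads(stored)["thread:42"])
```

The data survives the trip, but the unit of reading, writing, and conflict detection changes from a column to the whole row: two clients updating different threads now write the same key, which, depending on the store, means either a lost update or a conflict your application has to resolve. Access patterns and consistency handling all need rethinking, which is why switching platforms is closer to a rewrite than a config change.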

Kannan mentions that Facebook worked on Messages for over a year, likely spending a good bit of that time testing different options and ultimately polishing HBase. Most of us don’t have that luxury or that kind of time; we just need a platform that works. So do your research, and be sure to think through all of the implications of which way you go… from scalability to operations to feature extensibility to your specific CAP needs.

With all of that said, I’ll take building a NoSQL system over an RDBMS-based one any day … it’s just plain fun.

For those of you that may be in the midst of looking into NoSQL options, or maybe battling one that’s keeping you up at night, here are some of my favorite links on the topic:

  1. A series of posts (still ongoing, so check back often) comparing the major NoSQL options in a few major areas
  2. The white paper on Amazon’s Dynamo architecture
  3. Cassandra vs. HBase … getting out of date in some of the specifics, but the general arc is still right on

Lots Of Hats is a place where I’ll write about pretty much anything I want to write about... from tech stuff, to my continual amazement at watching my kids grow up, to politics.

Enjoy,

Jeremy Pinkham