Why The Clock is Ticking for MongoDB

April 16, 2014

Last month, ZDNet published an interview with MongoDB CEO Max Schireson which took the position that document databases, such as MongoDB, are better suited to today's applications than traditional relational databases; the title of the article implies that the days of relational databases are numbered. But the trouble is not, as Schireson would have us believe, that the relational database community is ignorant of the design paradigms he advocates or has never tried them; it is that they have been tried and, in many cases, found to be anti-patterns. Certainly, there are some cases in which the schemaless design pattern that is perhaps MongoDB's most distinctive feature is just the right tool for the job, but it is misleading to think that such designs must use a document store. Relational databases can also handle such workloads, and their capabilities in this area are improving rapidly.

Let's look at his example of entering an order into a database. In this example, the order is postulated to be split across 150 different relational tables, including an order header table, an order line table, an address information table and, apparently, 147 others. Relational databases do encourage users to break up data across multiple tables in this way, a process called normalization. But not without reason. Storing every order in one large document may be ideal if all access will be strictly by order number, but this is rarely the case. When a user wants to run a report on all orders of one particular product, an index on the order line table can be used to efficiently find and retrieve just those order lines. If all order data is lumped together, the user will be forced to retrieve the entirety of each order that contains a relevant order line, or perhaps even to scan the entire database and examine every order to see whether it contains one.
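
To make the access-path argument concrete, here is a toy in-memory sketch in Python. It is not how PostgreSQL or MongoDB store data; the sample orders and the function names (lines_for_product_docstore, lines_for_product_normalized) are hypothetical. The document-style version models the unindexed, whole-document case described above, while the normalized version keeps flat order-line rows plus a product index.

```python
from collections import defaultdict

# Hypothetical sample data: each order is one nested "document".
orders = [
    {"order_id": 1, "customer": "Acme", "lines": [
        {"product": "widget", "qty": 5},
        {"product": "gadget", "qty": 2}]},
    {"order_id": 2, "customer": "Globex", "lines": [
        {"product": "gadget", "qty": 1}]},
    {"order_id": 3, "customer": "Initech", "lines": [
        {"product": "widget", "qty": 7},
        {"product": "sprocket", "qty": 3}]},
]

def lines_for_product_docstore(orders, product):
    """Unindexed document access: every order is examined, and each whole
    order is read even when only one of its lines matches."""
    examined = 0
    result = []
    for order in orders:
        examined += 1
        for line in order["lines"]:
            if line["product"] == product:
                result.append((order["order_id"], line["qty"]))
    return result, examined

# Normalized layout: a flat "order_line table" of
# (order_id, line_no, product, qty) rows ...
order_lines = [
    (o["order_id"], line_no, line["product"], line["qty"])
    for o in orders
    for line_no, line in enumerate(o["lines"], start=1)
]

# ... plus an "index" mapping each product to the positions of its rows.
product_index = defaultdict(list)
for pos, (_, _, product, _) in enumerate(order_lines):
    product_index[product].append(pos)

def lines_for_product_normalized(product):
    """Index lookup: only the matching order-line rows are touched."""
    positions = product_index.get(product, [])
    rows = [order_lines[p] for p in positions]
    return [(order_id, qty) for order_id, _, _, qty in rows], len(rows)
```

Both functions return the same (order_id, qty) pairs for "widget", but the document version examines all three orders while the indexed version touches only the two matching rows; the gap grows with the number of orders that do not contain the product.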

Of course, like any good thing, normalization can be overdone.  Few database schemas are so heavily normalized that a simple order entry task touches 150 tables, but if yours is, you may well wish to consider denormalizing.  But you need not go so far as to denormalize completely, as Schireson appears to advocate.  Instead, you should determine what degree of normalization will best meet your current and future business needs.  Seasoned relational database professionals understand the trade-offs between normalization and denormalization, and can help companies make good decisions about when and how to normalize.  Schireson appears not to understand this trade-off, or else understands it but advocates for total denormalization anyway because that is the only paradigm his product can support.

Schireson also mentions another advantage of document stores: schema flexibility.  Of course, he again ignores the possible advantages, for some users, of a fixed schema, such as better validity checking.  But more importantly, he ignores the fact that relational databases such as PostgreSQL have had similar capabilities since before MongoDB existed. PostgreSQL's hstore, which provides the ability to store and index collections of key-value pairs in a fashion similar to what MongoDB provides, was first released in December of 2006, the year before MongoDB development began. True JSON capabilities were added to the PostgreSQL core as part of the 9.2 release, which went GA in September of 2012.  The 9.4 release, expected later this year, will greatly expand those capabilities. In today's era of rapid innovation, any database product whose market advantage is based on the format in which it is able to store data will not retain that advantage for very long.
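
As a rough illustration of what "store and index collections of key-value pairs" means, the following Python sketch keeps a free-form attribute dictionary per row and an inverted index over (key, value) pairs. It is loosely in the spirit of an hstore column queried with the @> containment operator and backed by a GIN index, but it is a toy model with hypothetical names (rows, kv_index, contains), not hstore's implementation.

```python
from collections import defaultdict

# Each row id maps to a free-form set of attributes, loosely analogous
# to an hstore or JSON column (toy model only).
rows = {
    1: {"color": "red", "size": "L"},
    2: {"color": "blue"},
    3: {"color": "red", "material": "wool"},
}

# Inverted index: each (key, value) pair maps to the ids of rows holding it.
kv_index = defaultdict(set)
for row_id, attrs in rows.items():
    for pair in attrs.items():
        kv_index[pair].add(row_id)

def contains(query):
    """Return ids of rows whose attributes include every (key, value) pair
    in `query`, similar in spirit to a containment (@>) test."""
    sets = [kv_index.get(pair, set()) for pair in query.items()]
    if not sets:
        return set(rows)  # an empty query matches every row
    return set.intersection(*sets)
```

For example, contains({"color": "red"}) yields rows 1 and 3, and adding "size": "L" narrows the result to row 1. Rows need not share the same keys, which is the schema flexibility in question, yet lookups still go through an index rather than a full scan.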

The advantages of relational databases are not so easily emulated. Relational databases offer complex transactions that affect multiple records; synchronous commit, so that each transaction is guaranteed to be durable on disk before the client is notified that the commit has succeeded; support not only for JSON but also for other complex datatypes, such as XML and geospatial types; and mature query optimizers that can not only combine data from multiple indexes (which Schireson mentions as a forthcoming feature; PostgreSQL added that capability in 2005) but also combine data from multiple tables via joins. While some MongoDB users may not require any of these features, many will, and I believe that MongoDB will find that adding these features to a product that natively supports JSON is much harder than adding JSON support to a product that already possesses these and many other enterprise features.
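
The two optimizer capabilities mentioned above can be sketched in miniature. PostgreSQL combines indexes with bitmap index scans and can join tables with, among other strategies, a hash join; the Python below is a toy model of both ideas, not PostgreSQL's code. The tables, column positions, and function names are hypothetical. Each "index" maps a column value to a bitmap of row positions, two bitmaps are ANDed, and the surviving rows are joined to a customers table.

```python
# Toy "orders" table: each row is (customer_id, status, region).
orders = [
    (10, "open", "east"),
    (11, "shipped", "west"),
    (10, "shipped", "east"),
    (12, "open", "east"),
    (10, "open", "west"),
]
customers = [(10, "Acme"), (11, "Globex"), (12, "Initech")]

def build_bitmap_index(rows, col):
    """Map each distinct value of column `col` to a bitmap (a Python int)
    in which bit i is set when row i holds that value."""
    index = {}
    for i, row in enumerate(rows):
        index[row[col]] = index.get(row[col], 0) | (1 << i)
    return index

def bitmap_and_scan(rows, *bitmaps):
    """AND several index bitmaps together, then fetch only the rows whose
    bits survive, in table order."""
    combined = ~0  # every bit set; narrowed by each bitmap in turn
    for bitmap in bitmaps:
        combined &= bitmap
    return [row for i, row in enumerate(rows) if (combined >> i) & 1]

def hash_join(left, right, left_key, right_key):
    """Equi-join two lists of tuples: build a hash table on `right`,
    then probe it once per row of `left`."""
    table = {}
    for right_row in right:
        table.setdefault(right_row[right_key], []).append(right_row)
    return [left_row + right_row
            for left_row in left
            for right_row in table.get(left_row[left_key], [])]

status_idx = build_bitmap_index(orders, 1)
region_idx = build_bitmap_index(orders, 2)

open_east = bitmap_and_scan(orders, status_idx["open"], region_idx["east"])
print(hash_join(open_east, customers, 0, 0))
# [(10, 'open', 'east', 10, 'Acme'), (12, 'open', 'east', 12, 'Initech')]
```

Neither index alone answers "open orders in the east region", but ANDing their bitmaps does so without visiting non-matching rows, and the join then attaches customer names without the customer data having to be duplicated inside each order.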

This is not to deny that MongoDB offers some compelling advantages. Many users have found that they can get up and running on MongoDB very quickly, an area where PostgreSQL and other relational databases have traditionally struggled. And the sharding capabilities of MongoDB are clearly useful to some users, but the process of scaling out is not as transparent as the documentation might imply, and it sometimes goes badly wrong. In the end, a large, complex system that must stay up continuously typically needs the application developer and DBA to work together and to have very specific knowledge of which data is stored where. Auto-sharding may succeed in hiding that complexity from the user in some cases, but it does not eliminate it.

In short, I don't expect MongoDB, or any similar product, to spell the end of the relational database.  Rather, I think it's likely that PostgreSQL and other database engines will continue to innovate, providing many of the features that have caught the imagination of developers who are now choosing NoSQL engines; and that NoSQL systems will struggle to add features which relational databases have had for years.
