Highly Scalable, Ultra-Fast and Lots of Choices

A Pattern Approach to NoSQL

The main motivation to evaluate a non-relational database system often comes from non-functional requirements on performance and scalability that your favorite relational database system cannot support, at least not at reasonable cost.

NoSQL products promise to scale structured data storage beyond the limits of relational database systems. While this has already proven true for quite a number of applications, the ability to serve a huge number of concurrent users and great volumes of data comes at a price.

First of all, the term “NoSQL” encompasses a variety of approaches to storing and retrieving data. Coming from the world of relational database systems, you will quickly realize that standards such as SQL do not exist (at least not yet). In fact, many NoSQL products have little in common other than that they are different from relational databases.

Fortunately, there are some commonalities among groups of NoSQL databases. The goal of this paper is to clearly distinguish the different types of NoSQL data stores. In reality, there are products that combine multiple approaches. As an introduction to the world of non-relational database systems, this paper emphasizes the individual advantages and drawbacks of each approach.

This paper does not explain advanced features in detail. Neither does it contain an overview of the currently available NoSQL products, since these products will change quickly. Rather, this paper aims at general insights into NoSQL data stores, which will hopefully remain valid for a longer period.

The data store types are described as patterns. The pattern approach has been taken to put emphasis on the question of why you would choose a specific data store type. In other words, you will get to know which problem you need to have for a specific data store type to be an appropriate solution.

If you want to quickly skim over the paper, just read the highlighted paragraphs. The paragraphs in italics explain the context in which the patterns can be applied. The paragraphs in boldface contain the main problem and solution statements. The paper ends with a couple of things to be aware of when introducing a NoSQL database for the first time and some final remarks.

The most basic concept of non-relational data storage is the Key/Value Store. Such a database stores loosely structured data that is accessible by its key only. Several variants exist that form groups of their own: Blob Stores are specialized to hold large binary data. Column Family Stores persist values of the same type closely together for better analytical capabilities. Document Stores provide means to easily create rich domain models. Graph Databases rely on a different concept, focusing on the relations between data entries.
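
To make the Key/Value Store idea concrete, the following is a minimal in-memory sketch, not the API of any real product. The class name KeyValueStore, its methods put, get and delete, and the key "session:42" are assumptions made up for this illustration. The point it shows is that the store treats values as opaque and offers access by key only; there is no way to query by the content of a value.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Toy in-memory store (hypothetical): opaque values, reachable by key only."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # The store does not inspect the value; it only remembers it under the key.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Lookup by exact key is the only read operation; None means "not found".
        return self._data.get(key)

    def delete(self, key: str) -> None:
        # Deleting a missing key is a no-op.
        self._data.pop(key, None)


store = KeyValueStore()
store.put("session:42", b'{"user": "alice", "cart": [17, 23]}')
print(store.get("session:42"))
store.delete("session:42")
print(store.get("session:42"))
```

Interpreting the stored bytes (here a JSON string) is left entirely to the application, which is what distinguishes this basic type from the more structured variants listed above.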

The patterns

A couple of things to be aware of

Although NoSQL databases are in fashion nowadays, the decision of how to store an application's data is difficult. And there are more obstacles to be aware of.

A NoSQL database forces you to learn a technology that often differs significantly from a relational database. You need to quickly build up knowledge or spend money on support or training sessions. As most NoSQL products are quite new and few experienced developers exist, expect difficulties in finding appropriate assistance.

Even if the developers in your team are eager to learn new technologies, errors and wrong decisions that hurt the project's progress will happen. Do not assume that such a technological change will happen smoothly. Developers need to learn new APIs, administrators need to learn new tools and analysts need to understand the implications for domain modeling.

While you gain many powerful new features, you probably also lose features that you relied on so far. The implications of the loss of these features (such as referential integrity or transactional support) might not be fully understood at the beginning.

You may also lose the power of integration libraries that, for example, simplify the creation of an object-oriented domain layer on top of a non-object-oriented data store. The available libraries to integrate a specific product into your development environment (i.e. programming language, IDE, etc.) may lack all but the most essential support.

If you are part of a company where the development of an application is strictly separated from the operation of the application, you need to approach the operations team early. These people are responsible for keeping the application running and they also need to learn and employ new tools to set up, monitor and back up the data store. They may even veto the introduction of a technology that is yet unknown to them.

You might be used to setting up and managing a production environment where server applications are run on powerful, non-clustered machines. Do not underestimate the complexity of handling and managing server clusters, both in terms of software processes and hardware. For example, if you create a cluster based on commodity hardware, you have to accept and cope with failures of all core components.

Find out early how support is provided for your chosen product. Many NoSQL products are supported by their developers on mailing lists or in forums. If severe problems happen (both at development time and in production), you may find out that no one feels responsible for helping you solve your problem. Commercial support may not be available in the form you are used to.

Final remarks

In general, you have little choice but to employ a NoSQL data store if the expected size of persistent data or the number of simultaneous requests exceeds the capabilities of your relational database system. However, even if a relational database might still work for your project, NoSQL data stores provide value. So you could still decide to use a NoSQL product; you just need to be aware of the consequences.

NoSQL databases provide a wide range of solutions when you need highly scalable data storage. But there is no all-in-one solution available that promises to replace the currently dominating relational database systems. Rather, most NoSQL databases are best suited to address specific problems and use cases (see, for example, Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs Membase vs Neo4j).

While the data storage type of a NoSQL product is an important decision criterion, there are more criteria to consider. The CAP theorem allows different tradeoffs, i.e. a product may restrict the consistency, the availability or the partition tolerance of the system. Furthermore, different products support different clustering and replication strategies that may affect an application's general architecture. All of these criteria (and several more) are outside the scope of this paper and deserve papers of their own.

The term polyglot persistence expresses the fact that a single data store may not suit all needs of an application. Instead, an application might encompass several different approaches to data persistence. To give an example: while you would probably keep financial records in a relational database, catalog data that describes a variety of products may be better kept in a Document Store. Session information of the users of a web site may be best stored in a Key/Value Store. Data that tracks the users' behavior for later analysis is a candidate to be kept in a Column Family Store. Add a Blob Store for binary data such as images and a Graph Database to model and analyze the relations between your users and you've got a very rich data persistence landscape.
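
The example above can be written down as a simple routing table. The following is only a sketch under assumptions: the category names (such as "financial_records") and the function store_for are hypothetical and chosen for illustration, and a real application would bind each store type to a concrete product and client library.

```python
from enum import Enum


class StoreType(Enum):
    RELATIONAL = "relational database"
    DOCUMENT = "Document Store"
    KEY_VALUE = "Key/Value Store"
    COLUMN_FAMILY = "Column Family Store"
    BLOB = "Blob Store"
    GRAPH = "Graph Database"


# Hypothetical data categories mapped to the store types named in the text.
PERSISTENCE_MAP = {
    "financial_records": StoreType.RELATIONAL,
    "product_catalog": StoreType.DOCUMENT,
    "user_sessions": StoreType.KEY_VALUE,
    "behavior_tracking": StoreType.COLUMN_FAMILY,
    "images": StoreType.BLOB,
    "user_relations": StoreType.GRAPH,
}


def store_for(category: str) -> StoreType:
    """Return the store type configured for a data category."""
    try:
        return PERSISTENCE_MAP[category]
    except KeyError:
        # Fail loudly instead of silently picking a default store.
        raise ValueError(f"no store configured for {category!r}") from None


print(store_for("user_sessions").value)
```

Keeping this mapping in one place makes the persistence landscape explicit, which helps when the number of data stores in an application grows.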

Acknowledgments

Cyrille Martraire gave a lot of constructive and valuable feedback as shepherd for this paper prior to the EuroPLoP 2012 conference. During the conference, the paper was workshopped by Veli-Pekka Eloranta, Ioanna Lytra, Christopher Preschern and Vallidevi Krishnamurthy, who provided even more suggestions for improvement.