Freelancer, Java Architect and Java Developer

Highly Scalable, Ultra-Fast and Lots of Choices

Blob Store

Your application relies on binary data that needs to be distributed to the application's users.

Think about web sites with plenty of large images. If the content of your web site changes regularly, the images often do so, too. Statically bundling the images with your application is not an option because of their size and because every image change would cause a re-deployment.

You need to dynamically store and publish huge amounts of binary data such as images.

Relational databases provide the data type BLOB to store binary data in tables. But if you keep many and potentially large BLOB entries in a relational database, you quickly drive the database to its limits. In addition, relational databases typically cannot serve their content directly to end users (and you wouldn't want them to do so). So you need to stream the BLOB contents from the database through an application server, which puts load on two critical system components.

You could store binary data in a file system and set up a web server to make these files accessible. But large numbers of files and big files push many file systems to their limits. Moreover, in case of many simultaneous requests, the underlying storage system may soon become the bottleneck. Caching might help, but web servers are not optimized for caching large numbers of large files. In addition, neither file systems nor web servers provide means to manage files, e.g. they don't create unique keys.

To overcome the caching problem, you could employ a content delivery network, which acts as a global proxy. Requests for your files are handled by the content delivery network and not by your own system. But that would add another layer of complexity, which makes the system difficult to test. And you still need to manage the files in your file system first. Additionally, you must expect significant delays in distributing your files to all nodes of the content delivery network.

* * *

Therefore:

Choose a Blob Store, which is specialized to manage a large number of potentially large files.


A Blob Store hides the complexity of actually storing the data from you. At the same time, a Blob Store is optimized to provide the content to your client applications, typically via HTTP URLs.
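As a rough illustration of serving content via URLs, the following sketch derives a public URL from a blob key. The class name BlobUrls, the base URL https://blobs.example.com and the key layout are assumptions made for this example only; real Blob Stores define their own URL schemes, and keys are assumed here not to start with a slash.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.stream.Collectors;

/** Hypothetical helper that maps blob keys to HTTP URLs under an assumed base URL. */
class BlobUrls {
    private final String baseUrl;

    BlobUrls(String baseUrl) {
        // Normalize so that the base URL always ends with exactly one slash.
        this.baseUrl = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
    }

    /** Percent-encodes every path segment of the key and appends it to the base URL. */
    URI urlFor(String key) {
        String path = Arrays.stream(key.split("/"))
                // URLEncoder produces form encoding ('+' for spaces); convert to %20 for URL paths.
                .map(segment -> URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20"))
                .collect(Collectors.joining("/"));
        return URI.create(baseUrl + path);
    }

    public static void main(String[] args) {
        BlobUrls urls = new BlobUrls("https://blobs.example.com");
        System.out.println(urls.urlFor("images/team photo.jpg"));
    }
}
```

The application only hands out such URLs; the client then fetches the content from the Blob Store rather than from your application server.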

A Blob Store associates every file with a unique key, by which the file can be accessed. Keys can be defined by your application or they can be automatically generated. Blob Stores provide few search capabilities. You need to keep track of your binary data in your main application.
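The key-based access described above can be sketched with a small in-memory stand-in. InMemoryBlobStore and its methods are hypothetical names chosen for this example, not the API of any real product; a real Blob Store would persist and distribute the data instead of keeping it in a map.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for a Blob Store, for illustration only. */
class InMemoryBlobStore {
    private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

    /** Stores content under a key defined by the application; an existing blob is replaced. */
    void put(String key, byte[] content) {
        blobs.put(key, content.clone()); // defensive copy
    }

    /** Stores content under an automatically generated key and returns that key. */
    String put(byte[] content) {
        String key = UUID.randomUUID().toString();
        put(key, content);
        return key;
    }

    /** Looks up a blob by its key; there is deliberately no search by content or category. */
    Optional<byte[]> get(String key) {
        byte[] content = blobs.get(key);
        return content == null ? Optional.empty() : Optional.of(content.clone());
    }

    public static void main(String[] args) {
        InMemoryBlobStore store = new InMemoryBlobStore();
        store.put("images/logo.png", new byte[] {1, 2, 3});
        String generatedKey = store.put(new byte[] {4, 5});
        System.out.println(store.get("images/logo.png").isPresent());
        System.out.println(store.get(generatedKey).isPresent());
    }
}
```

Whichever way the key is created, the application has to remember it, because the key is the only handle on the stored file.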

Setting up and running a Blob Store is not trivial. Therefore, Blob Stores are often used as a service. Some Blob Store products (or rather vendors) physically distribute your files around the globe. You still have a single point of access to manage your data from within your application. But accessing the files becomes a (geographically) local operation from many points in the world. A Blob Store (again: as a service from a vendor) may therefore avoid the need to use a dedicated content delivery network.

On the downside, Blob Stores are like an attic in which you store everything that does not fit elsewhere. After a while, you might lose track of what is inside and what is not. Blob Stores are typically bad at categorizing your data: you've only got an identifier. And because Blob Stores specialize in binary data only, they do not provide a means to store all your data. Rather, they add complexity to your data storage architecture.
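One way to avoid losing track is to keep a catalog of blob keys in the main application. The sketch below is a minimal in-memory version; BlobCatalog and its methods are hypothetical names, and in practice the catalog would more likely live in the application's main database.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical application-side catalog that records blob keys and their categories. */
class BlobCatalog {
    private final Map<String, Set<String>> keysByCategory = new ConcurrentHashMap<>();

    /** Registers a blob key under one or more categories. */
    void register(String blobKey, String... categories) {
        for (String category : categories) {
            keysByCategory
                    .computeIfAbsent(category, c -> ConcurrentHashMap.newKeySet())
                    .add(blobKey);
        }
    }

    /** Returns a sorted copy of all blob keys registered under the given category. */
    Set<String> keysIn(String category) {
        return new TreeSet<>(keysByCategory.getOrDefault(category, Set.of()));
    }

    public static void main(String[] args) {
        BlobCatalog catalog = new BlobCatalog();
        catalog.register("images/logo.png", "branding", "homepage");
        catalog.register("images/hero.jpg", "homepage");
        System.out.println(catalog.keysIn("homepage"));
    }
}
```

The Blob Store itself stays a plain key-to-file mapping; all categorization lives in the catalog.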

The main difference from Key/Value Stores is that Blob Stores are made to distribute binary data to users, whereas Key/Value Stores are made to handle frequent but small changes to application data.

Examples of Blob Stores are Amazon S3 and the Blob service of Windows Azure.
