Key-Value Stores for Java

by Max Rohde

A common theme in discussions of persistence is criticism of traditional SQL databases, which have been used successfully in business applications for decades. So-called NoSQL solutions are often advocated as an alternative. In essence, NoSQL dramatically simplifies the kind of data the database manages. Often, a simple key-value store lies at the heart of the database. Such stores let the user store data in much the same way as the familiar Map data structure. However, whereas Maps have traditionally been used in memory, e.g. to temporarily hold data that fits into one system's memory, NoSQL key-value stores are designed to persist data and to scale up to millions or billions of key-value records, potentially distributed among thousands of servers. As can be expected, there is no single all-encompassing NoSQL solution in the Java space; instead, various (mostly free and open source) offerings compete with varying features.
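To make the Map analogy concrete, here is a minimal sketch of the key-value contract using the JDK's own `HashMap`. The class name and the sample keys and values are made up for illustration. The stores discussed below offer essentially the same put/get/remove operations and add persistence or distribution on top.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; keys and values are invented for illustration.
class MapAsKeyValueStore {

    // Builds a small in-memory "store" and exercises the basic operations.
    static Map<String, String> demo() {
        Map<String, String> store = new HashMap<>();
        store.put("user:1", "Alice");     // create a record
        store.put("user:2", "Bob");
        store.put("user:1", "Alice B.");  // overwrite the value under an existing key
        store.remove("user:2");           // delete a record
        return store;
    }

    public static void main(String[] args) {
        Map<String, String> store = demo();
        System.out.println(store.get("user:1"));         // Alice B.
        System.out.println(store.containsKey("user:2")); // false
    }
}
```

Everything in this map is lost when the JVM exits, which is exactly the gap the two categories of engines below aim to close.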

In the following I discuss two categories of implementations:

  1. Object prevalence engines, which ensure that objects held in memory are never lost by continuously backing them up to a hard disk.
  2. Persisted database engines, which make it possible to work with key-value databases that are larger than the system's memory by persisting parts of them to a hard disk (or another persistence backend).

Object Prevalence

Object prevalence systems keep all records in memory. However, they are designed to keep a synchronized, persisted copy of all objects on the hard disk. After a system failure or reboot, they can therefore easily restore the state from before the interruption. Object prevalence systems are often used to speed up applications for which all required data fits into memory (Villela, 1st of August 2002).
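The idea can be sketched with plain JDK classes. The following is a simplified illustration, not the way Prevayler or any other engine is actually implemented: the class `JournaledMap` and its tab-separated journal format are assumptions made up for this example. Every change is appended to a journal file before it is applied in memory, and a new instance replays the journal to restore the previous state.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class for illustration; not part of Prevayler or any other library.
class JournaledMap {
    private final Map<String, String> memory = new HashMap<>();
    private final Path journal;

    JournaledMap(Path journal) {
        this.journal = journal;
        replay();
    }

    // Rebuilds the in-memory state by re-applying every journaled operation in order.
    private void replay() {
        if (!Files.exists(journal)) return;
        try {
            for (String line : Files.readAllLines(journal, StandardCharsets.UTF_8)) {
                String[] parts = line.split("\t", 3);
                if (parts[0].equals("PUT") && parts.length == 3) memory.put(parts[1], parts[2]);
                else if (parts[0].equals("DEL") && parts.length == 2) memory.remove(parts[1]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Appends one record to the journal file.
    private void append(String record) {
        try {
            Files.write(journal, (record + "\n").getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Assumption of this sketch: keys and values contain no tabs or line breaks.
    private static void check(String s) {
        if (s.indexOf('\t') >= 0 || s.indexOf('\n') >= 0 || s.indexOf('\r') >= 0) {
            throw new IllegalArgumentException("tabs and line breaks are not supported");
        }
    }

    void put(String key, String value) {
        check(key);
        check(value);
        append("PUT\t" + key + "\t" + value); // journal first ...
        memory.put(key, value);               // ... then change the in-memory state
    }

    void remove(String key) {
        check(key);
        append("DEL\t" + key);
        memory.remove(key);
    }

    // Reads are served from memory only.
    String get(String key) {
        return memory.get(key);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("journal", ".log");
        JournaledMap map = new JournaledMap(file);
        map.put("user:1", "Alice");
        map.put("user:2", "Bob");
        map.remove("user:2");

        // Simulate a restart: a fresh instance rebuilds its state from the journal.
        JournaledMap restored = new JournaledMap(file);
        System.out.println(restored.get("user:1")); // Alice
        System.out.println(restored.get("user:2")); // null (removed before the restart)
        Files.delete(file);
    }
}
```

A real engine would additionally force writes to the physical disk, take periodic snapshots so that the journal does not grow without bound, and handle concurrent access. The sketch leaves all of that out.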

One prominent example of an object prevalence system is the open source solution Prevayler (http://prevayler.org). A significant limitation of such systems is, of course, that the amount of data the application can work with is limited to the memory available on one server. To allow an application to scale, an object prevalence system can be distributed among several machines. A very well-performing open source solution for this purpose is hazelcast (http://www.hazelcast.com/).

| Engine | Project home | Description |
|---|---|---|
| Prevayler | http://prevayler.org | Lightweight solution to build prevalent systems. |
| Space4j | http://www.space4j.org/ | Keeps an incremental record of all operations saved on disk in order to recreate a state in memory (but also has the option to create snapshots). |
| MegaMap | http://megamap.sourceforge.net/ | Allows working with Maps larger than the available memory and persisting them to disk. However, it has not been developed for a while and is not fault tolerant. |
| hazelcast | http://www.hazelcast.com/ | Allows distributing objects held in memory among a cluster of physical systems (within a network or connected through WAN). |

Persisted Databases

While object prevalence systems are arguably among the best-performing solutions for storing an application's data, there are many use cases in which data is accessed only rarely and therefore does not need to be available in memory. Key-value stores that keep only a fraction of their data in memory are best suited for these purposes. Below are a number of examples implementing this pattern. To get started quickly, I think jdbm2 is a good option. For large-scale solutions, you might have to consider Berkeley DB, but this might end up being a pricey path.

| Engine | Project home | Description |
|---|---|---|
| jdbm2 | http://code.google.com/p/jdbm2/ | Persists a HashMap or TreeMap using Java Serialization. |
| mapdb | https://github.com/jankotek/mapdb | Evolution of JDBM 2 (was initially known as JDBM 4). |
| Banana DB | http://people.apache.org/~kalle/bananadb/ | Persists a Map in a file. Potentially inefficient read operations. |
| BabuDB | http://code.google.com/p/babudb/ | Persists key-value pairs of byte[] values. |
| Berkeley DB | http://www.oracle.com/technetwork/database/berkeleydb | Key-value store solution from Oracle. Rather restrictive open-source licence. |
| JOAFIP | http://joafip.sourceforge.net/ | Rather than providing a simple key-value store, JOAFIP attempts to dynamically persist an object tree to the disk (to manage object trees too large to fit into memory). |
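The pattern described in this section, keeping only a fraction of the data in memory, can be sketched with the JDK's `RandomAccessFile`. This is a simplified illustration under stated assumptions and does not reflect the internal design of jdbm2, MapDB, Berkeley DB or any other engine listed above: the class `FileBackedStore` and its record layout (key length, key bytes, value length, value bytes) are made up for this example. Only the keys and the file offsets of their values are held in memory; the values themselves are read from disk on demand.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class for illustration; not the design of any engine listed above.
class FileBackedStore implements AutoCloseable {
    private final RandomAccessFile file;
    // Only keys and file offsets live in memory; the values stay on disk.
    private final Map<String, Long> index = new HashMap<>();

    FileBackedStore(String path) {
        try {
            file = new RandomAccessFile(path, "rw");
            rebuildIndex();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Scans the file once; a later record for the same key replaces the earlier offset.
    private void rebuildIndex() throws IOException {
        file.seek(0);
        while (file.getFilePointer() < file.length()) {
            byte[] key = new byte[file.readInt()];
            file.readFully(key);
            long valueOffset = file.getFilePointer();
            int valueLength = file.readInt();
            file.seek(valueOffset + 4 + valueLength); // skip the value without loading it
            index.put(new String(key, StandardCharsets.UTF_8), valueOffset);
        }
    }

    // Appends a new record to the end of the file and points the index at its value.
    void put(String key, String value) {
        try {
            byte[] k = key.getBytes(StandardCharsets.UTF_8);
            byte[] v = value.getBytes(StandardCharsets.UTF_8);
            file.seek(file.length());
            file.writeInt(k.length);
            file.write(k);
            long valueOffset = file.getFilePointer();
            file.writeInt(v.length);
            file.write(v);
            index.put(key, valueOffset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Looks up the offset in memory, then reads the value from disk.
    String get(String key) {
        Long offset = index.get(key);
        if (offset == null) return null;
        try {
            file.seek(offset);
            byte[] v = new byte[file.readInt()];
            file.readFully(v);
            return new String(v, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            file.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String path = System.getProperty("java.io.tmpdir") + java.io.File.separator
                + "kv-demo-" + System.nanoTime() + ".db";
        try (FileBackedStore store = new FileBackedStore(path)) {
            store.put("user:1", "Alice");
            store.put("user:1", "Alice B.");
            System.out.println(store.get("user:1")); // Alice B.
        }
        new java.io.File(path).delete();
    }
}
```

The sketch never reclaims the space of overwritten records and offers no delete operation, transactions or caching. These are the kinds of concerns the engines in the table take care of.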

Resources

JDBM

"HIVE-1754 - Remove JDBM component from Map Join" The Apache Hive project (part of the Hadoop ecosystem) has removed the JDBM dependency from its codebase due to poor performance. However, the persistence-backed map seems to have been replaced with a map held solely in memory. So the project did not really find a better alternative to JDBM but rather removed the need for file-based persistence altogether. The old implementation of JDBM used by Hive can be found in the source repositories: e.g. BaseRecordManager in hive-0.5.0, or how it was used in HashMapWrapper, also in hive-0.5.0.

"JDBM2 released" Blog post with some additional info concerning JDBM2

SQL Alternatives

For an SQL database with a very low footprint and high performance see: http://www.h2database.com/

Other Resources

"An introduction to object prevalence" (Villela, 1st of August 2002)

"Anti-RDBMS: A list of distributed key-value stores" (Jones, 19th of January 2009)

"NoSQL Services Available" (Menon, 29th of March 2011)

"The Jalapeño Persistence Library for Java" (intersystems.com)

"POJO Persistence" (on this blog)

Prevayler (http://prevayler.org)

hazelcast (http://www.hazelcast.com/)
