Key-Value Stores for Java

Max Rohde

24 Oct 2011 — 3 min read

A common theme in discussions of persistence is criticism of traditional SQL databases, which have been used successfully in business applications for decades. As an alternative, so-called NoSQL solutions are often advocated. In essence, NoSQL dramatically simplifies the kind of data the database manages. Often, a simple key-value store sits at the heart of the database. Such stores let the user store data in much the same way as the long-familiar Map data structure. However, whereas Maps have traditionally been used in memory, e.g. to temporarily hold data that fits into one system's memory, NoSQL key-value stores are designed to persist data and to scale up to millions or billions of key-value records, potentially distributed among thousands of servers. As can be expected in the Java space, there is no single all-encompassing NoSQL solution; instead, various (mostly free and open source) offerings compete with varying features.

In the following, I discuss two categories of implementations:

Object prevalence engines, which ensure that objects stored in memory are never lost by constantly backing them up to a hard disk.
Persisted database engines, which make it possible to work with key-value databases larger than the system's memory by persisting parts of them to a hard disk (or another persistence backend).

Object Prevalence

Object prevalence systems keep all records in memory. However, they are designed to maintain a synchronized, persisted copy of all objects on the hard disk. After a system failure or reboot, they can therefore easily restore the state that existed before the interruption. Object prevalence systems are often used to speed up applications for which all required data fits in memory (Villela, 1st of August 2002).
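
The core idea, in-memory state plus a persisted record that can be replayed after a restart, can be sketched in a few lines of plain Java. This is only an illustration under simplifying assumptions and not how Prevayler or any other engine mentioned in this post is implemented: the class name JournaledMap and its journal file format are made up for this example, and it ignores snapshots, concurrency and write batching.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical example class: all data lives in memory, every change is journaled to disk first. */
class JournaledMap {
    private final Map<String, String> state = new HashMap<>();
    private final Path journal;

    JournaledMap(Path journal) {
        this.journal = journal;
        if (Files.exists(journal)) {
            replay(); // restore the state from before the last shutdown or crash
        }
    }

    String get(String key) {
        return state.get(key); // reads never touch the disk
    }

    void put(String key, String value) {
        append(true, key, value); // persist the operation first ...
        state.put(key, value);    // ... then apply it in memory
    }

    void remove(String key) {
        append(false, key, "");
        state.remove(key);
    }

    /** Appends one operation to the journal and forces it to the storage device. */
    private void append(boolean isPut, String key, String value) {
        try (FileOutputStream file = new FileOutputStream(journal.toFile(), true);
             DataOutputStream out = new DataOutputStream(file)) {
            out.writeBoolean(isPut);
            out.writeUTF(key);
            out.writeUTF(value);
            out.flush();
            file.getFD().sync();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Re-applies all journaled operations in their original order. */
    private void replay() {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(journal)))) {
            while (true) {
                boolean isPut = in.readBoolean();
                String key = in.readUTF();
                String value = in.readUTF();
                if (isPut) {
                    state.put(key, value);
                } else {
                    state.remove(key);
                }
            }
        } catch (EOFException endOfJournal) {
            // End of journal reached; an incomplete final record is simply ignored.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Creating a second JournaledMap on the same file after a restart yields the same contents as the first one had. A journal like this grows without bound, which is presumably why engines in this category also offer snapshots.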

One prominent example of an object prevalence system is the open source solution Prevayler (http://prevayler.org). A significant limitation of such systems is, of course, that the amount of data the application can work with is limited to the memory available on one server. To scale beyond that, an object prevalence system can be distributed among several machines. A well-performing open source solution for this purpose is hazelcast (http://www.hazelcast.com/).

Engine	Project home	Description
Prevayler	http://prevayler.org	Lightweight solution to build prevalent systems.
Space4j	http://www.space4j.org/	Keeps an incremental record of all operations saved on disk in order to recreate a state in memory (but also has the option to create snapshots).
MegaMap	http://megamap.sourceforge.net/	Works with Maps larger than memory and persists them to disk. However, it has not been developed for a while and is not fault tolerant.
hazelcast	http://www.hazelcast.com/	Distributes objects held in memory among a cluster of physical systems (within a network or connected through a WAN).
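
To give a rough idea of what distributing a map among several systems means, here is a minimal sketch that assigns each key to one of several "nodes" based on its hash code. It rests on loud assumptions: the class PartitionedMap is hypothetical, each node is just a local HashMap standing in for a server, and nothing here reflects the API or internals of hazelcast or any other product in the table. Real systems additionally have to deal with networking, replication and rebalancing when nodes join or leave.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical example class: spreads entries over several in-memory "nodes" by key hash. */
class PartitionedMap {
    // Each HashMap stands in for the memory of one server in the cluster.
    private final List<Map<String, String>> nodes = new ArrayList<>();

    PartitionedMap(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    /** Picks the node responsible for a key; floorMod keeps the result non-negative. */
    int nodeFor(String key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    void put(String key, String value) {
        nodes.get(nodeFor(key)).put(key, value);
    }

    String get(String key) {
        return nodes.get(nodeFor(key)).get(key);
    }

    /** Number of entries currently held by one node. */
    int entriesOn(int node) {
        return nodes.get(node).size();
    }
}
```

Because the same key always maps to the same node, a lookup only has to contact one node, and the total capacity grows with the number of nodes.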

Persisted Databases

While object prevalence systems are arguably among the best-performing solutions for storing an application's data, there are many use cases in which data is accessed only rarely and therefore need not be available in memory. For these purposes, key-value stores that keep only a fraction of their data in memory are best suited. Below are a number of examples implementing this pattern. To get started quickly, I think jdbm2 is a good option; for large-scale solutions, you might have to consider Berkeley DB, but this might end up being a pricey pathway.

Engine	Project home	Description
jdbm2	http://code.google.com/p/jdbm2/	Persists a HashMap or TreeMap using Java Serialization.
mapdb	https://github.com/jankotek/mapdb	Evolution of JDBM 2 (was initially known as JDBM 4)
Banana DB	http://people.apache.org/~kalle/bananadb/	Persists a Map in a file. Potentially inefficient read operations.
BabuDB	http://code.google.com/p/babudb/	Persists key-value pairs of byte[] values.
Berkeley DB	http://www.oracle.com/technetwork/database/berkeleydb	Key-value store solution from Oracle. Rather restrictive open-source licence.
JOAFIP	http://joafip.sourceforge.net/	Rather than providing a simple key-value store, JOAFIP attempts to dynamically persist an object tree to the disk (to manage object trees too large to fit into memory).
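
The general pattern behind this category, values on disk with only a fraction of them in memory, can be sketched with the Java standard library alone. The following is an illustration under stated assumptions, not the implementation or API of any engine in the table: the class DiskBackedStore is hypothetical, values are appended to a single file, only the keys with their file offsets plus a small least-recently-used cache of values are held in memory, and there is no compaction, crash recovery or thread safety.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical example class: values live in a file, only offsets and a small cache are in memory. */
class DiskBackedStore implements AutoCloseable {
    private final RandomAccessFile data;
    private final Map<String, Long> index = new HashMap<>(); // key -> offset of its latest value
    private final Map<String, String> cache;                 // small LRU cache of recent values
    private int diskReads = 0;

    DiskBackedStore(File file, final int cacheSize) {
        // An access-ordered LinkedHashMap evicts the least recently used entry.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > cacheSize;
            }
        };
        try {
            this.data = new RandomAccessFile(file, "rw");
            rebuildIndex();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Scans the file once so that data written in earlier runs can be found again. */
    private void rebuildIndex() throws IOException {
        data.seek(0);
        while (data.getFilePointer() < data.length()) {
            String key = data.readUTF();
            long valueOffset = data.getFilePointer();
            data.readUTF(); // skip over the value itself
            index.put(key, valueOffset); // later records override earlier ones
        }
    }

    void put(String key, String value) {
        try {
            data.seek(data.length()); // always append; old values are left behind
            data.writeUTF(key);
            long valueOffset = data.getFilePointer();
            data.writeUTF(value);
            index.put(key, valueOffset);
            cache.put(key, value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    String get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        Long offset = index.get(key);
        if (offset == null) {
            return null;
        }
        try {
            data.seek(offset);
            String value = data.readUTF();
            diskReads++;
            cache.put(key, value);
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Number of lookups that had to go to the file instead of the cache. */
    int diskReads() {
        return diskReads;
    }

    @Override
    public void close() {
        try {
            data.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Rarely used values cost no memory beyond their key and offset, while frequently used ones are served from the cache. The price is a disk read for every cache miss.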

Resources

JDBM

"HIVE-1754 - Remove JDBM component from Map Join" Apache Hive removed the JDBM dependency from its codebase due to poor performance. However, the persistence-backed map seems to have been replaced with a map held solely in memory. So they did not really find a better alternative to JDBM but rather removed the need for file-based persistence altogether. The old JDBM implementation they used can be found in the source repositories: e.g. BaseRecordManager in hive-0.5.0, or how it was used in HashMapWrapper, also in hive-0.5.0.

"JDBM2 released" Blog post with some additional info concerning JDBM2