Key-Value Stores for Java

One common theme in discussions of persistence is criticism of traditional SQL databases, which have been used successfully in business applications for decades. As an alternative to these SQL databases, so-called NoSQL solutions are often advocated. NoSQL, essentially, dramatically simplifies the kind of data that the database can manage. Often, a simple key-value store is at the heart of the database. Such stores let the user store data in much the same way as the long-familiar Map data structure. However, whereas Maps have traditionally been used in memory, e.g. to temporarily hold data that fits into one system's memory, NoSQL key-value stores are designed to persist and scale up to millions or billions of key-value records, distributed among potentially thousands of servers. As can be expected, in the Java space there is not one all-encompassing NoSQL solution; instead, various (mostly free and open source) offerings compete with varying features.

In the following I discuss two categories of implementations:

  1. Object Prevalence engines: which ensure that objects stored in memory are never lost by constantly backing them up onto a hard disk.
  2. Persisted Database engines: which make it possible to work with key-value databases that are larger than the system's memory by persisting parts of them onto a hard disk (or another persistence backend).

Object Prevalence

Object Prevalence systems keep all records in memory. However, they are designed to keep a synchronized, persisted copy of all objects on the hard disk. In case of a system failure or reboot, they can thus easily restore the state from before the interruption. Object prevalence systems are often used to speed up applications for which all required data can be stored in memory (Villela, 1st of August 2002).
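The idea can be illustrated with a minimal sketch. It is not how Prevayler or any other engine listed here is implemented; the class name JournaledMap and the tab-separated journal format are assumptions made purely for illustration. Every write is appended to a journal file before it is applied in memory, and the journal is replayed on startup to restore the state.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical prevalence-style store: all data lives in memory, every write is journaled. */
class JournaledMap {
    private final Map<String, String> memory = new HashMap<>();
    private final Path journal;

    JournaledMap(Path journal) throws IOException {
        this.journal = journal;
        if (Files.exists(journal)) {
            // Restore the state from before the interruption by replaying the journal.
            try (BufferedReader in = Files.newBufferedReader(journal, StandardCharsets.UTF_8)) {
                String line;
                while ((line = in.readLine()) != null) {
                    int tab = line.indexOf('\t');
                    if (tab < 0) {
                        continue; // skip malformed lines
                    }
                    memory.put(line.substring(0, tab), line.substring(tab + 1));
                }
            }
        }
    }

    /** Appends the write to the journal on disk, then applies it in memory. */
    synchronized void put(String key, String value) throws IOException {
        // Assumption of this sketch: keys and values contain no tabs or line breaks.
        try (BufferedWriter out = Files.newBufferedWriter(journal, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            out.write(key + "\t" + value);
            out.newLine();
        }
        memory.put(key, value);
    }

    /** Reads never touch the disk. */
    synchronized String get(String key) {
        return memory.get(key);
    }
}
```

Creating a second JournaledMap on the same file simulates a restart: the constructor replays all journaled writes, so the last value written for each key is visible again. A real engine would additionally take snapshots so that the journal does not grow without bound.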

One prominent example of an object prevalence system is the open source solution Prevayler (http://prevayler.org). A significant limitation for such object prevalence systems is, of course, that the amount of data the application can work with is limited to the memory available on one server. To allow systems to scale, an object prevalence system can be distributed among various systems. A very well-performing open source solution for this purpose is hazelcast (http://www.hazelcast.com/).

Engine | Project home | Description
Prevayler | http://prevayler.org | Lightweight solution to build prevalent systems.
Space4j | http://www.space4j.org/ | Keeps an incremental record of all operations saved on disk in order to recreate a state in memory (but also has the option to create snapshots).
MegaMap | http://megamap.sourceforge.net/ | Makes it possible to work with Maps larger than the memory and to persist these onto disk. However, it has not been developed for a while and is not fault tolerant.
hazelcast | http://www.hazelcast.com/ | Distributes objects held in memory among a cluster of physical systems (within a network or connected through WAN).

Persisted Databases

While object prevalence systems are arguably among the best performing ways to store an application's data, there are many use cases in which data is accessed only rarely and therefore does not need to be held in memory. For these purposes, key-value stores that keep only a fraction of their data in memory are best suited. Below are a number of engines implementing this pattern. To get started quickly, I think jdbm2 is a good option; for large-scale solutions you might have to consider Berkeley DB, but this might end up being a pricey pathway.
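To make the pattern concrete, here is a minimal sketch of a store that keeps values on disk and only a small index in memory. It does not reflect the internals of jdbm2, Berkeley DB or any other engine listed below; the class FileBackedStore and its record layout (a length prefix followed by UTF-8 bytes) are assumptions for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: values live in a file, only key -> file offset is kept in memory. */
class FileBackedStore implements AutoCloseable {
    private final RandomAccessFile file;
    private final Map<String, Long> offsets = new HashMap<>();

    FileBackedStore(String path) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
    }

    /** Appends the value to the end of the file and remembers where it starts. */
    void put(String key, String value) throws IOException {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        long offset = file.length();
        file.seek(offset);
        file.writeInt(bytes.length); // length prefix
        file.write(bytes);
        offsets.put(key, offset);    // an overwritten key simply points to the newer record
    }

    /** Looks up the offset in memory, then reads the value from disk. */
    String get(String key) throws IOException {
        Long offset = offsets.get(key);
        if (offset == null) {
            return null;
        }
        file.seek(offset);
        byte[] bytes = new byte[file.readInt()];
        file.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

The sketch leaves out what makes real engines hard: the index itself is not persisted, space used by overwritten values is never reclaimed, and there is no caching or crash safety.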

Engine | Project home | Description
jdbm2 | http://code.google.com/p/jdbm2/ | Persists a HashMap or TreeMap using Java Serialization.
mapdb | https://github.com/jankotek/mapdb | Evolution of JDBM 2 (was initially known as JDBM 4).
Banana DB | http://people.apache.org/~kalle/bananadb/ | Persists a Map in a file. Potentially inefficient read operations.
BabuDB | http://code.google.com/p/babudb/ | Persists key-value pairs of byte[] values.
Berkeley DB | http://www.oracle.com/technetwork/database/berkeleydb | Key-value store solution from Oracle. Rather restrictive open-source licence.
JOAFIP | http://joafip.sourceforge.net/ | Rather than providing a simple key-value store, JOAFIP attempts to dynamically persist an object tree to the disk (to manage object trees too large to fit into memory).

Resources

JDBM

“HIVE-1754 – Remove JDBM component from Map Join”: Apache Hive has removed the JDBM dependency from its codebase due to poor performance. However, they seem to have replaced the persistence-backed map with a map held solely in memory. So they did not really find a better alternative to JDBM but rather removed the need for file-based persistence altogether. The old implementation of JDBM used by them can be found in the source repositories: e.g. BaseRecordManager in hive-0.5.0, or how it was used in HashMapWrapper, also in hive-0.5.0.

“JDBM2 released”: Blog post with some additional info concerning JDBM2

SQL Alternatives

For an SQL database with a very low footprint and high performance see: http://www.h2database.com/

Other Resources

“An introduction to object prevalence” (Villela, 1st of August 2002)

“Anti-RDBMS: A list of distributed key-value stores” (Jones, 19th of January 2009)

“NoSQL Services Available” (Menon, 29th of March 2011)

“The Jalapeño Persistence Library for Java” (intersystems.com)

“POJO Persistence” (on this blog)

Prevayler (http://prevayler.org)

hazelcast (http://www.hazelcast.com/)

A Quick Reflection on Exceptions in Java

Exceptions are a necessary evil of most software development endeavours. As much as we would like to design applications that behave like mathematical functions (‘give me an input and I will faithfully return the same output every time’), real-world IOExceptions, OutOfMemoryErrors, ArithmeticExceptions (division by zero) and their various evil cousins force us to deal with volatile and often unpredictable state that is frequently out of our control.

The Java programming language, for instance, fundamentally provides four ways to inform a caller of a method of an ‘exceptional’ state:

  • Return Values of methods allow passing information about an invalid state to the caller of the method. Often, a specific “null” value is used to denote “Sorry, I can’t do what you ask me to do, ’cause of some unknown condition”.
  • Checked Exceptions can be added to the signature of methods to indicate that these methods might not be executed as planned. In contrast to return values, Checked Exceptions allow richer descriptions of the failure conditions. For instance, an IOException can be thrown if an unexpected error occurred while reading/writing data from a source other than the memory.
  • Unchecked Exceptions can be thrown anywhere and need not be declared as part of method signatures. Unchecked Exceptions were originally envisioned to capture unforeseeable failures, for instance an OutOfMemoryError when the JVM runs out of memory.
  • Callbacks are usually not as prominently used in Java as they are in JavaScript or node.js etc. However, they can be a powerful tool in Java. In principle, rather than returning the result of a method as a return value, an object is passed to the method, and the method calls specific methods of this object depending on the outcome. The GWT RPC mechanism is a nice example of this, where a call to the remote server results in either onSuccess(…) or onFailure(…) being called.

Some Advantages and Disadvantages

Choice always comes with its challenges, so the question remains which of these mechanisms to use to report unexpected states during the execution of a method to the consumer of the method. Below is a brief discussion of the potential advantages and disadvantages of the mechanisms discussed so far:

Return Values

Reporting unexpected states during the execution of a method in a return value is usually not a very good idea. The reason for this is that interpreting the result of the method becomes significantly more complex for the caller of this method. For instance, if the method divide(x,y) were to report a division by zero through the return value, it would not be possible to use a number-based type such as Integer as the return type. Instead, a type like IntegerOrException would have to be returned, which greatly complicates using the method.
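A minimal sketch of what this could look like follows. The shape of IntegerOrException is an assumption; the text above only names the type.

```java
/** Hypothetical result type: holds either a value or an error message, never both. */
final class IntegerOrException {
    final Integer value; // null if the operation failed
    final String error;  // null if the operation succeeded

    private IntegerOrException(Integer value, String error) {
        this.value = value;
        this.error = error;
    }

    static IntegerOrException of(int value) {
        return new IntegerOrException(value, null);
    }

    static IntegerOrException failed(String error) {
        return new IntegerOrException(null, error);
    }

    boolean isError() {
        return error != null;
    }
}

class ReturnValueDivision {
    /** Reports division by zero through the return value instead of an exception. */
    static IntegerOrException divide(int x, int y) {
        if (y == 0) {
            return IntegerOrException.failed("division by zero");
        }
        return IntegerOrException.of(x / y);
    }

    public static void main(String[] args) {
        // The caller can no longer use the result as a plain number:
        // every call site has to unwrap it and check for the error first.
        IntegerOrException result = divide(10, 0);
        if (result.isError()) {
            System.out.println("failed: " + result.error);
        } else {
            System.out.println(result.value + 1);
        }
    }
}
```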

Checked Exceptions

Once promoted as an innovative feature of the Java programming language, checked exceptions are now often portrayed as one of its most serious design flaws. Like reporting unexpected conditions in return values, defining checked exceptions significantly complicates consuming methods. This is particularly noticeable since Java has adopted a quite verbose syntax for dealing with exceptions, which quickly undermines the elegance of any code fragment. Basically, any call to a method with checked exceptions quickly explodes from 1 LOC to 5 LOC (try … catch …).
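The growth in line count is easy to see in a small sketch. The class and method names are made up for illustration; Files.readAllLines is the standard library call that declares the checked IOException.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;

class CheckedExceptionDemo {
    /** Returns the lines of the file, or an empty list if it cannot be read. */
    static List<String> readLinesOrEmpty(String path) {
        // The actual work is the single readAllLines call; the remaining
        // lines exist only because IOException is a checked exception.
        try {
            return Files.readAllLines(Paths.get(path));
        } catch (IOException e) {
            return Collections.emptyList();
        }
    }
}
```

Whether swallowing the exception like this is acceptable depends on the caller; the point here is only the ratio of ceremony to work.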

Unchecked Exceptions

Unchecked Exceptions are the unfriendly brothers of checked exceptions. Left uncaught, they cause an application to stop working immediately and display a lengthy error report. Although it seems unintuitive at first to write applications that are constantly on the verge of total collapse (i.e. whenever an unchecked exception is thrown), relying on unchecked exceptions can lead to surprisingly reliable applications. Moreover, it is also possible to handle unchecked exceptions explicitly (using the good ol’ try … catch …) to keep a production system from crashing on unexpected exceptions. However, the number of such explicit ‘checkpoints’ for unchecked exceptions is usually far lower than for checked exceptions.

For me, unchecked exceptions actually come in two flavours: those caused by ‘throw new RuntimeException(…)’ statements and those caused by violated assertions (assert text != null). The latter are special in that they will most likely be thrown only in development environments. For production, all assertions can be ‘switched off’ (the JVM only checks them when started with -ea), which can increase execution speed and reduce the number of ‘application breaking’ exceptions being thrown.
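The two flavours can be put side by side in a short sketch with hypothetical method names. Note that a violated assertion throws an AssertionError, which is technically an Error rather than a RuntimeException, but it is unchecked all the same.

```java
class UncheckedFlavours {
    /** Flavour 1: an explicit throw, checked on every call in every environment. */
    static int percentage(int part, int whole) {
        if (whole == 0) {
            throw new RuntimeException("whole must not be zero");
        }
        return 100 * part / whole;
    }

    /** Flavour 2: an assertion, only evaluated when the JVM runs with -ea. */
    static String shout(String text) {
        assert text != null : "text must not be null";
        return text.toUpperCase() + "!";
    }
}
```

With assertions disabled, shout(null) still fails, but later and less clearly, with a NullPointerException from toUpperCase().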

Callbacks

Callbacks, in my opinion, are an underused ‘feature’ of the Java programming language. The node.js folks are not too wrong in claiming that any operation depending on external resources (remote server, file system, …) should be done in an asynchronous way. This is enabled by callbacks, since it is left open when the callback methods will be called. Callbacks are usually a good choice if the caller of a method needs to respond differently to different unexpected states. Was there an error reading from the filesystem or sending a call to the server, or both? In this case, callbacks offer an elegant way to make the caller of a service aware of the states that must be handled, since each state can be represented by one method on the callback object.
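A minimal sketch of the pattern, modelled loosely on the onSuccess/onFailure pair mentioned for GWT RPC, follows. The FileCallback interface and CallbackReader class are hypothetical. For brevity the callback is invoked synchronously here; the same interface would work unchanged if read(…) handed the work to another thread.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical callback: one method per outcome the caller has to handle. */
interface FileCallback {
    void onSuccess(String content);

    void onFailure(IOException cause);
}

class CallbackReader {
    /** Reports the outcome through the callback instead of a return value or a throws clause. */
    static void read(String path, FileCallback callback) {
        String content;
        try {
            content = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        } catch (IOException e) {
            callback.onFailure(e);
            return;
        }
        // Called outside the try block so that exceptions thrown by the
        // callback itself are not mistaken for read failures.
        callback.onSuccess(content);
    }

    public static void main(String[] args) {
        read("notes.txt", new FileCallback() {
            @Override
            public void onSuccess(String content) {
                System.out.println("read " + content.length() + " characters");
            }

            @Override
            public void onFailure(IOException cause) {
                System.out.println("could not read file: " + cause);
            }
        });
    }
}
```

Because the interface has no default behaviour, the compiler forces every caller to say what should happen in each case, which is the property the paragraph above describes. Further states, such as a server error, would become additional methods on the interface.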

Conclusion: A User-Centric Perspective

It is difficult to say which is the ‘right’ way of handling unexpected states since each one has its own advantages and disadvantages. I think one could sometimes even make a case for checked exceptions. In general, I find it very helpful to guide my decisions by thinking about the type of the unexpected state as well as the nature of the user of the provided method.

Do you think the unexpected state is ‘impossible’?

Yes. Sometimes the program can be expected to work in a certain way and an unexpected condition is simply unthinkable. An example of such a condition would be one key existing in a Map twice. In such a case, assertions are the way to go, on the assumption that such highly dysfunctional conditions will have been eradicated by the time the application has been tested and goes into production.

No. If the exception condition can be expected, assertions are not a good option. For instance, an error might be encountered while reading a file from the disk. Such exceptional circumstances are likely to happen in production environments. Therefore, I would use callbacks to inform the caller of the method of potential exceptions occurring during execution of the method.

Is the unexpected state caused by an improper use of your interface?

Yes. Every object, even one that does not explicitly implement an interface, has a programmatic interface: the sum of the methods it implements. Conceptually, there are certain rules for how these methods need to be used, beyond providing the right data types for the parameters. For instance, it is not sensible to insert an element into a set in which it is already defined. In this case, I prefer to use unchecked exceptions, which provide the user of your interface with a clear message of what is going wrong (e.g. ‘Element X cannot be inserted since it is already defined in the set’). Note that the preconditions in Google Guava are a nice tool for this purpose.

No. If the unexpected state is not caused by an improper use of your interface, the impossibility question listed above can be applied to determine whether to use assertions or callbacks.
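For the improper-use case, the set example could be sketched as follows. StrictSet is a hypothetical class; plain Java is used so that the sketch has no dependencies, though Guava's Preconditions.checkArgument expresses the same kind of check in one line.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical set that treats inserting a duplicate as a misuse of its interface. */
class StrictSet<T> {
    private final Set<T> elements = new HashSet<>();

    /** Inserts the element, or fails loudly if it is already defined in the set. */
    void insert(T element) {
        if (elements.contains(element)) {
            throw new IllegalArgumentException(
                    "Element " + element + " cannot be inserted since it is already defined in the set");
        }
        elements.add(element);
    }

    int size() {
        return elements.size();
    }
}
```

The caller is not forced to wrap insert(…) in try … catch, but a programming mistake surfaces immediately with a message that names the offending element.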

Resources

Blog post “Programming antipatterns”

JavaDoc Editor for eclipse

Formatting JavaDoc using plain HTML can be a troublesome and time-intensive experience. Today I installed the JDocEditor plugin for eclipse, which allows editing JavaDoc in eclipse using a small rich text editor.

Here, a quick evaluation, a few screenshots and a little getting started guide.

Evaluation

Good:

  • Free
  • Helps to reduce the hassle of dealing with line breaks and paragraphs in JavaDoc
  • Lets you compose lists and do simple formatting such as bold and italics

Not so good:

  • The editor has no native support for JavaDoc annotations such as @link, …
  • The editor’s handling of line breaks, paragraphs and basically any more sophisticated formatting can be unpredictable at times

Conclusion:

  • Good tool to enhance productivity in editing JavaDoc documentation with simple formatting.

Screenshots

[Screenshot: the editor]

[Screenshot: generated JavaDoc (rendered)]

[Screenshot: generated JavaDoc (html)]

Getting Started

  • Install using the update site http://www.certiv.net/updates
  • After the plugin has been installed, add the view of the plugin to your workspace as shown below

  • The view can be found under the category JavaDoc Editors/ JDocEditor

Resources

StackOverflow Discussion “JavaDoc editor for Eclipse to create formatted text”

Blog post on JDocEditor from 2005

Model Driven Development and Domain Specific Languages

The question of the right programming language is one that has always spurred much controversy. Likewise, the idea that one day we could develop software by simply ‘drawing’ expressive models has as many advocates as opponents.

The idea of domain specific languages could be one that helps us advance both of these controversies. Programming languages and traditional models (think UML) are ultimately both just abstractions, or models, to use another word. More specifically, most visual modelling techniques as well as programming languages follow (more or less) well-defined grammars. In a domain specific language we utilize the power of these grammars to solve problems in one domain.

A very good introduction as well as guide to domain specific languages (DSL) has just been released by Markus Völter:

MD*/DSL Best Practices V 2.0

Below are a few quotations from this document on a number of matters:

General Purpose Programming Language vs. DSL

“[T]he ability to extend existing languages (such as it is possible with MPS, Spoofax, and to some extent with Xtext2), makes it possible to build domain specific languages as extensions of general-purpose languages. So instead of generating a skeleton from the DSL and then embedding 3GL code into it, one could instead develop a language extension, that inherits for example expressions and/or statements from the general-purpose base language. This makes a lot of sense: imagine the development of a language for asynchronous, reactive programming. In this case it is very useful to be able to inherit expressions from a general-purpose base language.”

Graphical vs. Textual Notation

“Things that are described graphically are easier to comprehend than textual descriptions, right? Not really. What is most important regarding comprehensibility is the alignment of the concepts that need to be conveyed with the abstractions in the language. A well-designed textual notation can go a long way. Of course, for certain kinds of information, a graphical notation is better: relationships between entities, the timing/sequence of events or some kind of signal/data flow.”

Tooling Matters

“Defining languages and notations is not enough per se – you have to provide good tool support for them, too. […]To increase usability, DSL editors need to be able to cope with wrong or incomplete models as they are entered by the users. Ideally, it should even be possible to persist them. Of course, as long as models are wrong or incomplete they cannot be processed any further. In the context of textual languages, this might mean that you design a somewhat “looser”, more tolerant grammar, and enforce correctness via constraints.”

Critique

I personally see especially great value in building DSLs around general purpose languages, such as Java, JavaScript, Groovy, Scala, etc. Some frameworks have already gone a long way in embedding DSLs in Java syntax. See, for instance, the Mirror project. This project aspires to make it easier to interact with the Java Reflection API. The resulting calls come close to expressions in natural language.

new Mirror().on(target).set().field(fieldName).withValue(value);

Another example is the mocking framework Mockito. This framework utilizes a DSL to specify the behaviour of ‘mocked’ objects. For instance:

when(mockedList.get(anyInt())).thenReturn("element");

Another premier example is the hamcrest library, which is also mainly used to support unit tests.

assertThat(theBiscuit, is(equalTo(myBiscuit)));

The hamcrest project also lists further Java ‘DSLs’ built by projects that use the library.

Another example of a DSL directly implemented in Java is the jOOQ library, which integrates SQL with the Java syntax. Below is an example from the jOOQ project website:

create.selectFrom(BOOK)
      .where(PUBLISHED_IN.equal(2011))
      .orderBy(TITLE)

Many APIs are also implemented in a DSL-like fashion. For instance, the API for the NextReports framework (Dinca-panaitescu, 2011):

FileOutputStream stream = new FileOutputStream("test.html");

FluentReportRunner.report(report)
    .connectTo(connection)
    .withQueryTimeout(60)
    .withParameterValues(createParameterValues())
    .formatAs(ReportRunner.HTML_FORMAT)
    .run(stream);
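The mechanics behind such fluent, DSL-like APIs are simple: a static entry point creates the object, each configuring method returns this so that calls can be chained, and a final method produces the result. The following sketch uses a made-up Greeting class and is not taken from any of the libraries mentioned above.

```java
/** Hypothetical fluent API: every configuring method returns this to allow chaining. */
class Greeting {
    private String salutation = "Hello";
    private String name;
    private boolean excited;

    /** Static entry point that reads like the start of a sentence. */
    static Greeting to(String name) {
        Greeting greeting = new Greeting();
        greeting.name = name;
        return greeting;
    }

    Greeting saying(String salutation) {
        this.salutation = salutation;
        return this;
    }

    Greeting excitedly() {
        this.excited = true;
        return this;
    }

    /** Terminal operation that ends the chain and produces the result. */
    String build() {
        return salutation + ", " + name + (excited ? "!" : ".");
    }
}
```

A call such as Greeting.to("Ada").saying("Good morning").excitedly().build() then reads almost like a sentence and returns "Good morning, Ada!".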

Of course, there is still some way to go to improve those DSLs. Especially end users are likely to struggle with some intricacies of general programming languages. That said, I still believe that it would be easier to build powerful tools around the limitations of general purpose languages than to develop these ‘from scratch’ for other languages.

References

MD*/DSL Best Practices V 2.0 (Völter, 2011)

“Model Driven Development and Domain Specific Language Best Practices”, Jean-Jacques Dubray on Mar 28 2011 on infoq.com

Presentation “Real Software Engineering” by Glenn Vanderburg presented at Lone Star Ruby Conference 2010. (Argues that the code is the model)

“Is modelling about to overtake coding? I’m a happy SAP business consultant :)”, Thierry Crifasi on SAP Community Network posted on Dec. 13, 2010 (Argues that models should take over coding)