The Secret of Uncommonness

When writing only paraphrases words, phrases, ideas and concepts familiar to us, we often perceive it as boring and dull. If, on the other hand, writing consists mostly of words, phrases, ideas and concepts that are unfamiliar to us, we often perceive it as complex, difficult or even incomprehensible.

The secret of uncommonness must therefore be found in the right balance between what is familiar and what is not. The following rules of thumb can guide you in finding such a balance:

Begin with the Common and End with the Uncommon

A text is often easier to understand when it begins with familiar, easy and common concepts and introduces more complex and unfamiliar concepts afterward (Booth, Williams, and Colomb, 2003).

Use the Uncommon Sparingly

Think of the uncommon as precious extra glitter for your writing, but extra glitter which comes at a price: an uncommon word or concept usually needs to be couched in careful explanation. The more uncommon it is, the more explanation it might require, which increases the length of your text.

Use the Uncommon to Emphasize Concepts of Particular Importance

The uncommon will stand out from your writing, and it is what the reader is most likely to remember. This makes using uncommon words and ideas to accompany ideas of particular importance an interesting choice. However, note that the key idea here is to accompany rather than to express these ideas, since very important ideas should always be explained in the simplest way possible.

Using uncommon words, phrases and ideas is one of the most powerful tools a writer can wield in giving writing its final polish. However, like all powerful tools, it should be applied with caution. Understandability should, in my opinion, be the key objective of most forms of writing, and the uncommon easily gets in the way of this goal.

Read What You Write

Unquestionably, there is value in the process of writing in itself: previously muddy ideas are clarified and new ideas emerge by putting them on paper.

However, writing is arguably more valuable if what is written is also read (and then, hopefully, refined). Unfortunately, finding readers in our busy and information-saturated world is anything but easy. Many writers spend considerable time building and connecting with an audience, for instance through book tours, presentations and social media.

Although we owe it to our efforts to publicize our creations, this might not be what everybody enjoys. I, for one, am not very comfortable with the concept of self-promotion.

Luckily, there is one reader whose attention we always can be assured of: ourselves. If you write and your writing is not read by an audience of millions, little fault falls upon you. But if you write and even you yourself do not read your writing, then why did you bother writing it in the first place?

Unfortunately, at least for me, it is often difficult to remember to read what I have written.

One strategy to overcome this difficulty is to create or assign a ‘reading context’ for every piece of writing. Possible reading contexts include the following:

  • A specific time; for instance, next Sunday at 3 pm.
  • An activity; for instance, before continuing my research on the Huns.
  • A resource; for instance, next time I access the document about architecture in the Middle Ages.

These contexts motivate us and ensure that what we have written gets read, at the very least by ourselves. If this works, you could then venture to extend your readership to two.

When am I Ready to Start Writing: Preparedness of Knowledge and Style

One of the most crucial prerequisites for writing, in my mind, is preparedness of knowledge and style.

Preparedness of knowledge entails that one has sufficient knowledge (or thoughts and ideas) of the topic one wants to write about. This seems self-evident: as magical as the process of writing might be, we could never write, say, a biography of Napoleon without first gathering a set of facts about Napoleon’s life.

This preparedness of knowledge can be achieved at different depths. Superficial preparedness makes you familiar with a couple of facts relevant to your writing, for instance the date of Napoleon’s birth or the places and durations of his conquests. While this superficial preparedness might be sufficient for certain types of writing, for instance a simple report about some obvious facts, it is most likely insufficient for creating exquisite writing on a complex subject. For that, we have to achieve deep preparedness, which is grounded in a long process of collecting facts and perspectives from numerous sources.

While one easy distinguishing mark between superficial and deep preparedness lies in the time dedicated to preparation (with deep preparedness being more time-intensive), another important indicator is whether the knowledge we employ for the writing is conscious or subconscious. If we can grasp all the knowledge we require for the writing in our conscious mind (e.g. I will need to write about points A, B, C, …), we have most likely achieved only a superficial form of preparedness. If, in contrast, we have garnered so much knowledge about a topic that we cannot clearly survey the whole scale of our knowledge on the subject, we have more likely achieved a deeper form of preparedness. This preparedness enables us to write beyond our conscious plan of what we want to write. New twists and perspectives seem to pop into our mind out of nowhere. However, they do not appear out of nothing but have grown from the extensive base of our knowledge on the subject.

Preparedness of style is a less obvious and harder to grasp form of preparedness. It assures that we have the technical skill to materialize our thoughts into well-formed words and, in particular, words which meet the expectations of the specific type of writing we are engaged in.

As the most drastic example of preparedness of style, think of mastery of language. If I have never learnt a word of French in my life, I will not be able to produce even a simple shopping list in French. However, I might well be able to do so if I have taken some French classes and picked up a few words of vocabulary. My preparedness of style has increased, but would still be insufficient to produce an essay for a French periodical.

While mastery of language is certainly an important dimension of preparedness of style, there are other, more subtle, yet possibly more practically relevant dimensions. The most important of these is the mastery of a particular writing style. Writing styles usually correlate with the type of output we seek to produce. For instance, the style of writing for a newspaper is quite different from the style of writing for an academic business journal, which in turn differs from the style of a book about a new programming language.

The more familiar we become with a particular writing style, the more subtle differences we will be able to spot. There is, for instance, not just one style of writing for a newspaper; one can distinguish slightly different writing styles, for instance between The New York Times and The Wall Street Journal.

While preparedness of knowledge is certainly not easy to achieve, preparedness of style poses even greater challenges. Although there are style guides for languages and even for particular publications, what exactly makes up a ‘style’ is difficult to formalize and therefore difficult to learn through traditional means.

Really the only way available is practice, ideally with the help of a seasoned master of the style. This, for instance, is the supposed model for conveying the style of academic writing, where the doctoral or master’s student is gently guided by a supervisor to pick up the tricks of the trade. However, whether or not such a master is available, one has to learn by practicing, for instance by critically examining the outputs one produces and matching them against one’s own ‘feeling’ of how well the writing fits the style of the discipline.

Neither preparedness of knowledge nor preparedness of style is a quality that can be achieved in absolutes. Even the most knowledgeable expert or the most seasoned writing professional will be able to find ways to improve their preparedness of knowledge or style. Try to reach the minimal level of both forms of preparedness at which you feel able to produce some coherent output. The output need not be great, nor even good. It really doesn’t matter. What matters is that you do what you want to learn. This is always the best form of practice, and in lucky circumstances it even leaves you with respectable outputs for whichever goal you seek to achieve.

Introducing onedb: Connect Small Data in the Cloud

Data is growing bigger.

The increased scale of contemporary applications is often measured in the currencies of terabytes and thousands of transactions per second. An amazing array of new and old technologies helps in dealing with this increased scale, ranging from clouds in various shapes to scalable NoSQL databases and even emerging asynchronous programming paradigms.

However, while we get better at handling large quantities of data with high reliability and dazzling numbers of concurrent transactions, the new mountains of data bring challenges beyond data processing and storage. In particular, big data is not smart data. Indeed, making sense of big data has been identified as one of the most important challenges lying ahead for technologists.

Making sense of big data is not an easy problem, especially since big data is, well, big and bulky and generally difficult to comprehend in its entirety.

This article introduces onedb. onedb is a free, cloud-based platform for connected small data. The platform currently consists of a free web service for Java developers to store and connect data from any Java app. If you try out the service, you help me greatly in my studies!

In this article, I will first give a brief introduction explaining the background and motivation for onedb. I will then describe the various design goals of the service as well as how onedb helps to support smart and ‘small’ data.

Background and Motivation

When I joined graduate school at the University of Auckland some four years ago, I started working on the question of how an individual or a small team can organize their knowledge and information.

There are a number of interesting theories related to this question:

These theories point to a solution to the problem described in the introduction: how to make sense of big data. The answer that these theories provide is that if we strive to make big data interconnected, it has the potential to become more useful to us.

There are many technologies to store connected data and information, query connected data and even reason about connected data. However, most of these technologies are very sophisticated and provide many features beyond the simple connection of data, for instance querying, reasoning and applying various graph algorithms to connected data. These features, though doubtlessly valuable and crucial for many tasks, often come at the price of increased complexity as well as reduced portability and generalizability of libraries, databases and services. Often, these properties make it difficult to connect data residing on different platforms, servers and applications in an easy and coherent way.

Thus, the motivation to implement onedb was to design a service focusing exclusively on one key feature: to connect data across applications and systems in the simplest way possible.

onedb Design Goals

While the key requirement for the onedb service is to enable linking data across applications and platforms, a number of secondary design goals have been chosen to differentiate onedb from other solutions. These design goals are: (1) developer productivity, (2) generalizability and simplicity, (3) portability, and (4) testability.

Developer Productivity

There is a great difference between a technology that allows doing something and a technology that enables doing something: most technologies can be bent in some form or another to find a solution for a problem (allowing), but few help to find a truly elegant and effective solution (enabling).

Since the feature at the heart of the onedb solution is apparently simple (connect two pieces of data), a lot of the development effort for onedb was spent on effectively supporting this core feature in an enabling fashion. In general, onedb aims to achieve this by minimizing the steps required to get from idea to connected data. This is supported by three intertwined design features: (1) minimal configuration, (2) fluent and readable API and (3) powerful conventions. I will give an example in the following to illustrate these three design features.

The following are the minimal but sufficient steps to connect two pieces of data in the onedb cloud:

  1. Get a onedb API key
  2. Download the onedb Java client library
  3. Link the onedb client library to an existing or new Java project
  4. Add the following statement anywhere in your Java app:
OneJre.init("[your API key here]");

One.createRealm("foo").and(new When.RealmCreated() {

    @Override
    public void thenDo(WithRealmCreatedResult r) {
        One.append("bar").to(r.root()).in(r.client());

        System.out.println("Created " + r.root() + ":" + r.secret());
    }
});

[full source on github]

The code listed above will create two nodes in the onedb cloud. The first node “foo” will be a realm: a kind of mini-database, which can be accessed using the generated secret token and the address of the node foo. This ‘realm’ node will be connected to another node “bar”. Finally, the application will print out the address of the node “foo” along with the access secret necessary to access the node.

Both nodes will be identified with unique resolvable identifiers such as:

foo: https://u1.linnk.it/1owvl3/foo
bar: https://u1.linnk.it/1owvl3/foo/bar1

It is also possible to access the nodes using a simple REST interface (provided the access secret is supplied using HTTP BASIC authentication). Various representations are supported through URLs such as the following:

https://u1.linnk.it/1owvl3/foo.node.xml
https://u1.linnk.it/1owvl3/foo.value.json
https://u1.linnk.it/1owvl3/foo/bar1.value.html

The example above describes all the configuration necessary to start storing data using onedb. There is no need to set up a server, to configure databases, buckets or private keys, no definition of tables or keys and no JDBC connection pool. You can further see the fluent API of the onedb client resembling English sentences (“create realm ‘foo’ and when realm created then do …”). But, most importantly, the expressiveness of the code snippet is amplified by a number of powerful conventions embedded in onedb, such as the automatic designation of global identities for all created nodes or the various data representations in formats like JSON or XML, which are available for every node in the onedb cloud.
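The REST access described above can be sketched in plain Java. This is only a minimal illustration under stated assumptions: the class and method names are hypothetical, the representation suffixes are taken from the example URLs, and the user name placed in the HTTP BASIC credentials is a placeholder, since the article only says that the access secret must be supplied. The request is built but not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of the onedb client library.
class OnedbRestSketch {

    // Builds the value of an HTTP BASIC "Authorization" header.
    static String basicAuth(String user, String secret) {
        String credentials = user + ":" + secret;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Appends a representation suffix such as "value.json" to a node address.
    static URI representation(String nodeAddress, String suffix) {
        return URI.create(nodeAddress + "." + suffix);
    }

    // Prepares (but does not send) a GET request for one representation of a node.
    static HttpRequest get(String nodeAddress, String suffix, String user, String secret) {
        return HttpRequest.newBuilder(representation(nodeAddress, suffix))
                .header("Authorization", basicAuth(user, secret))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Assumption: "user" is a placeholder; the expected user name is not stated in the article.
        HttpRequest request = get("https://u1.linnk.it/1owvl3/foo", "value.json",
                "user", "[your secret here]");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually fetch the node, the prepared request could be passed to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.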

Generalizability and Simplicity

While C and JavaScript as languages both have their well-discussed shortcomings, they are without doubt extremely popular and widely used. One key ingredient to their success is their generalizability. You can literally implement any non-UI logic for any platform in C, while you can implement UIs for most rich client platforms with JavaScript.

There are two ways to achieve generalizability: through simplicity or through sophistication. Java, for instance, achieves generalizability through sophistication, by providing many advanced and sometimes complex features (e.g. threads, NIO, ...) which allow deploying the language in a large number of use cases. JavaScript, in contrast, achieves generalizability through simplicity; for one, it’s much easier to implement a basic JavaScript interpreter than a JVM+JDK. Moreover, there are only a handful of (useful) language features, which makes the language widely supported and known.

onedb strives for generalizability through simplicity in a number of ways:

  • A very basic data model is used to represent connections: an unlabeled, directed graph.
  • onedb’s core engine supports only the most basic operations on such graphs: append a node, remove a node, and replace a node.
  • Apart from these operations, onedb’s core engine supports synchronizing nodes and their connections between multiple locations, but nothing else.

This simplicity at the heart of the onedb engine enables the database to support a whole range of more sophisticated data structures, for instance trees, maps, and even labeled graphs.
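To make the data model described above more concrete, here is a minimal in-memory sketch of an unlabeled, directed graph that offers only append, remove and replace. It is not onedb’s engine: the class `TinyGraph` and its methods are hypothetical, nodes are identified by `equals` for brevity, and synchronization is left out entirely. The `main` method hints at how a map entry could be layered on top by letting a key node point to its value node.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: an unlabeled, directed graph with three operations.
class TinyGraph {

    // Each node maps to the ordered list of nodes it points to. Edges carry no labels.
    private final Map<Object, List<Object>> children = new LinkedHashMap<>();

    TinyGraph(Object root) {
        children.put(root, new ArrayList<>());
    }

    // Adds a node (if not yet known) and a directed edge parent -> node.
    void append(Object node, Object parent) {
        if (!children.containsKey(parent)) {
            throw new IllegalArgumentException("Unknown parent: " + parent);
        }
        children.computeIfAbsent(node, k -> new ArrayList<>());
        children.get(parent).add(node);
    }

    // Removes a node together with all edges pointing to it.
    void remove(Object node) {
        children.remove(node);
        for (List<Object> targets : children.values()) {
            targets.removeIf(t -> t.equals(node));
        }
    }

    // Swaps a node for a new one, keeping its incoming and outgoing edges.
    void replace(Object oldNode, Object newNode) {
        List<Object> outgoing = children.remove(oldNode);
        if (outgoing == null) {
            throw new IllegalArgumentException("Unknown node: " + oldNode);
        }
        children.put(newNode, outgoing);
        for (List<Object> targets : children.values()) {
            targets.replaceAll(t -> t.equals(oldNode) ? newNode : t);
        }
    }

    // Returns a copy of the nodes that the given node points to.
    List<Object> childrenOf(Object node) {
        return new ArrayList<>(children.getOrDefault(node, new ArrayList<>()));
    }

    public static void main(String[] args) {
        TinyGraph g = new TinyGraph("root");
        // A map entry modeled as a key node pointing to its value node.
        g.append("lastName", "root");
        g.append("Smith", "lastName");
        g.replace("Smith", "Miller");
        System.out.println(g.childrenOf("lastName")); // prints [Miller]
    }
}
```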

onedb further strives to be generalizable in that it minimizes the assumptions about your data, for instance how the data is organized, how it is queried or which data types are used. As an example of the support for a wide diversity of data types, see the snippet below. All listed operations will work in onedb without need for custom configuration:

// Any serializable application object can serve as a node
public static class Person implements Serializable {
}

...

String bar = "bar";
Integer meaning = 42;
Person p = new Person();

One.append(bar).to(realmRoot).in(client);
One.append(meaning).to(bar).in(client);
One.append(p).to(bar).in(client);

[full source on github]

Portability

Data in any application of non-trivial size ceases to exist and is reborn in seemingly endless incarnations. For instance, take a user’s last name. Initially the last name will be held by a text field as part of the web browser’s DOM. It might begin its journey as an element in a JSON data structure being sent to the application server as part of an HTTP message. Next, the last name will be deserialized on the server and live, temporarily, as part of a Java or C# object. Then, after being included in an SQL statement the last name might find its final, persisted resting place … until it is requested by another browser session.

System with Distributed Databases

The problem here is less that identical data is replicated numerous times (this is unavoidable if multiple devices are involved) but more that the data is represented in various different and incompatible formats for the various involved platforms (Java object, JSON, DOM property, …).

onedb strives to provide one common platform across devices and environments, which makes it possible to work with and connect heterogeneous pieces of data. The way onedb achieves this is by offering portable and embeddable client libraries for various platforms, which can access one integrated data space: the onedb cloud.

Distributed Application Using onedb

The onedb client engine is written in vanilla Java with no external dependencies apart from the core JDK classes (java.*). Moreover, the core engine can be compiled using Google’s awesome Google Web Toolkit (GWT) Java-to-JavaScript compiler. The onedb cloud can therefore be accessed on all Java-compatible environments and most modern web browsers. Please bear with me for the web browser part, though. I don’t believe it really increases developer productivity to require the continuous compilation of a 50,000+ LOC client library with the not exactly lightning fast GWT compiler. I therefore plan to provide a precompiled JavaScript client library rather than a GWT library (but I am still working on the API for said library).

Testability

Application logic which is tightly coupled to persisted data is notoriously difficult to test using automated unit tests. There are many reasons for this but one key factor is that it is often non-trivial to start up a database with the right configuration and test data for a particular test case.

onedb strives to make code, which relies on data that will be persisted in the production system, both easy and fast to test. For this purpose, an almost fully functional in-memory onedb cloud can be started up for test cases. Starting up the test cloud should take less than 200ms and can therefore be done, if required, for each individual unit test.

I have given an example above of a simple application which connects two nodes in the onedb cloud. We can test this application locally at far greater speed (since no messages are sent over the Internet) by simply changing the first line of the application code to OneTestJre.init();:

OneTestJre.init(); // was: OneJre.init("[your API key]");

One.createRealm("foo").and(new When.RealmCreated() {

    @Override
    public void thenDo(WithRealmCreatedResult r) {
        One.append("bar").to(r.root()).in(r.client());

        System.out.println("Created " + r.root() + ":" + r.secret());
    }
    
});

[full source on github]
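The general idea behind this kind of test setup, application logic that does not care whether its data lives in a remote service or in memory, can be sketched independently of onedb. Everything in the following block is hypothetical (`NodeStore`, `InMemoryNodeStore`, `Greeter`); it illustrates the pattern under the assumption that the logic depends only on a small storage interface, and it says nothing about how OneTestJre works internally.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; this is not onedb's API.
class TestabilitySketch {

    // The only thing the application logic knows about persistence.
    interface NodeStore {
        void put(String address, Object value);
        Object get(String address);
    }

    // Stand-in backend that lives purely in memory and needs no network or setup.
    static class InMemoryNodeStore implements NodeStore {
        private final Map<String, Object> nodes = new HashMap<>();

        public void put(String address, Object value) {
            nodes.put(address, value);
        }

        public Object get(String address) {
            return nodes.get(address);
        }
    }

    // Application logic under test; it works with any NodeStore implementation.
    static class Greeter {
        private final NodeStore store;

        Greeter(NodeStore store) {
            this.store = store;
        }

        String greet(String userAddress) {
            Object lastName = store.get(userAddress + "/lastName");
            return lastName == null ? "Hello, stranger" : "Hello, " + lastName;
        }
    }

    public static void main(String[] args) {
        // A fresh store per test case keeps test cases independent of each other.
        NodeStore store = new InMemoryNodeStore();
        store.put("users/1/lastName", "Smith");

        Greeter greeter = new Greeter(store);
        System.out.println(greeter.greet("users/1")); // prints Hello, Smith
        System.out.println(greeter.greet("users/2")); // prints Hello, stranger
    }
}
```

Because creating the in-memory store is cheap, each unit test can start from its own empty store instead of sharing a preconfigured database.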

Support for Small Data

I have mentioned in the introduction that onedb is a platform to support connected small data. To discuss onedb’s ability to support small data, I will use the following definition from another article:

Small Data is interconnectable and portable data which can be stored, managed, and processed independently on all components of a distributed system.

The essential idea this definition is based upon is that ‘big’ data can be made ‘small’ if three requirements are satisfied: (1) the components of a distributed application are enabled to manage their data independently from other application components, (2) data is portable in that it can be seamlessly moved from one component to the other and (3) data managed by one component can be ‘connected’ to data being managed by other components. I will briefly describe how onedb fulfills these requirements in the following.

To allow application components to manage data independently from other components (1st requirement of small data), the onedb cloud is divided into a large number of sections of different granularity (e.g. one last name or all user data). These sections, called realms, are self-contained and allow application components to manage a set of nodes and their connections. It is very easy to create new realms, so application components can create and manage their own, independent realms if required.

onedb cloud divided into realms

To allow for data portability (2nd requirement of small data), onedb provides two intertwined features: First, realms or parts of a realm can be shared between various components of an application. Second, data from the onedb cloud is made available locally to system components through means of an automated synchronization process (think fine-grained Dropbox for applications). This synchronization process is available on all platforms to which the onedb client engine can be deployed.

onedb Sharing and Synchronization

Allowing data managed by different components to be connected (3rd requirement of small data) is the core feature of onedb as mentioned above. Any piece of data in the onedb cloud (or node) can be connected to any other piece of data in the onedb cloud.

Connections between Realms
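The first and third requirements, independently managed realms whose nodes can still be connected to each other, can be pictured with a small sketch. The `Realm` class below is hypothetical and purely in-memory, and the example.org addresses are made up for illustration; the only idea borrowed from the article is that every node has a global address, so a node in one realm can point to a node owned by another.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not onedb's implementation.
class RealmSketch {

    // A realm: a self-contained set of nodes managed by one application component.
    static class Realm {
        final String root;
        // Node address -> addresses the node points to (possibly in other realms).
        final Map<String, List<String>> links = new LinkedHashMap<>();

        Realm(String root) {
            this.root = root;
            links.put(root, new ArrayList<>());
        }

        // Creates a child node inside this realm and returns its global address.
        String append(String name, String parentAddress) {
            if (!links.containsKey(parentAddress)) {
                throw new IllegalArgumentException("Not in this realm: " + parentAddress);
            }
            String address = parentAddress + "/" + name;
            links.put(address, new ArrayList<>());
            links.get(parentAddress).add(address);
            return address;
        }

        // Connects a node of this realm to any global address, including foreign ones.
        void connect(String from, String toAddress) {
            if (!links.containsKey(from)) {
                throw new IllegalArgumentException("Not in this realm: " + from);
            }
            links.get(from).add(toAddress);
        }

        // True if the node itself is stored in this realm.
        boolean owns(String address) {
            return links.containsKey(address);
        }
    }

    public static void main(String[] args) {
        // Two components manage their realms independently (made-up addresses).
        Realm users = new Realm("https://example.org/r1/users");
        Realm orders = new Realm("https://example.org/r2/orders");

        String smith = users.append("smith", users.root);
        String order = orders.append("order1", orders.root);

        // The order points to its customer across the realm boundary.
        orders.connect(order, smith);

        System.out.println(orders.links.get(order)); // prints [https://example.org/r1/users/smith]
        System.out.println(orders.owns(smith));      // prints false
    }
}
```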

Why bother?

onedb is a very young technology and as such I expect there to be bugs, downtimes, and, of course, there is the old friend of any new framework: quite sparse documentation.

However, I do believe that there are many exciting use cases for onedb. You can plug it into your own apps in a matter of minutes and use it to store logs, settings, test data, or test parameters. You can also use onedb as a quick way to publish and update a set of web pages or other REST resources, which may be consumed by any REST capable client.

There are currently two ways in which you can use onedb. First, you can go to the onedb webpage and grab an API key for the technology preview server. The technology preview server allows you to store up to 10,000 nodes/objects in the onedb cloud per API key. Second, you can contact me if you would like to install your own onedb server node, and I will be happy to assist you with the installation and configuration procedure.

Limitations & Last thoughts

onedb is a service focused on one particular task: to help you connect and integrate small data across applications and platforms. onedb is built as a lightweight add-on to existing applications and infrastructures and not as a replacement for these. onedb, in consequence, does not provide many features commonly found in other databases or cloud platforms. However, since onedb is lightweight and generic, it is very easy to integrate it with other technologies, for instance building an index with Lucene or running a Hadoop job to process data stored in onedb.

Generic software problems, such as the one addressed by onedb, can often be solved by an array of related technologies. I have created a preliminary list of interesting related technologies. Please let me know of any technologies I missed there and I will be happy to include them.

I hope you enjoyed reading this article introducing onedb. If you would like to learn more about onedb, there is a detailed and hands-on guide on how to use the various features: “onedb Tutorial: Getting started and First Steps”.

onedb is a very important part of my PhD thesis and therefore you could help me a great deal, if you could send me some quick feedback (@mxro or feedback@onedb.de). Please also let me know if you find any bugs and I will try to fix them as quickly as possible.

Looking forward to hearing from you, and please share :)

Innovating on a Dime: Design Science for Small Teams

Disruptive technological innovations often originate from surprisingly small beginnings. Giants of the Internet age such as the omnipresent Google search engine and the Facebook social networking platform were initially designed and developed by small and independent teams. These examples are attractive to design science researchers in information systems, who desire to deliver new and innovative artifacts.

Motivated by these observations, we have written a paper proposing a possible specialization of design science research.

PDF (AISNet.org)

Citation:

Rohde, M. E., & Sundaram, D. (2011). Innovating on a dime: Design science for small teams. In European Conference on Information Systems.

URL http://aisel.aisnet.org/ecis2011/224/

Work Pattern: Explore, Tag, Collect (ETC)

A work pattern for explorative investigations, which should result in a structured collection, for instance a Literature Review.

Step 1: Explore

Browse through potential sources for your collection. Possible tools for this are databases, search engines or feed aggregators (e.g. Google Reader).

Step 2: Tag

Tag articles or places you think might be of relevance. Do not think up a complex and sophisticated typology of tags but rather use a generic ‘might be interesting’ tag. In Google Reader, for instance, this can be achieved by ‘starring’ relevant articles. In other contexts, you could email yourself a copy of the respective article. For academic research, you can try to collect the full references if this can be achieved effortlessly (for instance from databases interfacing with online bookmark managers like citeulike.org).

The main objective of steps 1 and 2 is speed and openness. Try to explore as many sources as possible and be open to novel roads.

Step 3: Collect

After a little time has passed, go back to the tagged resources. Flip through them and reflect on whether you still find them relevant. If so, you can add these articles to a collector. A collector is a vehicle which helps you to organize and classify the pieces of information you have found. This can, for instance, be a long essay with many sections, where you can add the pieces of information as references to relevant sections. It can also be a classification typology of tags or another way of structuring your information.