Designing Micro Services the Right Way

For a few years now, micro services have been all the rage when it comes to the architecture of large applications. Personally, I have always been a bit puzzled about what is so new and great about micro services in comparison to what came before them: Service Oriented Architecture (SOA). Indeed, SOA itself is often portrayed as a frightful antipattern from our past, to be mentioned in the same breath as CORBA.

To me, the move from CORBA et al. to SOA to micro services has not been one of disruptive innovation but one of continuous learning, chiefly in relation to the technologies we employ. There is a world of difference between setting up a big old monolithic application from the past and setting up an Express server in Node.js (which is also a ‘monolith’ in its own right, just a smaller one, hopefully).

The core problem we are trying to solve has not changed: distributed computing. Unfortunately, one of the first things we learnt about distributed computing seems to have been given less attention recently: that it is best avoided wherever possible. Why? Because it introduces great complexity into an application and can result in many development and operational problems (see YouTube: 10 Tips for failing badly at Microservices by David Schmitz).

One of the most problematic areas is data or persisted state. If the same piece of data needs to be used by multiple services, things become very complicated since it is often required to keep data in sync between multiple places (see YouTube: Managing Data in Microservices by Randy Shoup).

Recently I came across a presentation which I think outlined a very nice approach for dealing with micro services, one that relied heavily on code generation, enforcing common standards and automated testing. Furthermore, the presented architecture primarily used one language, which I think is a very good approach. I highly recommend viewing this presentation to anyone interested in a way to deal with the complexity of micro services:

YouTube: Design Microservice Architectures the Right Way by Michael Bryzek

What I personally took away from this:

  • Focus on testability. Allow for fast unit and integration tests and even testing with production data. Only code that is easy to test and heavily tested allows for fast and bold development. This organisation, for instance, automatically updates all its dependencies once per week, since they have full confidence that their tests will pick up any issues.
  • Utilise code generation. The sad truth of micro services is that we will have to duplicate things, such as commonly used entities – especially if multiple programming languages are involved. Code generation provides an elegant way to deal with this unfortunate situation.
  • Enforce common standards. Although micro services are intended to reduce complexity by dividing up a complex system into small manageable chunks, they can actually result in increased overall complexity, especially if many different technologies are employed. In that case, enforcing strict common standards can help in keeping things simple for developers and ops.
  • Embrace events. Triggering services into action by using events rather than direct API calls can help in making a distributed system more predictable and easier to debug.

I think this presentation provides an excellent overview of best practices for micro services and I couldn’t think of anything to criticise or add. I think it represents the best way of building micro services I am aware of as of now.

I do think, however, that there is one important additional issue to consider: a micro service built according to the best principles and standards will still be a liability if it wasn’t necessary to build a micro service to begin with. This is not so much a question of whether we should use micro services or not (in any organisation of a certain size they are imperative) but of how many.

One of the key drivers of success for micro services within a larger system is to get the boundaries of the services right (see bounded context) and I think we should aim to make micro services as large as possible so we have as few of them as possible; taking into consideration the restrictions of team size, data and complexity:

  • Team: It might sound like heresy but I do think that one ‘physical’ micro service could be maintained by up to three to five teams (and not just one team per micro service). That of course would be the upper limit; there is nothing wrong with having just one team per micro service. It really depends on what service you are building.
  • Data: And some more heresy: I think that for data it is often better to scale up rather than scale out. Why? Data is all about state, and being able to keep state within the physical confines of one system leads to much improved performance and reduced complexity. Thus we should think about the database management system we will be using for our service and what the maximum is that we can scale it up to. Then take 20% of that and ask yourself if your data will stay within that limit. If not, it might be prudent to break the micro service apart or maybe change the DBMS.
  • Complexity: The main drivers of complexity in software are code size, inter-dependencies and heterogeneity. If our micro service would contain large amounts of code with many intricate inter-dependencies, tackling many different problems in different ways, it may be advisable to think about breaking the service up.

As mentioned, distributed systems are inherently more complex than non-distributed ones. Therefore, if we have larger micro services, our system becomes less distributed overall and we hopefully have less accidental complexity to deal with.

Thus, to sum things up, we must be aware of the dangers of micro services and deploy tooling strategically as outlined in the presentation as well as be mindful of how we can build our system in a way that we avoid the complexities of distributed systems as much as possible.

Move git repository

Sometimes it is necessary to move the location of a git repository; be it from one GitHub repo to another or moving a repo from GitHub to Bitbucket. This can be surprisingly tricky since one needs to make sure to include all branches, tags, etc. when copying the data.

Thankfully git magic allows doing this fairly easily. Just run the following commands:

git clone --mirror <old-repo-url>
cd <repo-name>.git
git remote add new-origin <new-repo-url>
git push new-origin --mirror

That should be it!

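Note that if you are copying a GitHub repo you might get messages such as the following. They concern GitHub’s read-only pull request refs (refs/pull/*), which cannot be pushed. Branches and tags are still copied, so these messages are nothing to worry about.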

 ! [remote rejected] refs/pull/1/head -> refs/pull/1/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/10/head -> refs/pull/10/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/100/head -> refs/pull/100/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/101/head -> refs/pull/101/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/102/head -> refs/pull/102/head (deny updating a hidden ref)
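
The steps above can be wrapped in a small shell function. This is only a sketch: mirror_dir and move_repo are hypothetical helper names, and the two URLs are placeholders you need to supply. It relies on the fact that a mirror clone is a bare repository whose directory name ends in .git.

```shell
# Derive the directory name that `git clone --mirror` creates by default:
# the last path segment of the URL, always ending in .git.
mirror_dir() {
  name=$(basename "$1")
  echo "${name%.git}.git"
}

# move_repo <old-repo-url> <new-repo-url>
# Clones every ref from the old location and pushes it to the new one.
move_repo() {
  old_url=$1
  new_url=$2
  dir=$(mirror_dir "$old_url")
  git clone --mirror "$old_url" "$dir" || return 1
  # Hidden refs such as refs/pull/* may be rejected by the new remote.
  git -C "$dir" push --mirror "$new_url"
}
```

Pushing directly to the new URL avoids having to add a second remote to the temporary mirror clone.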

 

Free Tool for Renaming Files with Current Date

For many files, it is good to keep track of when they were originally created. In theory, every file created on any mainstream operating system should have its ‘creation date’ and ‘last date modified’ recorded as part of its metadata. This mechanism, however, is often unreliable, especially when using file synchronisation tools such as Dropbox or Google Drive.

Thus it is often handy to prefix file names with the date they were created, such as:

2018 09 30 Letter.pdf

There are some nice tools available for this purpose, for instance Bulk Rename Utility. However, I often find these a bit too complex for what I need.

I have thus developed a little tool, Date Namer, which does just the one thing I require: to prefix file names with the current date.

[Image: date-namer.PNG]

Feel free to download this tool. Upcoming releases will be published on this page: Releases. Further, all the source code for this tool is available on GitHub.

For those interested in the implementation details: For this project, I tried using Electron, which allows developing a desktop application using Node.js. I found it quite easy to use overall. Internally, the application runs an instance of Chromium to render its user interface. The running application thus takes around 50 MB of RAM, which I think is not too bad for this use case. The app performance is very good.
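
For readers who prefer the command line, the same renaming can be sketched in a few lines of shell. This is not how Date Namer itself is implemented; prefix_with_date is a hypothetical helper name, and the date format simply follows the ‘2018 09 30 Letter.pdf’ example above.

```shell
# prefix_with_date <file>
# Renames e.g. "Letter.pdf" to "<year> <month> <day> Letter.pdf" using today's date.
prefix_with_date() {
  file=$1
  dir=$(dirname "$file")
  base=$(basename "$file")
  today=$(date '+%Y %m %d')
  mv -- "$file" "$dir/$today $base"
}
```

Usage: prefix_with_date "Letter.pdf"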

Tech Tip: Make Spotlight Searches Faster on Mac OS X

One of the things I really like about Windows 10 is the ability to hit the Windows key and type the first few letters of an application’s name to find and open it. Mac OS X in theory provides the same feature via Command + Space, which opens a Spotlight search.

Unfortunately, I found this search to be inferior to the one found in Windows since it is slower: even on my very powerful Mac, it often takes more than two to three seconds to ‘find’ the application I am trying to open.

Last week, I found a way to somewhat mitigate this. Just head to the settings and in there to Spotlight. Disable all the categories apart from ‘applications’.

[Image: spotlight.png]

While this does make the search faster, also note that it won’t search for the other types of content anymore.

Upgrade to Oracle JDK 10 on CentOS/RHEL

With the release of Java 10 only a few days ago, it seems only prudent to update to Java 10 on suitable systems, since support for Java 9 officially ends with the release of Java 10. (Note that Java 8 still enjoys long-term support, so it might be the best choice to stick with that on systems which are difficult to change.)

  • Go to the official download site and indicate you agree to their terms.
  • Copy the link for jdk-10_linux-x64_bin.rpm
  • Log into your CentOS machine
  • Download the RPM file using the following command (Don’t forget to provide the link you have copied)

wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" [paste copied link here]

  • Install the JDK

sudo yum localinstall jdk-*_linux-x64_bin.rpm

  • Set the default Java version to 10 using alternatives

sudo alternatives --config java

  • Lastly, make sure you are running the correct version of Java:

java -version
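
If you want to perform this check in a provisioning script, the version string can be parsed. The following is a minimal sketch with hypothetical helper names (java_major, installed_java_version). It assumes the quoted version string on the first line of the java -version output (which is written to stderr) and handles plain numeric versions such as "1.8.0_161" or "10" only.

```shell
# Extract the major version from a version string:
# "1.8.0_161" yields 8 (old 1.x scheme), "10" or "10.0.1" yields 10.
java_major() {
  case "$1" in
    1.*) v=${1#1.}; echo "${v%%.*}" ;;
    *)   echo "${1%%.*}" ;;
  esac
}

# Read the quoted version string from the first line of `java -version`.
installed_java_version() {
  java -version 2>&1 | head -n 1 | sed 's/.*"\(.*\)".*/\1/'
}

# Usage: [ "$(java_major "$(installed_java_version)")" -ge 10 ] && echo "Java 10 or newer"
```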

 

Upload Elastic Beanstalk Application using Maven

AWS Elastic Beanstalk is a well-established service of the AWS cloud and can be used as a powerful platform to deploy applications in various languages. In this short tutorial, I will outline how to conveniently deploy a Tomcat application to AWS Elastic Beanstalk using the beanstalk-maven-plugin.

The following assumes that you already have a project which is configured to be deployed as WAR and provides a valid web.xml to start answering requests. If you are unsure of how to set this up, please have a look at the example project web-api-example-v2 (on GitHub).

Step 1: Create IAM User

  • Create a new user with programmatic access in AWS IAM

[Image: iam]

  • For Permissions, select ‘Attach existing policies directly’ and add the following policy

[Image: elastic]

  • Save the access key and secret key

Step 2: Add Server to Local Maven Configuration

  • Add the following declaration to the <servers> element in your $HOME/.m2/settings.xml and provide the access key and secret key for the user you’ve just created

<server>
  <id>aws.amazon.com</id>
  <username>[aws access key]</username>
  <password>[aws secret key]</password>
</server>

Step 3: Add Beanstalk Maven Plugin

  • Add the plugin to the <build><plugins> section of your pom.xml

<plugin>
  <groupId>br.com.ingenieux</groupId>
  <artifactId>beanstalk-maven-plugin</artifactId>
  <version>1.5.0</version>
</plugin>

  • Test your security credentials and connection to AWS

mvn beanstalk:check-availability -Dbeanstalk.cnamePrefix=test-war

Step 4: Create S3 Bucket for Application

  • Create a new S3 bucket with a name of your choice (e.g. the name of your application)

[Image: bucket]

Step 5: Update Plugin Configuration

  • Provide the following configuration for the beanstalk-maven-plugin

<plugin>
  <groupId>br.com.ingenieux</groupId>
  <artifactId>beanstalk-maven-plugin</artifactId>  
  <version>1.5.0</version>
  <configuration>
    <applicationName>[Provide your application name]</applicationName>
    <!-- Path of the deployed application: cnamePrefix.us-east-1.elasticbeanstalk.com -->
    <cnamePrefix>${project.artifactId}</cnamePrefix>
    <environmentName>devenv</environmentName>
    <environmentRef>devenv</environmentRef>
    <solutionStack>64bit Amazon Linux 2015.03 v1.4.5 running Tomcat 8 Java 8</solutionStack>

    <!-- The bucket name could equal the artifactId, but that name is not guaranteed to be available, so the bucket name is given statically -->
    <s3Bucket>[Provide your S3 bucket name]</s3Bucket>
    <s3Key>${project.artifactId}/${project.build.finalName}-${maven.build.timestamp}.war</s3Key>
    <versionLabel>${project.version}</versionLabel>
  </configuration>
</plugin>

Step 6: Deploy project

  • Run the following to upload the project to the S3 bucket:

mvn beanstalk:upload-source-bundle

  • If this succeeds, deploy the application

mvn beanstalk:upload-source-bundle beanstalk:create-application-version beanstalk:create-environment

Your application should now be deployed to Elastic Beanstalk. It will be available under

cname.us-east-1.elasticbeanstalk.com

where cname is the cnamePrefix you specified in step 5.

Good To Know

  • To find out which solution stacks are available (to define the solutionStack configuration parameter), simply run

mvn beanstalk:list-stacks


PlantUML (Open Source Awesomeness)

I’ve always had a soft spot for diagrams. I think that representing information in various visual ways tremendously helps our thinking and understanding. Unfortunately it is often a big headache to create (and maintain) diagrams.

So I was very pleased today when I came across PlantUML. PlantUML is a Java library and web service which renders UML diagrams from text input. Take the following text definition for example:


@startuml
object Object01
object Object02
object Object03
object Object04
object Object05
object Object06
object Object07
object Object08

Object01 <|-- Object02
Object03 *-- Object04
Object05 o-- "4" Object06
Object07 .. Object08 : some labels
@enduml

This will be rendered into the following diagram:

[Image: diagram]

PlantUML does not just support object diagrams but also many other types of diagrams. There is another service called WebSequenceDiagrams, which focusses only on sequence diagrams (and is not open source) but can be useful if more visually pleasing sequence diagrams are required.

Configuring an init.d Service for node_exporter

I recently wrote an article showing how to configure Prometheus and Grafana for easy metrics collection. In that article, I assumed that the system which should be monitored would use the systemd approach for defining services.

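I now had to set up the node_exporter utility on a system which uses the init.d approach. Thus, I provide some simple instructions here on how to accomplish that.

  • Download the node_exporter release archive (the paths used below assume that you are working in /opt)
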
wget https://github.com/prometheus/node_exporter/releases/download/v0.15.2/node_exporter-0.15.2.linux-amd64.tar.gz

  • Extract the archive

tar xvfz node_exporter-*.tar.gz

  • Create a link

ln -s node_exporter-*.linux-amd64 node_exporter

  • Create the file /opt/node_exporter/node_exporter.sh and add the following content:

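#!/bin/sh

# exec replaces this wrapper shell with node_exporter, so the PID recorded
# by the init script belongs to node_exporter itself
exec /opt/node_exporter/node_exporter --no-collector.diskstats

  • Create the file /etc/init.d/node_exporter and add the following content:
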
#!/bin/bash
### BEGIN INIT INFO
# Provides: node_exporter
# Required-Start: $local_fs $network $named $time $syslog
# Required-Stop: $local_fs $network $named $time $syslog
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Description: Prometheus node_exporter
### END INIT INFO

SCRIPT=/opt/node_exporter/node_exporter.sh
RUNAS=root

PIDFILE=/var/run/node_exporter.pid
LOGFILE=/var/log/node_exporter.log

start() {
  if [ -f "$PIDFILE" ] && kill -0 $(cat "$PIDFILE") 2>/dev/null; then
    echo 'Service already running' >&2
    return 1
  fi
  echo 'Starting service…' >&2
  # Run the script in the background, send its output to the log file
  # and print the PID of the background process
  local CMD="$SCRIPT &> \"$LOGFILE\" & echo \$!"
  # Capture the printed PID in the PID file
  su -c "$CMD" $RUNAS > "$PIDFILE"
  echo 'Service started' >&2
}

stop() {
  if [ ! -f "$PIDFILE" ] || ! kill -0 $(cat "$PIDFILE") 2>/dev/null; then
    echo 'Service not running' >&2
    return 1
  fi
  echo 'Stopping service' >&2
  kill -15 $(cat "$PIDFILE") && rm -f "$PIDFILE"
  echo 'Service stopped' >&2
}

uninstall() {
  echo -n "Are you really sure you want to uninstall this service? That cannot be undone. [yes|No] "
  local SURE
  read SURE
  if [ "$SURE" = "yes" ]; then
    stop
    rm -f "$PIDFILE"
    echo "Notice: log file will not be removed: '$LOGFILE'" >&2
    # Deregister the service (on Debian-based systems use: update-rc.d -f node_exporter remove)
    chkconfig --del node_exporter
    rm -fv "$0"
  fi
}

case "$1" in
  start)
    start
    ;;
  stop)
    stop
    ;;
  uninstall)
    uninstall
    ;;
  restart)
    stop
    start
    ;;
  *)
    echo "Usage: $0 {start|stop|restart|uninstall}"
esac

Note 1: This sample script runs the script as user root. For production environments, it is highly recommended to configure another user (such as ‘prometheus’) which runs the script.

Note 2: Also check out this init.d script made specifically for node_exporter: node.exporter.default by eloo.

  • Make both files executable

chmod +x /etc/init.d/node_exporter

chmod +x /opt/node_exporter/node_exporter.sh

  • Test the script

/etc/init.d/node_exporter start

/etc/init.d/node_exporter stop

  • Enable start on boot with chkconfig

chkconfig --add node_exporter

All done! Now you can configure your Prometheus server to grab the metrics from the node_exporter instance.
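
Before pointing Prometheus at the machine, it can be worth checking that the service is really up. The following is a small sketch with hypothetical helper names (is_running, check_metrics). It assumes the PID file path from the init script above and node_exporter’s default port 9100, and it requires curl.

```shell
# is_running <pidfile>
# Succeeds if the process recorded in the PID file is alive.
is_running() {
  pidfile=$1
  [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null
}

# Print the first few metrics served by the local node_exporter.
check_metrics() {
  curl -s "http://localhost:9100/metrics" | head -n 5
}

# Usage: is_running /var/run/node_exporter.pid && check_metrics
```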

Easy VPS Backup

I love VPS providers such as RamNode or ServerCheap which provide excellent performance at a low price point. Unfortunately, when going with most VPS providers, there are no easy built-in facilities for backing up and restoring the data of your servers (such as with AWS EC2 snapshots). Thankfully, there is some powerful, easy to use and open source software available to take care of the backups for us!

In this article, I am going to show how to easily back up your VPS using restic. Another tool you might want to look at is Duplicity, which provides a higher level of security but which is also more difficult to use. (And there are many, many other alternatives available as well.)

You will need access to two servers to follow the steps below: one server which should be backed up (referred to as Backup Client in the following) and one server which will host your backups (referred to as Backup Server in the following).

Installing Restic (on Backup Client)

  • Get the URL to the binary for your system from the latest restic release.
  • Log into the Backup Client
  • Download the binary using wget

wget https://github.com/restic/restic/releases/download/v0.8.1/restic_0.8.1_linux_amd64.bz2

  • Unzip the binary

bzip2 -dk restic_0.8.1_linux_amd64.bz2

  • Move restic to /opt

sudo mv restic_0.8.1_linux_amd64 /opt/restic

  • Make restic executable

chmod +x /opt/restic

Establishing SSH Connection

  • On the Backup Client generate an SSH private and public key (confirm the location /root/.ssh/id_rsa and provide no passphrase)

sudo su - root
ssh-keygen -t rsa -b 4096

  • Get the public key

cat /root/.ssh/id_rsa.pub 
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDG3en ...

  • On the Backup Server, create a new user called backup
  • Copy the public key from the Backup Client to the Backup Server so that Backup Client is authorised to access it via SSH. Just copy the output from above and paste it at the end of the authorized_keys file

sudo vi /home/backup/.ssh/authorized_keys

  • On the Backup Client, test the connection to the Backup Server.

sudo ssh backup@...

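Perform Backup (on Backup Client)

  • Initialise the backup repository on the Backup Server (restic will ask you to choose a password for the repository)
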
/opt/restic -r sftp:backup@[backup-server]:/home/backup/[backup client host name] init

  • Back up the full hard disk (this may take a while!)

/opt/restic --exclude={/dev,/media,/mnt,/proc,/run,/sys,/tmp,/var/tmp} -r sftp:backup@[backup-server]:/home/backup/[backup client host name] backup /

 

Schedule Regular Backups (Backup Client)

  • On the Backup Client, create the file /root/restic_password. Paste your password into this file.
  • Create the script file /root/restic.sh (replace with the details of your servers)

#!/bin/bash

/opt/restic -r sftp:backup@[backup-server]:/home/backup/[backup client host name] --password-file=/root/restic_password --exclude={/dev,/media,/mnt,/proc,/run,/sys,/tmp,/var/tmp} backup /
/opt/restic -r sftp:backup@[backup-server]:/home/backup/[backup client host name] --password-file=/root/restic_password forget --keep-daily 7 --keep-weekly 5 --keep-monthly 12 --keep-yearly 75
/opt/restic -r sftp:backup@[backup-server]:/home/backup/[backup client host name] --password-file=/root/restic_password prune
/opt/restic -r sftp:backup@[backup-server]:/home/backup/[backup client host name] --password-file=/root/restic_password check

  • Make script executable

chmod +x /root/restic.sh

  • Trial run this script: /root/restic.sh
  • If everything worked fine, schedule this script to run daily (e.g. with sudo crontab -e) or at whichever schedule you prefer. Note that the script might take 10 minutes or more to execute, so it is probably not advisable to run it very frequently. If you need more frequent backups, just run the first line of the script (‘backup’), which is faster than the maintenance operations that follow.

0 22 * * * /root/restic.sh

 

That’s it! All important files from your server will now be backed up regularly.
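
A backup is only useful if it can be restored, so it is worth checking the repository from time to time. Below is a minimal sketch with hypothetical helper names (restic_repo, verify_backup). It assumes the repository layout and password file used above, and it restores only /etc into a scratch directory to keep the test quick.

```shell
# restic_repo <backup-server> <backup-client-host-name>
# Builds the sftp repository location used throughout this article.
restic_repo() {
  echo "sftp:backup@$1:/home/backup/$2"
}

# verify_backup <backup-server> <backup-client-host-name>
# Lists the stored snapshots, then restores /etc from the latest snapshot
# into /tmp/restic-restore.
verify_backup() {
  repo=$(restic_repo "$1" "$2")
  /opt/restic -r "$repo" --password-file=/root/restic_password snapshots &&
  /opt/restic -r "$repo" --password-file=/root/restic_password restore latest --target /tmp/restic-restore --include /etc
}
```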

Java Logging – The Ultimate, Easy Guide

At first glance, logging looks like an exceedingly simple problem to solve. However, it is one of those problems which unfortunately becomes more and more complex the longer one looks at it.

I think because of this, there are many frameworks in Java to support logging (since everyone seems to have thought they have found a solution) with many of them being less than optimal, especially under load.

In effect, for someone who wants to start with logging in Java, there is an overwhelming, confusing and often contradictory wealth of resources available. In this guide, I will provide an introduction to Java logging in three simple steps: first, choosing the right framework; second, getting your first log message printed to the screen; and third, exploring more advanced logging topics. So, without further ado, here are the steps to get you started with Java logging:

Framework

The first question to sort out when considering logging for Java is to decide which logging framework to use. Unfortunately, there are quite a few to choose from.

The standard Java logging (java.util.logging) seems to be very unpopular. Further, it seems that Log4j and Logback both have architectural disadvantages compared to Log4j 2, specifically with respect to the performance impact which logging has on the host app. Loggly ran some tests on the different logging frameworks and the theoretical advantages of Log4j 2 also seem to be reflected in cold, hard data.

Thus, I think the prudent choice is to go with Log4j 2 in any but exceptional circumstances.

How To Get Started

The official documentation for Log4j 2 is not very approachable. Simply speaking, you only need to do two things to get ready for logging with Log4j 2.

The first is to add the following Maven dependency:

<dependency>
  <groupId>org.apache.logging.log4j</groupId>
  <artifactId>log4j-api</artifactId>
  <version>2.10.0</version>
</dependency>
<dependency>
  <groupId>org.apache.logging.log4j</groupId>
  <artifactId>log4j-core</artifactId>
  <version>2.10.0</version>
</dependency>

The second is to create the file src/main/resources/log4j2.properties in your project with the following content:

status = error
name = PropertiesConfig

filters = threshold

filter.threshold.type = ThresholdFilter
filter.threshold.level = debug

appenders = console

appender.console.type = Console
appender.console.name = STDOUT
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

rootLogger.level = debug
rootLogger.appenderRefs = stdout
rootLogger.appenderRef.stdout.ref = STDOUT

(Note, you may also provide the configuration in XML format. In that case, simply create a file named log4j2.xml in src/main/resources.)

Now you are ready to start logging!

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class OutputLog { 
  public static void main(String[] args) { 
    Logger logger = LogManager.getLogger(); 
    logger.error("Hi!"); 
  } 
}

Master Class

The real power of using a logging framework is realised by modifying the properties file created earlier.

You can, for instance, configure it to log into a file and rotate this log file automatically (so it doesn’t just keep on growing and growing). The following presents a properties file to enable this:


status = error
name = PropertiesConfig

property.filename = ./logs/log.txt

filters = threshold

filter.threshold.type = ThresholdFilter
filter.threshold.level = debug

appenders = rolling

appender.rolling.type = RollingFile
appender.rolling.name = RollingFile
appender.rolling.fileName = ${filename}
appender.rolling.filePattern = ./logs/log-backup-%d{MM-dd-yy-HH-mm-ss}-%i.log.gz
appender.rolling.layout.type = PatternLayout
appender.rolling.layout.pattern = %d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
appender.rolling.policies.type = Policies
appender.rolling.policies.time.type = TimeBasedTriggeringPolicy
appender.rolling.policies.time.interval = 1
appender.rolling.policies.time.modulate = true
appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
appender.rolling.policies.size.size=10MB
appender.rolling.strategy.type = DefaultRolloverStrategy
appender.rolling.strategy.max = 20

loggers = rolling

logger.rolling.name = file
logger.rolling.level = debug
logger.rolling.additivity = false
logger.rolling.appenderRef.rolling.ref = RollingFile

#rootLogger.level = debug
#rootLogger.appenderRefs = stdout
rootLogger.appenderRef.stdout.ref = RollingFile

This configuration will result in a log file being written into the logs/ folder. If the application is run multiple times, previous log files will be packed into gzipped files:

[Image: output]
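
Once logs are rotated, a search has to cover both the active file and the compressed backups. The following sketch uses a hypothetical helper, search_logs. It assumes the file names from the configuration above (log.txt plus *.log.gz in the same folder) and requires gzip.

```shell
# search_logs <log-dir> <pattern>
# Prints matching lines from all rotated, gzipped logs and then from the active log.
search_logs() {
  dir=$1
  pattern=$2
  for f in "$dir"/*.log.gz; do
    if [ -e "$f" ]; then
      gzip -cd "$f" | grep -- "$pattern" || true
    fi
  done
  if [ -f "$dir/log.txt" ]; then
    grep -- "$pattern" "$dir/log.txt" || true
  fi
}

# Usage: search_logs ./logs ERROR
```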

For even more sophisticated logging, you would want to set up a Graylog server and then send the logs there. This can be achieved using the logstash-gelf library. Add the following Maven dependency:

<dependency>
<groupId>biz.paluch.logging</groupId>
<artifactId>logstash-gelf</artifactId>
<version>1.11.1</version>
</dependency>

And then provide a log4j2.xml configuration file like the following (replace yourserver.com with your Graylog server):


<Configuration>
  <Appenders>
    <Gelf name="gelf" host="udp:yourserver.com" port="51401" version="1.1" extractStackTrace="true"
          filterStackTrace="true" mdcProfiling="true" includeFullMdc="true" maximumMessageSize="8192"
          ignoreExceptions="true">
      <Field name="timestamp" pattern="%d{dd MMM yyyy HH:mm:ss,SSS}" />
      <Field name="level" pattern="%level" />
      <Field name="simpleClassName" pattern="%C{1}" />
      <Field name="className" pattern="%C" />
      <Field name="server" pattern="%host" />
      <Field name="server.fqdn" pattern="%host{fqdn}" />

      <DynamicMdcFields regex="mdc.*" />
      <DynamicMdcFields regex="(mdc|MDC)fields" />
    </Gelf>
  </Appenders>
  <Loggers>
    <Root level="INFO">
      <AppenderRef ref="gelf" />
    </Root>
  </Loggers>
</Configuration>

Then create a new GELF UDP input in Graylog (and don’t forget to open the firewall for UDP port 51401) and you are ready to receive messages!

[Image: message]

Finally, I personally find the logging frameworks, with all their dependencies and their insistence on configuration files being exactly where they expect them, a bit intrusive. Thus, I developed delight-simple-log. This very simple project can be used as a dependency in your reusable components and then linked with Log4j 2 in the main package of an app. That way, the Log4j dependencies will only be present in one of your modules.