Bridging C++ to Scala with BridJ

At Curalate we’ve moved towards a microservice architecture, with each service living in its own git repository. For the most part, we’ve standardized the way we build our Scala projects, using Apache Maven to manage dependencies and compilation. This is convenient since any Curalady / Curalad can clone one of our repos and type mvn install at the root with the expectation that everything will compile successfully on the first try. We wanted this same ease of use for our Scala projects that need access to native libraries, and this post explains how we achieved it.

Seamlessly Interfacing Scala with C++

The JVM is an impressive piece of technology and enables awesome high-level languages like Scala. However, there are times when we need to use native languages like C++, especially when applying computer vision and machine learning. Like any good startup, we racked up technical debt to move quickly. Initially, our native projects were compiled manually and interfaced with Java via JNA. Making changes required an error-prone multi-step process, including manually placing a dynamic library in a JAR for deployment. As native development became more important and the size of our team increased, this manual process became cumbersome.

It was clear to us that we needed to overhaul our native development infrastructure. When we approached the task of redesigning our native build system, we had several goals in mind:

Standardizing native builds and providing push button operation (i.e. mvn install is all we need)
Adding native functionality to a Scala project should be as simple as putting native source files in the right directories
Minimizing boilerplate and saving developer time
Including shared libraries in the final JAR should be automatic

Choosing the Interface

There are several options for using Java and C++ together, which in turn lets us interface with Scala. The classic option is the Java Native Interface (JNI), which is part of the Java platform. If you’ve ever used the JNI you may recall that there is quite a bit of boilerplate. In addition, almost all communication between the native code and Java must be done through special native JVM calls, which requires a significant amount of glue code to do seemingly simple things.

A higher-level alternative to the JNI is Java Native Access (JNA), which, when paired with JNAerator, can minimize the boilerplate we need to write. JNAerator takes in a C/C++ header file and generates a Java source file with wrappers for each native function. This makes JNA appealing since we only need the header file, which we had to write anyway! The price for these high-level features is that JNA is significantly slower than the JNI. Often the sole reason for crossing the native boundary is speed, so this is problematic.

Fortunately, JNAerator recently added support for yet another way to interface native code with Java: BridJ. BridJ is a relatively young project, but it claims to have speeds comparable to the JNI and it allows direct interfacing with C++. In contrast, the JNI and JNA are designed to interface with C, which requires redundant extern "C" declarations to use C++. BridJ also allows building shared libraries for multiple target operating systems and architectures. As long as the libraries are placed in a specific directory, they will be included in the final JAR. At runtime, BridJ extracts the appropriate library from the JAR and instructs the class loader to load it.
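To make the point about extern declarations concrete, here is a minimal sketch of the kind of shim a C-oriented binding typically needs in front of C++ code. The names greeter::makeGreeting and greeting_length are hypothetical and chosen only for illustration; they are not part of our code base or of any library mentioned above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical C++ implementation. Because it has C++ linkage, the compiler
// emits a name-mangled symbol that a C-oriented loader cannot easily look up.
namespace greeter {
std::string makeGreeting(const char* str, int len) {
  return "Hello, " + std::string(str, static_cast<std::size_t>(len)) + "!";
}
}  // namespace greeter

// The "redundant" part: a C-linkage shim that only forwards to the C++ code.
// extern "C" disables name mangling, so the exported symbol is literally
// greeting_length. Only C-compatible types may appear in its signature.
extern "C" int greeting_length(const char* str, int len) {
  return static_cast<int>(greeter::makeGreeting(str, len).size());
}
```

Every function exposed this way needs its own forwarding shim, which is the duplication that a binding able to work with C++ symbols directly would let us skip.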

Automating the Build

To integrate all of this into our existing development infrastructure we wrote a specialized Makefile along with a suite of scripts. We wrote hooks for specific Maven lifecycle phases to make everything seamless. Simply placing C++ header and source files in the right sub-directories is enough to get a working hybrid Scala / C++ project. Our build system takes care of calling JNAerator to generate Java wrappers, building shared libraries, and putting everything in the correct place in the final JAR for deployment.

Using the Interface

Now we’ll work through the obligatory hello world example to show what BridJ looks like in practice.

First, we’ll write our C++ header to define the interface with Java. It’s best to stick to primitive types here like char*, int, etc. since JNAerator’s support for parsing headers is limited. To transfer arbitrary data or objects, we found it easier to serialize everything to a byte array on the Java side (Java’s ByteBuffer is handy here) and unpack it on the native side. This method of passing serialized data is better captured with two headers instead of just one. The first header is restricted to primitive types and defines the Java interface that performs the serialization and calls the appropriate native function. The second header accepts the unpacked data as more complex native types, like objects, and declares the actual native implementation. This separation of concerns makes things a little cleaner to implement.
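As a rough illustration of the unpacking step, here is a sketch of what the native side of such a serialized call might look like. Everything in it is an assumption made for the example: the LabeledPoint type, the unpackLabeledPoint function, and the wire layout (three 32-bit integers followed by the label bytes) are hypothetical. The one grounded detail is the byte order, since a java.nio.ByteBuffer writes multi-byte values big-endian unless told otherwise.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical "complex" native type that the second header would work with.
struct LabeledPoint {
  std::int32_t x;
  std::int32_t y;
  std::string label;
};

// Reads a 32-bit big-endian integer, the default byte order of a Java ByteBuffer.
// The final cast assumes two's complement, which holds on all mainstream targets.
static std::int32_t readInt32BE(const unsigned char* p) {
  const std::uint32_t v = (static_cast<std::uint32_t>(p[0]) << 24) |
                          (static_cast<std::uint32_t>(p[1]) << 16) |
                          (static_cast<std::uint32_t>(p[2]) << 8) |
                          static_cast<std::uint32_t>(p[3]);
  return static_cast<std::int32_t>(v);
}

// Assumed layout: x (int32), y (int32), labelLen (int32), then labelLen bytes.
// The signature uses only primitive types, as the first header would.
LabeledPoint unpackLabeledPoint(const char* data, int len) {
  const std::size_t headerSize = 12;
  if (data == nullptr || len < 0 || static_cast<std::size_t>(len) < headerSize) {
    throw std::invalid_argument("buffer too short for header");
  }
  const unsigned char* p = reinterpret_cast<const unsigned char*>(data);
  const std::int32_t x = readInt32BE(p);
  const std::int32_t y = readInt32BE(p + 4);
  const std::int32_t labelLen = readInt32BE(p + 8);
  // Reject lengths that would read past the end of the buffer.
  if (labelLen < 0 ||
      static_cast<std::size_t>(labelLen) > static_cast<std::size_t>(len) - headerSize) {
    throw std::invalid_argument("label length exceeds buffer");
  }
  return LabeledPoint{x, y, std::string(data + headerSize, static_cast<std::size_t>(labelLen))};
}
```

Validating the lengths before reading matters here because the buffer crosses the JVM boundary as a raw pointer, so a mismatch between the two sides would otherwise turn into an out-of-bounds read rather than an exception.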

Here’s our C++ header that specifies the Java interface:

#ifndef HELLO_WORLD_HPP
#define HELLO_WORLD_HPP

/**
 * Prints the given string to stdout.
 * @param str  the characters making up the string
 * @param len  the length of the string
 */
 void helloWorld(const int len, const char* str);

#endif

and here’s the C++ implementation file:

#include "hello-world.hpp"
#include <iostream>
#include <string>

void helloWorld(const int len, const char* str) {
  std::cout << std::string(str, len) << std::endl;
}

Here’s the file automatically generated from our header by JNAerator:

package com.curalate.helloworld;
import org.bridj.BridJ;
import org.bridj.CRuntime;
import org.bridj.Pointer;
import org.bridj.ann.Library;
import org.bridj.ann.Ptr;
import org.bridj.ann.Runtime;
/**
 * Wrapper for library <b>hello-world</b><br>
 * This file was autogenerated by <a href="http://jnaerator.googlecode.com/">JNAerator</a>,<br>
 * a tool written by <a href="http://ochafik.com/">Olivier Chafik</a> that <a href="http://code.google.com/p/jnaerator/wiki/CreditsAndLicense">uses a few opensource projects.</a>.<br>
 * For help, please visit <a href="http://nativelibs4java.googlecode.com/">NativeLibs4Java</a> or <a href="http://bridj.googlecode.com/">BridJ</a> .
 */
@Library("hello-world")
@Runtime(CRuntime.class)
public class HelloWorldNativeLibrary {
	static {
		BridJ.register();
	}
	/**
	 * Prints the given string to stdout.<br>
	 * @param str  the characters making up the string<br>
	 * @param len  the length of the string<br>
	 * Original signature : <code>void helloWorld(const int, const char*)</code><br>
	 * <i>native declaration : hello-world-native/src/main/jnaerator/include/hello-world.hpp:9</i>
	 */
	public static void helloWorld(int len, Pointer<Byte > str) {
		helloWorld(len, Pointer.getPeer(str));
	}
	protected native static void helloWorld(int len, @Ptr long str);
}

We’re ready to call this from Scala now! Let’s fire up the REPL and try it out:

scala> import com.curalate.helloworld.HelloWorldNativeLibrary
import com.curalate.helloworld.HelloWorldNativeLibrary

scala> import org.bridj.Pointer.allocateBytes
import org.bridj.Pointer.allocateBytes

scala> import java.nio.charset.Charset
import java.nio.charset.Charset

scala> val message = "Hello World!" // The message we'd like to print.
message: String = "Hello World!"

scala> // It's important that we hold onto a reference to the allocated Pointer until

scala> // the native side returns or it may be freed by the JVM too early leading to a SEGFAULT.

scala> val nativeBytes = allocateBytes(message.size)
nativeBytes: org.bridj.Pointer[Byte] = Pointer(peer = 0x7fab49c7e9b0, targetType = java.lang.Byte, order = LITTLE_ENDIAN)

scala> // Now let's copy the bytes into the natively allocated memory.

scala> nativeBytes.setBytes(message.getBytes(Charset.forName("US-ASCII")))
res3: org.bridj.Pointer[Byte] = Pointer(peer = 0x7fab49c7e9b0, targetType = java.lang.Byte, order = LITTLE_ENDIAN)

scala> HelloWorldNativeLibrary.helloWorld(message.size, nativeBytes)
Hello World!

Let’s take a look at what the final JAR looks like when compilation is complete:

$ jar -tfv target/hello-world-native-0.1.0-SNAPSHOT.jar
     0 Wed Apr 06 14:49:22 EDT 2016 META-INF/
   132 Wed Apr 06 14:49:20 EDT 2016 META-INF/MANIFEST.MF
     0 Wed Apr 06 14:49:20 EDT 2016 com/
     0 Wed Apr 06 14:49:20 EDT 2016 com/curalate/
     0 Wed Apr 06 14:49:20 EDT 2016 com/curalate/helloworld
     0 Wed Apr 06 14:49:18 EDT 2016 lib/
     0 Wed Apr 06 14:49:18 EDT 2016 lib/darwin_universal/
  1130 Wed Apr 06 14:49:20 EDT 2016 com/curalate/helloworld/HelloWorldNativeLibrary.class
  9568 Wed Apr 06 14:49:18 EDT 2016 lib/darwin_universal/libhello-world-native.dylib
     0 Wed Apr 06 14:49:22 EDT 2016 META-INF/maven/
     0 Wed Apr 06 14:49:22 EDT 2016 META-INF/maven/com.curalate/
     0 Wed Apr 06 14:49:22 EDT 2016 META-INF/maven/com.curalate/hello-world-native/
  3505 Wed Apr 06 14:43:48 EDT 2016 META-INF/maven/com.curalate/hello-world-native/pom.xml
   131 Wed Apr 06 14:49:20 EDT 2016 META-INF/maven/com.curalate/hello-world-native/pom.properties

For this example, we compiled for Mac OS X, and the final library is stored in the JAR as lib/darwin_universal/libhello-world-native.dylib. If we also built the Linux library binary, we could add it to this JAR as lib/linux_x64/libhello-world-native.so. At runtime, BridJ would extract the appropriate library for the class loader, allowing the same JAR to be used on both Linux and Mac OS X.

Great, now when someone would like to use our code, they can simply clone a git repo and type mvn install to get things compiled! At Curalate, we often need to interface with native code when working with machine learning or computer vision. In this post, we’ve given a brief tour of our custom build system that gives us a consistent, fast, and easy-to-use framework for interfacing Scala with C++. Have you had to tackle a problem like this before? If you have suggestions or another approach, let us know. We’re always listening!