Java: Benefit from Inline Class Properties Starting from Java 8

by Per Minborg

on August 5, 2019

In a few years, Java will hopefully have an “inline class” feature that solves many of the challenges with the current state of Java. Read this article and learn how to use Java 8 and upwards today, and still benefit from some of the advantages of the coming inline object arrays, such as no indirect pointers, eliminated object header overhead, and improved data locality.

In this article, we will learn how we can write a short class named InlineArray that supports many of the future inline class features. We will also take a look at Speedment HyperStream, an existing Java tool that uses a similar technique.

Background

Since 1995, when this design made perfect sense, an array of Objects in Java has consisted of an array that holds a number of references to other objects, which in turn are spread out across the heap.

Here is how an array with two initial Point objects is laid out on the heap in Java today:

Array
+======+
|Header|
+------+          Point 0
|ref 0 |--------> +======+
+------+          |Header|         Point 1
|ref 1 |---+      +------+   +---> +======+
+------+   |      |x     |   |     |Header|
|null  |   |      +------+   |     +------+
+------+   |      |y     |   |     |x     |
|null  |   |      +------+   |     +------+
+------+   +-----------------+     |y     |
|...   |                           +------+
+------+

However, over time, the execution pipeline of a typical CPU has evolved tremendously, with an incredible increase in computation performance. On the other hand, the speed of light has remained constant, and so the latency of loading data from main memory has unfortunately remained within the same order of magnitude. The balance between computing and retrieving data has skewed in favor of computing.

Accessing main memory has become something we want to avoid, much like we wanted to avoid loading data from spinning disks back in the day.

Evidently, the current Object array layout implies several drawbacks, such as:

  • Double memory access (due to the indirect reference pointers in the array)
  • Reduced data locality (because the array's objects are laid out in different places on the heap)
  • Increased memory footprint (because every object referenced from the array carries an additional object header with class and synchronization information; a rough estimate follows below)
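
As a rough back-of-the-envelope illustration of the last point (assuming a 64-bit JVM with compressed oops, 12-byte object headers, and 8-byte object alignment; the exact numbers vary between JVMs), the two layouts compare roughly like this for a Point holding two int fields:

Object array layout, per element:
    compressed reference in the array     4 bytes
    Point object header                  12 bytes
    x + y                                 8 bytes
    alignment padding                     4 bytes
    total                              ~ 28 bytes

Flattened (inline) layout, per element:
    x + y                                 8 bytes
    total                                 8 bytes

That is roughly a 3.5x difference in footprint, before even counting the cost of the extra pointer chase.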


Inline Classes

Within the Java community, there is now a major effort going on to introduce “inline classes” (previously known as “value classes”). The current state of this effort (as of July 2019) was presented by Brian Goetz in a video titled “Project Valhalla Update (2019 edition)”. No one knows when this feature will be available in an official Java release. My personal guess is that it will arrive sometime after 2021.

Here is how an array of inline Point objects would be laid out, once this feature becomes available:


Array
+======+
|Header|
+------+
|x     |
+------+
|y     |
+------+
|x     |
+------+
|y     |
+------+
|...   |
+------+

As can be seen, this scheme consumes less memory (no Point headers), improves locality (data is laid out sequentially in memory), and allows data to be accessed directly without following indirect reference pointers. On the flip side, we lose the concept of object identity, which will be discussed later in this article.
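
For illustration only, a Point declared as an inline class might look something like the sketch below. The syntax is based on the 2019 Valhalla prototypes discussed in the video mentioned above and is not final, so treat it as an assumption rather than real Java:

// Hypothetical future syntax (Project Valhalla prototype, not final)
public inline class Point {

    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int x() { return x; }
    public int y() { return y; }

}

An array of such points could then be flattened by the JVM as shown in the diagram above, with no per-element headers and no references to follow.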

Emulating Some Inline Class Properties

In the following, we will implement an emulation of some of the properties of inline classes. It should be noted that all examples below can be run on standard Java 8 and upwards today.

Assume that we have an interface Point with x() and y() getters as described here:
public interface Point { int x(); int y(); }
We could then trivially create an immutable implementation of the Point interface as shown hereunder:

public final class VanillaPoint implements Point {

    private final int x, y;

    public VanillaPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override public int x() { return x; }

    @Override public int y() { return y; }

    // toString(), equals() and hashCode() not shown for brevity

}
Further, assume that we are willing to give up the Object identity properties of Point objects in arrays. This means, among other things, that we cannot synchronize on them or perform identity operations such as == and System::identityHashCode.

The idea here is to create a memory region that we can work with directly at byte level and flatten out our objects there. This memory region could be encapsulated in a generic class called InlineArray<T> like this:

import java.nio.ByteBuffer;
import java.util.BitSet;
import java.util.function.BiConsumer;
import java.util.function.Function;

import static java.util.Objects.requireNonNull;

public final class InlineArray<T> {

    // Off-heap region holding all flattened elements back to back
    private final ByteBuffer memoryRegion;
    private final int elementSize;
    private final int length;
    private final BiConsumer<ByteBuffer, T> deconstructor;
    private final Function<ByteBuffer, T> constructor;
    // Tracks which indexes currently hold a value (emulates null elements)
    private final BitSet presentFlags;

    public InlineArray(
        int elementSize,
        int length,
        BiConsumer<ByteBuffer, T> deconstructor,
        Function<ByteBuffer, T> constructor
    ) {
        this.elementSize = elementSize;
        this.length = length;
        this.deconstructor = requireNonNull(deconstructor);
        this.constructor = requireNonNull(constructor);
        this.memoryRegion = ByteBuffer.allocateDirect(elementSize * length);
        this.presentFlags = new BitSet(length);
    }

    public void put(int index, T value) {
        assertIndexBounds(index);
        if (value == null) {
            presentFlags.clear(index);
        } else {
            position(index);
            deconstructor.accept(memoryRegion, value);
            presentFlags.set(index);
        }
    }

    public T get(int index) {
        assertIndexBounds(index);
        if (!presentFlags.get(index)) {
            return null;
        }
        position(index);
        return constructor.apply(memoryRegion);
    }

    public int length() {
        return length;
    }

    private void assertIndexBounds(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("Index [0, " + length + "), was: " + index);
        }
    }

    private void position(int index) {
        memoryRegion.position(index * elementSize);
    }

}

Note that this class can handle any type of element (of type T) that can be deconstructed (serialized) to bytes, provided that it has a known maximum element size. The class is most efficient if all elements have the same size, as Point does (always Integer.BYTES * 2 = 8 bytes). Further note that the class is not thread-safe, but thread safety can be added at the expense of introducing a memory barrier and, depending on the solution, using separate views of the ByteBuffer.
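
As a sketch of the “separate views” idea for concurrent readers (this method is not part of the class above, writers would still need explicit synchronization, and BitSet is not thread-safe either), a read path could duplicate the ByteBuffer so that each call gets its own position:

// Sketch only: a reader-side variant that avoids sharing the buffer position
public T getUsingOwnView(int index) {
    assertIndexBounds(index);
    if (!presentFlags.get(index)) {              // NOTE: BitSet itself is not thread-safe
        return null;
    }
    ByteBuffer view = memoryRegion.duplicate();  // shares content but has its own position/limit
    view.position(index * elementSize);
    return constructor.apply(view);
}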

Now, suppose we want to allocate an array of 10 000 Points. Armed with the new InlineArray class we can proceed like this:

public class Main {

    public static void main(String[] args) {

        InlineArray<Point> pointArray = new InlineArray<>(
            Integer.BYTES * 2, // The max element size
            10_000,
            (bb, p) -> { bb.putInt(p.x()); bb.putInt(p.y()); },
            bb -> new VanillaPoint(bb.getInt(), bb.getInt())
        );

        Point p0 = new VanillaPoint(0, 0);
        Point p1 = new VanillaPoint(1, 1);

        pointArray.put(0, p0); // Store p0 at index 0
        pointArray.put(1, p1); // Store p1 at index 1

        System.out.println(pointArray.get(0)); // Should produce (0, 0)
        System.out.println(pointArray.get(1)); // Should produce (1, 1)
        System.out.println(pointArray.get(2)); // Should produce null

    }

}
As expected, the code will produce the following output when run:

VanillaPoint{x=0, y=0}
VanillaPoint{x=1, y=1}
null

Note how we provide an element deconstructor and element constructor to the InlineArray telling it how it should deconstruct and construct the Point objects to and from linear memory.
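
As a quick illustration of the identity trade-off mentioned earlier: every call to get() reconstructs a new wrapper object, so reference equality no longer identifies “the same” element. Continuing inside the main method above (and assuming VanillaPoint implements equals() on x and y, as indicated by the comment in the class):

Point a = pointArray.get(0);
Point b = pointArray.get(0);

System.out.println(a == b);      // false - two distinct objects are created
System.out.println(a.equals(b)); // true  - they carry the same x and y values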

Emulation Properties

The emulation above will probably not give the same performance gains as real inline classes, but the savings in terms of memory allocation and locality will be about the same. Because memory is allocated off-heap, garbage collection times will not be affected by element data put in the InlineArray. The elements in the ByteBuffer are laid out just like the proposed inline class array:

Array
+======+
|Header|
+------+
|x     |
+------+
|y     |
+------+
|x     |
+------+
|y     |
+------+
|...   |
+------+

Because we use ByteBuffer objects that are indexed with an int, the backing memory region becomes limited to 2^31 bytes. This means, for example, we can only put 2^(31-3) = 2^28 ≈ 268 million Point elements in the array (because each point occupies 2^3 = 8 bytes) before we run out of address space. Real implementations can overcome this limitation by using several ByteBuffers, Unsafe or libraries like Chronicle Bytes.
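
A minimal sketch of the multiple-ByteBuffer idea is shown below. The class name and layout are made up for illustration and are not taken from any of the libraries mentioned; each long element index is simply split into a segment index and an offset within that segment:

import java.nio.ByteBuffer;

final class SegmentedMemory {

    private static final int SEGMENT_BYTES = 1 << 30; // 1 GiB per segment

    private final ByteBuffer[] segments;
    private final int elementSize;
    private final int elementsPerSegment;

    SegmentedMemory(long totalElements, int elementSize) {
        this.elementSize = elementSize;
        this.elementsPerSegment = SEGMENT_BYTES / elementSize;
        int segmentCount = (int) ((totalElements + elementsPerSegment - 1) / elementsPerSegment);
        this.segments = new ByteBuffer[segmentCount];
        for (int i = 0; i < segmentCount; i++) {
            // For simplicity, every segment is allocated at full size up front
            segments[i] = ByteBuffer.allocateDirect(SEGMENT_BYTES);
        }
    }

    ByteBuffer segmentFor(long index) {
        return segments[(int) (index / elementsPerSegment)];
    }

    int positionFor(long index) {
        return (int) (index % elementsPerSegment) * elementSize;
    }

}

An InlineArray-style put or get would then first look up the right segment and then position within it, using a long index instead of an int.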

Lazy Entities

Given the InlineArray class, it is fairly easy to provide elements from the InlineArray that are lazy, in the sense that they do not have to deserialize all fields eagerly when an element is retrieved from the array. This is how it can be done:

First, we create another implementation of the Point interface that takes its data from a backing ByteBuffer itself rather than from local fields:

import java.nio.ByteBuffer;

public final class LazyPoint implements Point {

    private final ByteBuffer byteBuffer;
    private final int position;

    public LazyPoint(ByteBuffer byteBuffer) {
        this.byteBuffer = byteBuffer;
        this.position = byteBuffer.position();
    }

    @Override
    public int x() {
        return byteBuffer.getInt(position);
    }

    @Override
    public int y() {
        return byteBuffer.getInt(position + Integer.BYTES);
    }

    // toString(), equals() and hashCode() not shown for brevity

}

Then, we just replace the deserializer passed to the constructor of the InlineArray like this:

InlineArray<Point> pointArray = new InlineArray<>(
    Integer.BYTES * 2,
    10_000,
    (bb, p) -> { bb.putInt(p.x()); bb.putInt(p.y()); },
    LazyPoint::new // Use this deserializer instead
);

If used in the same main method as above, this will produce the following output:

LazyPoint{x=0, y=0}
LazyPoint{x=1, y=1}
null

Cool. This is particularly useful for entities with tens or even hundreds of fields, where just a limited subset of the fields is ever accessed for the problem at hand.

A drawback with this approach is that if just a single LazyPoint reference is retained in our application, it prevents the entire backing ByteBuffer from being garbage collected. So, any lazy entities like these are best used as short-lived objects.
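
If a lazy entity does need to outlive the backing buffer, one simple option (not part of the code above) is to copy it into an eager implementation before retaining it:

Point lazy = pointArray.get(0);                        // LazyPoint view into the buffer
Point detached = new VanillaPoint(lazy.x(), lazy.y()); // plain object, retains no buffer reference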

Using Large Collections of Data

What if we want to use very large collections of data (e.g. in the terabytes), perhaps from a database or from files, and store them efficiently in-JVM-memory and then be able to work with these collections to improve computational performance? Can we use this type of technology?

Speedment HyperStream is a product that leverages a similar technology to provide database data as standard Java Streams, and it has been available for some time now. HyperStream lays out data in a way similar to the above and can hold terabytes of data in a single JVM with little or no garbage collection impact because data is stored off-heap. It can use in-place deserialization to obtain single fields directly from the backing memory region, thereby avoiding unnecessary full deserialization of entities. Its standard Java Streams have deterministic, ultra-low latency and can, in some cases, be constructed and consumed in under 100 ns.

Here is an example of how HyperStream (which implements a standard Java Stream) can be used in an application when paging between films. The films variable (a Speedment Manager<Film>) is provided by Speedment automatically:


private Stream<Film> getPage(int page, Comparator<Film> comparator) {
    return films.stream()
        .sorted(comparator)
        .skip(page * PAGE_SIZE)
        .limit(PAGE_SIZE);
}
Even though there might be trillions of films, the method will typically complete in less than a microsecond, as the Stream is connected directly to RAM and uses in-memory indexes.
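
For example, the third page sorted by title could be requested as shown below. Film::getTitle is assumed here for illustration; the actual getter names depend on the model that Speedment generates from your database schema:

// Hypothetical usage; getter names depend on the generated Speedment model
Stream<Film> page = getPage(3, Comparator.comparing(Film::getTitle));
page.forEach(System.out::println);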

Read more about Speedment HyperStream performance here.

Evaluate the performance in your own database applications by downloading Speedment HyperStream here.

Resources

Project Valhalla https://openjdk.java.net/projects/valhalla/
Speedment HyperStream https://www.speedment.com/hyperstream/
Speedment Initializer https://www.speedment.com/initializer/

About

Per Minborg

Per Minborg is a Palo Alto based developer and architect, currently serving as CTO at Speedment, Inc. He is a regular speaker at various conferences, e.g. JavaOne, DevNexus, Jdays, JUGs and Meetups. Per has 15+ US patent applications and invention disclosures. He is a JavaOne alumnus and co-author of the publication “Modern Java”.