In this article, we will learn how we can write a short class named
InlineArray
that supports many of the future inline class features. We will also take a look at Speedment HyperStream, an existing Java tool that is using similar means of operation.Background
Since 1995, when it made perfect sense, an array of Objects in Java consists of an array which in turn holds a number of references to other objects which are ultimately spread out on the heap.Here is how an array with two initial
Point
objects is laid out on the heap in Java today:
Array
+======+
|Header|
+------+ Point 0
|ref 0 |---> +======+
+------+ |Header| Point 1
|ref 1 |---- +------+ ---> +======+
+------+ |x | |Header|
|null | +------+ +------+
+------+ |y | |x |
|null | +------+ +------+
+------+ |y |
|... | +------+
+------+
However, over time, the execution pipeline of a typical CPUs has evolved tremendously with an incredible computation performance increase. On the other hand, the speed of light has remained constant and so, the latency of loading data from main memory has unfortunately remained within the same order of magnitude. The balance between computing and retrieving has skewed over in favor of computing.
Accessing main memory these days becomes a thing we want to avoid, much like we wanted to avoid loading data from spinning disks back in the days.
Evidently, the current
Object
array layout implies several drawbacks such as:- Double memory access (due to the indirect reference pointers in the array)
- Reduced locality of data (because array objects are laid out on different places on the heap)
- Increased memory footprint (because all the objects referenced in the array are Objects and therefore holds additional
Class
and synchronization information).
Inline Classes
Within the Java community, there is now a major effort going on to introduce “inline classes” (previously known as “value classes”). The current state of this effort (per July 2019) was presented by Brian Goetz in this video titled “Project Valhalla Update (2019 edition)”. No one knows when this feature will be available in an official Java release. My personal guess is sometime after 2021.Here is how an array of inline
Point
objects would be laid out, once this feature becomes available:
Array
+======+
|Header|
+------+
|x |
+------+
|y |
+------+
|x |
+------+
|y |
+------+
|... |
+------+
As can be seen, this scheme consumes less memory (no
Point
headers), improves locality (data is sequentially laid out in memory) and data can be accessed directly without following indirect reference pointers. On the flip side, we lose the concept of object identity which will be discussed later in this article.Emulating Some Inline Class Properties
In the following, we will implement an emulation of some of the properties of inline classes. Is should be noted that all examples below can be run on standard Java 8 and upwards already now.Assume that we have an
interface Point
with X and Y getters as described here:public interface Point { int x(); int y(); }We could then trivially create an immutable implementation of the
Point
interface as shown hereunder:public final class VanillaPoint implements Point {Further, assume that we are willing to give up the Object/identity properties of
private final int x, y;
public VanillaPoint(int x, int y) {
this.x = x;
this.y = y;
}
@Override public int x() { return x; }
@Override public int y() { return y; }
// toString(), equals() and hashCode() not shown for brevity
}
Point
objects in arrays. This means, among other things, that we cannot synchronize or perform identity operations (such as ==
and System::identityHashCode
)The idea here is to create a memory region that we can work with directly at byte level and flatten out our objects there. This memory region could be encapsulated in a generic class called
InlineArray<T>
like this:public final class InlineArray<T> {
private final ByteBuffer memoryRegion;
private final int elementSize;
private final int length;
private final BiConsumer<ByteBuffer, T> deconstructor;
private final Function<ByteBuffer,T> constructor;
private final BitSet presentFlags;
public InlineArray(
int elementSize,
int length,
BiConsumer<ByteBuffer, T> deconstructor,
Function<ByteBuffer,T> constructor
) {
this.elementSize = elementSize;
this.length = length;
this.deconstructor = requireNonNull(deconstructor);
this.constructor = requireNonNull(constructor);
this.memoryRegion = ByteBuffer.allocateDirect(elementSize * length);
this.presentFlags = new BitSet(length);
}
public void put(int index, T value) {
assertIndexBounds(index);
if (value == null) {
presentFlags.clear(index);
} else {
position(index);
deconstructor.accept(memoryRegion, value);
presentFlags.set(index);
}
}
public T get(int index) {
assertIndexBounds(index);
if (!presentFlags.get(index)) {
return null;
}
position(index);
return constructor.apply(memoryRegion);
}
public int length() {
return length;
}
private void assertIndexBounds(int index) {
if (index < 0 || index >= length) {
throw new IndexOutOfBoundsException("Index [0, " + length + "), was:" + index);
}
}
private void position(int index) {
memoryRegion.position(index * elementSize);
}
}
Note that this class can handle any type of element (of type
T
) than can be deconstructed (serialized) to bytes provided that it has a maximum element size. The class is most efficient if all the elements have the same element size as Point
does (i.e. always Integer.BYTES * 2 = 8
bytes). Further note that the class is not thread-safe, but that this can be added at the expense of introducing a memory barrier and, depending on solution, use separate views of the ByteBuffer
.Now, suppose we want to allocate an array of 10 000 Points. Armed with the new
InlineArray
class we can proceed like this:public class Main {As expected, the code will produce the following output when run:
public static void main(String[] args) {
InlineArray<Point> pointArray = new InlineArray<>(
Integer.BYTES * 2, // The max element size
10_000,
(bb, p) -> {bb.putInt(p.x()); bb.putInt(p.y());},
bb -> new VanillaPoint(bb.getInt(), bb.getInt())
);
Point p0 = new VanillaPoint(0, 0);
Point p1 = new VanillaPoint(1, 1);
pointArray.put(0, p0); // Store p0 at index 0
pointArray.put(1, p1); // Store p1 at index 1
System.out.println(pointArray.get(0)); // Should produce (0, 0)
System.out.println(pointArray.get(1)); // Should produce (1, 1)
System.out.println(pointArray.get(2)); // Should produce null
}
}
VanillaPoint{x=0, y=0}
VanillaPoint{x=1, y=1}
null
Note how we provide an element deconstructor and element constructor to the
InlineArray
telling it how it should deconstruct and construct the Point
objects to and from linear memory.Emulation Properties
The emulation above will probably not get the same performance gains as real inline classes but the savings in terms of memory allocation and locality will be about the same. The emulation above is allocating memory off-heap so your garbage collection times will not be affected by element data put in theInlineArray
. The elements in the ByteBuffer
are laid out just like the proposed inline class array:
Array
+======+
|Header|
+------+
|x |
+------+
|y |
+------+
|x |
+------+
|y |
+------+
|... |
+------+
Because we use
ByteBuffer
objects that are indexed with an int
, the backing memory region becomes limited to 2^31 bytes. This means, for example, we can only put 2^(31-3) = 2^28 ≈ 268 million Point
elements in the array (because each point occupies 2^3 = 8 bytes) before we run out of address space. Real implementations can overcome this limitation by using several ByteBuffers, Unsafe or libraries like Chronicle Bytes.Lazy Entities
Given theInlineArray
class, it is fairly easy to provide elements from the InlineArray
that are lazy, in the sense that they do not have to deserialize all fields eagerly when an element is retrieved from the array. This is how it can be done:First, we create another implementation of the
Point
interface that takes its data from a backing ByteBuffer
itself rather than from local fields:public final class LazyPoint implements Point {
private final ByteBuffer byteBuffer;
private final int position;
public LazyPoint(ByteBuffer byteBuffer) {
this.byteBuffer = byteBuffer;
this.position = byteBuffer.position();
}
@Override
public int x() {
return byteBuffer.getInt(position);
}
@Override
public int y() {
return byteBuffer.getInt(position + Integer.BYTES);
}
// toString(), equals() and hashCode() not shown for brevity
}
Then, we just replace the deserializer pased to the constructor of the
InlineArray
like this:InlineArraypointArray = new InlineArray<>(
Integer.BYTES * 2,
10_000,
(bb, p) -> {bb.putInt(p.x()); bb.putInt(p.y());},
LazyPoint::new // Use this deserializer instead
);
If used in the same main method as above, this will produce the following output:
LazyPoint{x=0, y=0}
LazyPoint{x=1, y=1}
null
Cool. This is particularly useful for entities with tens or even hundreds of fields and where just a limited subset of the fields are ever accessed for the problem at hand.
A drawback with this approach is that if just a single
LazyPoint
reference is retained in our application, it prevents the entire backing ByteBuffer
from being garbage collected. So, any lazy entities like these are best used as short-lived objects.Using Large Collections of Data
What if we want to use very large collections of data (e.g. in the terabytes), perhaps from a database or from files, and store them efficiently in-JVM-memory and then be able to work with these collections to improve computational performance? Can we use this type of technology?Speedment HyperStream is a product that leverages a similar technology to be able to provide database data as standard Java Streams and has been available for some time now. HyperStream lays out data similar to above and can hold terabytes of data in a single JVM with little or no Garbage Collection impact because data is stored off-heap. It can use in-place deserialization to obtain single fields directly from the backing memory region, thereby avoiding unnecessary full deserialization of entities. Its standard Java Streams are deterministic ultra-low latency that can construct and consume streams in under 100 ns in some cases.
Here is an example of how HyperStream (which implements a standard Java Stream) can be used in an application when paging between films. The
Manager films
variable is provided by Speedment automatically:private Stream<Film> getPage(int page, Comparator<Film> comparator) {Even though there might be trillions of films, the method will typically complete in less than a microsecond as the
return films.stream()
.sorted(comparator)
.skip(page * PAGE_SIZE)
.limit(PAGE_SIZE)
}
Stream
is connected directly to RAM and are using in-memory indexes.Read more about Speedment HyperStream performance here.
Evaluate the performance in your own database applications by downloading Speedment HyperStream here.
Resources
Project Valhalla https://openjdk.java.net/projects/valhalla/Speedment HyperStream https://www.speedment.com/hyperstream/
Speedment Initializer https://www.speedment.com/initializer/