Java: ChronicleMap Part 1, Go Off-Heap

by Per Minborg

on July 26, 2019

Filling up a HashMap with millions of objects quickly leads to problems such as inefficient memory usage, low performance and garbage collection issues. Learn how to use the off-heap ChronicleMap, which can contain billions of objects with little or no heap impact.

The built-in Map implementations, such as HashMap and ConcurrentHashMap, are excellent tools when we want to work with small to medium-sized data sets. However, as the amount of data grows, these Map implementations deteriorate and start to exhibit a number of unpleasant drawbacks, as shown in this first article in a series about the open-source ChronicleMap.

Heap Allocation

In the examples below, we will use Point objects. Point is a POJO with a public default constructor and getters and setters for X and Y properties (int). The following snippet adds a million Point objects to a HashMap:

import static java.util.stream.Collectors.toMap;

import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.LongStream;

final Map<Long, Point> m = LongStream.range(0, 1_000_000)
    .boxed()
    .collect(
        toMap(
            Function.identity(),
            FillMaps::pointFrom,
            (u, v) -> { throw new IllegalStateException(); },
            HashMap::new
        )
    );

// Convenience method that creates a Point from
// a long by applying modulo prime number operations
private static Point pointFrom(long seed) {
    final Point point = new Point();
    // Apply the modulo before narrowing to int
    point.setX((int) (seed % 4517));
    point.setY((int) (seed % 5011));
    return point;
}
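
The Point class itself is not shown in this article. A minimal sketch that matches the description above (public default constructor, int getters and setters for X and Y) might look like the following. The wrapper class name PointExample is hypothetical and only exists so that the sketch compiles as a single file.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder class so the sketch compiles on its own
final class PointExample {

    // Assumed shape of the POJO described in the text
    static final class Point {
        private int x;
        private int y;

        public Point() {}

        public int getX() { return x; }
        public void setX(int x) { this.x = x; }
        public int getY() { return y; }
        public void setY(int y) { this.y = y; }
    }

    // Same idea as pointFrom() above: the modulo is applied
    // to the long seed before the result is narrowed to int
    static Point pointFrom(long seed) {
        final Point point = new Point();
        point.setX((int) (seed % 4517));
        point.setY((int) (seed % 5011));
        return point;
    }

    public static void main(String[] args) {
        final Map<Long, Point> m = new HashMap<>();
        for (long i = 0; i < 10; i++) {
            m.put(i, pointFrom(i));
        }
        System.out.println(m.size());
    }
}
```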

We can easily see the number of objects allocated on the heap and how much heap memory these objects consume:

Pers-MacBook-Pro:chronicle-test pemi$ jmap -histo 34366 | head
 num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:       1002429       32077728  java.util.HashMap$Node (java.base@10)
   2:       1000128       24003072  java.lang.Long (java.base@10)
   3:       1000000       24000000  com.speedment.chronicle.test.map.Point
   4:           454        8434256  [Ljava.util.HashMap$Node; (java.base@10)
   5:          3427         870104  [B (java.base@10)
   6:           185         746312  [I (java.base@10)
   7:           839         102696  java.lang.Class (java.base@10)
   8:          1164          89088  [Ljava.lang.Object; (java.base@10)

For each Map entry, a Long, a HashMap$Node and a Point object need to be created on the heap. There are also a number of arrays with HashMap$Node objects created. In total, these objects and arrays consume 88,515,056 bytes of heap memory. Thus, each entry consumes on average 88.5 bytes.

NB: The extra 2429 HashMap$Node objects come from other HashMap objects used internally by Java.
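
The total above is simply the sum of the first four rows of the histogram. As a small worked example, the sketch below adds them up and divides by the number of entries; the class and method names are made up for illustration. Dividing each row by its instance count also shows the per-object cost in this run: 32 bytes per HashMap$Node, 24 bytes per Long and 24 bytes per Point.

```java
// Illustrative helper, not part of the original test project
final class HeapCost {

    // Byte counts copied from the jmap histogram above
    static long totalBytes() {
        final long nodes = 32_077_728L;      // java.util.HashMap$Node
        final long longs = 24_003_072L;      // java.lang.Long
        final long points = 24_000_000L;     // Point
        final long nodeArrays = 8_434_256L;  // HashMap$Node[]
        return nodes + longs + points + nodeArrays;
    }

    static double bytesPerEntry(long entries) {
        return (double) totalBytes() / entries;
    }

    public static void main(String[] args) {
        System.out.println(totalBytes());
        System.out.println(bytesPerEntry(1_000_000));
    }
}
```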

Off-Heap Allocation

Contrary to this, a ChronicleMap uses very little heap memory, as can be observed when running the following code:

final Map<Long, Point> m2 = LongStream.range(0, 1_000_000)
    .boxed()
    .collect(
        toMap(
            Function.identity(),
            FillMaps::pointFrom,
            (u, v) -> { throw new IllegalStateException(); },
            () -> ChronicleMap
                .of(Long.class, Point.class)
                .averageValueSize(8)
                .valueMarshaller(PointSerializer.getInstance())
                .entries(1_000_000)
                .create()
        )
    );

Pers-MacBook-Pro:chronicle-test pemi$ jmap -histo 34413 | head
 num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:          6537        1017768  [B (java.base@10)
   2:           448         563936  [I (java.base@10)
   3:          1899         227480  java.lang.Class (java.base@10)
   4:          6294         151056  java.lang.String (java.base@10)
   5:          2456         145992  [Ljava.lang.Object; (java.base@10)
   6:          3351         107232  java.util.concurrent.ConcurrentHashMap$Node (java.base@10)
   7:          2537          81184  java.util.HashMap$Node (java.base@10)
   8:           512          49360  [Ljava.util.HashMap$Node; (java.base@10)

As can be seen, there are no Java heap objects allocated for the ChronicleMap entries and consequently no heap memory is consumed by them either.

Instead of allocating heap memory, ChronicleMap allocates its memory off-heap. Provided that we start our JVM with the flag -XX:NativeMemoryTracking=summary, we can retrieve the amount of off-heap memory being used by issuing the following command:

Pers-MacBook-Pro:chronicle-test pemi$ jcmd 34413 VM.native_memory | grep Internal
- Internal (reserved=30229KB, committed=30229KB)

Apparently, our one million objects were laid out in off-heap memory using a little more than 30 MB of off-heap RAM. This means that each entry in the ChronicleMap used above needs about 30 bytes on average.

This is much more memory efficient than the HashMap, which required 88.5 bytes per entry. In fact, we saved about 66% of RAM and almost 100% of heap memory. The latter is important because the Java garbage collector only sees objects that are on the heap.
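
The savings can be checked with a few lines of arithmetic. The sketch below assumes that the KB reported by Native Memory Tracking means 1,024 bytes; the class and method names are made up for illustration. With the rounded figures used in the text (30 and 88.5 bytes per entry) the saving is about 66%; with the unrounded figures it comes out at roughly 65%.

```java
// Illustrative helper, not part of the original test project
final class OffHeapCost {

    static final long ENTRIES = 1_000_000L;

    // 30229 KB reported under "Internal"; assumes 1 KB = 1024 bytes
    static double offHeapBytesPerEntry() {
        return 30_229L * 1024.0 / ENTRIES;
    }

    // Total HashMap-related heap usage from the first jmap histogram
    static double heapBytesPerEntry() {
        return 88_515_056.0 / ENTRIES;
    }

    // Fraction of memory saved by going off-heap
    static double savedFraction() {
        return 1.0 - offHeapBytesPerEntry() / heapBytesPerEntry();
    }

    public static void main(String[] args) {
        System.out.printf("off-heap bytes/entry: %.1f%n", offHeapBytesPerEntry());
        System.out.printf("heap bytes/entry:     %.1f%n", heapBytesPerEntry());
        System.out.printf("saved:                %.0f%%%n", savedFraction() * 100);
    }
}
```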

Note that we have to decide upon creation how many entries the ChronicleMap can hold at most. This is different from HashMap, which can grow dynamically as we add new associations. We also have to provide a serializer (i.e. PointSerializer.getInstance()), which will be discussed in detail later in this article.

Garbage Collection

Many Garbage Collection (GC) algorithms complete in a time that is proportional to the square of the number of objects that exist on the heap. So if we, for example, double the number of objects on the heap, we can expect the GC to take four times longer to complete.

If we, on the other hand, create 64 times more objects, we can expect to suffer an agonizing 4,096 fold increase in expected GC time. This effectively prevents us from ever being able to create really large HashMap objects.
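
Under the quadratic model stated above, the expected slowdown is the square of the growth factor. The sketch below just spells out that arithmetic for a few growth factors; it is a model calculation, not a measurement, and the class and method names are made up for illustration.

```java
// Illustrative helper: relative GC time under the quadratic model
final class GcScaling {

    // If the number of heap objects grows by growthFactor, the
    // quadratic model predicts growthFactor squared times the GC time
    static long slowdown(long growthFactor) {
        return growthFactor * growthFactor;
    }

    public static void main(String[] args) {
        for (long factor : new long[] {2, 32, 64}) {
            System.out.println(factor + "x objects -> "
                + slowdown(factor) + "x GC time");
        }
    }
}
```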

With ChronicleMap, we can simply put new associations without any concern for garbage collection times.

Serializer

The mediator between heap and off-heap memory is often called a serializer. ChronicleMap comes with a number of pre-configured serializers for most built-in Java types such as Integer, Long, String and many more.

In the example above, we used a custom serializer to convert a Point back and forth between heap and off-heap memory. The serializer class looks like this:

import net.openhft.chronicle.bytes.Bytes;
import net.openhft.chronicle.hash.serialization.SizedReader;
import net.openhft.chronicle.hash.serialization.SizedWriter;
import org.jetbrains.annotations.NotNull;
import org.jetbrains.annotations.Nullable;

public final class PointSerializer implements
    SizedReader<Point>,
    SizedWriter<Point> {

    private static final PointSerializer INSTANCE = new PointSerializer();

    public static PointSerializer getInstance() { return INSTANCE; }

    private PointSerializer() {}

    // A Point is always stored as two ints
    @Override
    public long size(@NotNull Point toWrite) {
        return Integer.BYTES * 2;
    }

    @Override
    public void write(Bytes out, long size, @NotNull Point point) {
        out.writeInt(point.getX());
        out.writeInt(point.getY());
    }

    @NotNull
    @Override
    public Point read(Bytes in, long size, @Nullable Point using) {
        // Reuse the supplied instance if there is one
        if (using == null) {
            using = new Point();
        }
        using.setX(in.readInt());
        using.setY(in.readInt());
        return using;
    }

}

The serializer above is implemented as a stateless singleton, and the actual serialization in the methods write() and read() is fairly straightforward. The only subtle part is the null check in read(): when the “using” argument is null, there is no instance to reuse and a new Point has to be created.
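
The same write/read pattern can be tried without ChronicleMap on the classpath. The sketch below is a stand-in that uses a direct java.nio.ByteBuffer (memory allocated outside the Java heap) instead of Chronicle's Bytes; it is not how ChronicleMap is implemented. The names DirectBufferPoints and POINT_BYTES and the nested Point class are assumptions made for illustration. It shows two ints being written per Point and a caller-supplied instance being reused on read.

```java
import java.nio.ByteBuffer;

// Stand-in sketch: a direct ByteBuffer plays the role of off-heap storage
final class DirectBufferPoints {

    // Assumed shape of the POJO described earlier in the article
    static final class Point {
        private int x;
        private int y;
        public int getX() { return x; }
        public void setX(int x) { this.x = x; }
        public int getY() { return y; }
        public void setY(int y) { this.y = y; }
    }

    static final int POINT_BYTES = Integer.BYTES * 2;

    // Writes the two ints of a Point into the slot with the given index
    static void write(ByteBuffer out, int index, Point point) {
        out.putInt(index * POINT_BYTES, point.getX());
        out.putInt(index * POINT_BYTES + Integer.BYTES, point.getY());
    }

    // Reads a slot into "using" if provided, otherwise into a new Point
    static Point read(ByteBuffer in, int index, Point using) {
        if (using == null) {
            using = new Point();
        }
        using.setX(in.getInt(index * POINT_BYTES));
        using.setY(in.getInt(index * POINT_BYTES + Integer.BYTES));
        return using;
    }

    public static void main(String[] args) {
        final ByteBuffer buffer = ByteBuffer.allocateDirect(1_000 * POINT_BYTES);
        final Point p = new Point();
        p.setX(7);
        p.setY(11);
        write(buffer, 42, p);

        // Reusing one instance avoids creating a new object per read
        final Point reused = new Point();
        read(buffer, 42, reused);
        System.out.println(reused.getX() + ", " + reused.getY());
    }
}
```

Passing an existing instance to read() is what keeps repeated lookups from creating garbage on the heap; passing null falls back to allocating a new Point.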

How to Install it?

When we want to use ChronicleMap in our project, we just add the following Maven dependency to our pom.xml file and we have access to the library.

<dependency>
    <groupId>net.openhft</groupId>
    <artifactId>chronicle-map</artifactId>
    <version>3.17.3</version>
</dependency>

If you are using another build tool, for example Gradle, the same coordinates apply: group net.openhft, artifact chronicle-map, version 3.17.3.
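
For Gradle, the Maven coordinates above translate into a single line in the dependencies block of build.gradle. This is a minimal fragment in the Groovy DSL, assuming a Gradle version that supports the implementation configuration and a repository that serves Maven Central artifacts:

```groovy
dependencies {
    // Same coordinates as the Maven dependency above
    implementation 'net.openhft:chronicle-map:3.17.3'
}
```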

The Short Story

Here are some properties of ChronicleMap:

Stores data off-heap
Is almost always more memory efficient than a HashMap
Implements ConcurrentMap
Does not affect garbage collection times
Sometimes needs a serializer
Has a fixed maximum number of entries
Can hold billions of associations
Is free and open-source

About

Per Minborg

Per Minborg is a Palo Alto based developer and architect, currently serving as CTO at Speedment, Inc. He is a regular speaker at various conferences, e.g. JavaOne, DevNexus, Jdays, JUGs and Meetups. Per has 15+ US patent applications and invention disclosures. He is a JavaOne alumnus and co-author of the publication “Modern Java”.