@yawkat (Member) commented Aug 1, 2025

micronaut-validation makes extensive use of AnnotationMetadata APIs on the hot path, with these calls combined taking 200-300ns even for the simplest validation benchmarks. Some of these computations could be cached in simple Maps in micronaut-validation, but even the `hasAnnotation` calls add up. These calls are fairly well optimized in core, but each still takes ~10ns for the Map access.

This PR adds a new Memoizer API that can move the caching to an efficient data structure inside AnnotationMetadata itself. The API is inspired by FastThreadLocal. `metadata.hasAnnotation(MyAnnotation.class)` will be replaced by:

```java
private static final MemoizedFlag<AnnotationMetadata> HAS_MY_ANNOTATION =
        AnnotationMetadata.MEMOIZER_NAMESPACE.newFlag(m -> m.hasAnnotation(MyAnnotation.class));

metadata.getMemoized(HAS_MY_ANNOTATION);
```

Internally, creating a `MemoizedFlag` reserves a "slot" in a bit field that stores the memoized value on each AnnotationMetadata. Creating too many MemoizedFlags can lead to higher memory use for *every* AnnotationMetadata, so MemoizedFlags should be created sparingly and should always be `static`.

When accessed, the field value is computed lazily and stored in the bit field. Future calls will not have to call `hasAnnotation` anymore.
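
To make the mechanism above concrete, here is a minimal sketch of how a flag slot and the per-instance bit field could interact. All names and the layout are simplified assumptions for illustration, not the actual micronaut-core implementation, and thread-safety concerns are ignored.

```java
// Illustrative sketch only: names, layout and (lack of) thread-safety are
// simplified assumptions, not the actual micronaut-core implementation.
import java.util.function.Predicate;

final class MemoizedFlagSketch {
    private static int nextSlot = 0;

    final long computedBit; // set once the value has been computed for an instance
    final long valueBit;    // holds the memoized boolean value itself

    MemoizedFlagSketch() {
        // each flag permanently reserves two bits in every metadata instance
        computedBit = 1L << nextSlot++;
        valueBit = 1L << nextSlot++;
    }
}

abstract class AbstractMemoizerSketch {
    private long memoizedBits; // one bit field shared by all flags of this instance

    boolean getMemoized(MemoizedFlagSketch flag, Predicate<AbstractMemoizerSketch> compute) {
        long bits = memoizedBits;
        if ((bits & flag.computedBit) != 0) {
            // fast path: a single field read plus two bitwise checks
            return (bits & flag.valueBit) != 0;
        }
        boolean value = compute.test(this); // e.g. m -> m.hasAnnotation(...)
        memoizedBits = bits | flag.computedBit | (value ? flag.valueBit : 0L);
        return value;
    }
}
```

Once a value has been computed, a lookup is just a field read and two bitwise operations.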

For compatibility, the implementation is split into two parts. The Memoizer interface specifies the API, and has default implementations that fall back to computing the default value each time. AbstractMemoizer actually implements the storage using fields. It is only used where necessary, i.e. in DefaultAnnotationMetadata and the EmptyAnnotationMetadata. To make sure the memoized values don't become inconsistent when DefaultAnnotationMetadata is modified, the mutation methods clear the memoization cache.
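
A rough sketch of that split, again with assumed names: the interface-level default simply recomputes, while only the storage-backed base class holds the bit field and clears it on mutation.

```java
// Sketch of the compatibility split described above; all names are assumed.
import java.util.function.Predicate;

interface MemoizerSketchApi {
    // Interface default: no per-instance storage, so the value is recomputed
    // on every call (this is the behaviour measured as "memoizedFallback" below).
    default boolean getMemoizedFlag(Predicate<MemoizerSketchApi> flagComputation) {
        return flagComputation.test(this);
    }
}

abstract class MemoizerStorageSketch implements MemoizerSketchApi {
    private long memoizedBits; // actual storage exists only in classes like this

    // Mutation methods of a DefaultAnnotationMetadata-style class would call this
    // so previously memoized answers cannot become stale after a modification.
    protected final void clearMemoizationCache() {
        memoizedBits = 0L;
    }
}
```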

## Performance

Here are some JMH results:

```
Benchmark                       (annotated)  (type)  Mode  Cnt   Score   Error  Units
MemoBenchmark.direct                   true    Bare  avgt    5   5.691 ± 0.558  ns/op
MemoBenchmark.direct                   true  Extra1  avgt    5  10.485 ± 0.604  ns/op
MemoBenchmark.direct                   true  Extra2  avgt    5  10.907 ± 0.545  ns/op
MemoBenchmark.direct                  false    Bare  avgt    5   0.945 ± 0.078  ns/op
MemoBenchmark.direct                  false  Extra1  avgt    5  14.922 ± 1.386  ns/op
MemoBenchmark.direct                  false  Extra2  avgt    5  15.162 ± 1.475  ns/op
MemoBenchmark.memoized                 true    Bare  avgt    5   1.409 ± 0.027  ns/op
MemoBenchmark.memoized                 true  Extra1  avgt    5   1.443 ± 0.107  ns/op
MemoBenchmark.memoized                 true  Extra2  avgt    5   1.435 ± 0.112  ns/op
MemoBenchmark.memoized                false    Bare  avgt    5   1.432 ± 0.061  ns/op
MemoBenchmark.memoized                false  Extra1  avgt    5   1.398 ± 0.019  ns/op
MemoBenchmark.memoized                false  Extra2  avgt    5   1.351 ± 0.091  ns/op
MemoBenchmark.memoizedFallback         true    Bare  avgt    5   5.580 ± 0.070  ns/op
MemoBenchmark.memoizedFallback         true  Extra1  avgt    5  10.591 ± 0.170  ns/op
MemoBenchmark.memoizedFallback         true  Extra2  avgt    5  13.291 ± 0.253  ns/op
MemoBenchmark.memoizedFallback        false    Bare  avgt    5   1.115 ± 0.011  ns/op
MemoBenchmark.memoizedFallback        false  Extra1  avgt    5  13.521 ± 0.578  ns/op
MemoBenchmark.memoizedFallback        false  Extra2  avgt    5  14.191 ± 0.902  ns/op
```

The `direct` benchmark tests a direct call to `hasAnnotation`, the `memoized` benchmark tests the memoized version, and `memoizedFallback` tests the default non-caching implementation (i.e. using the Memoizer API but without AbstractMemoizer). The `annotated` parameter controls whether the field in question is annotated, and the `type` parameter compares versions with no additional annotations, one extra annotation, or two extra annotations. Extra annotations affect the efficiency of the direct `hasAnnotation` call, since the backing Maps become larger.
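
For orientation, a benchmark with these parameters could be shaped roughly like the sketch below. This is not the actual MemoBenchmark from the PR; `MyAnnotation` and the metadata setup are placeholders.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

import io.micronaut.core.annotation.AnnotationMetadata;

// Sketch only: the real MemoBenchmark in this PR may be structured differently.
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class MemoBenchmarkSketch {

    @Retention(RetentionPolicy.RUNTIME)
    @interface MyAnnotation { } // placeholder for the annotation being checked

    @Param({"true", "false"})
    public boolean annotated;   // is the target field annotated with MyAnnotation?

    @Param({"Bare", "Extra1", "Extra2"})
    public String type;         // zero, one or two extra annotations on the field

    AnnotationMetadata metadata; // assumed to be resolved in a @Setup method (omitted here)

    @Benchmark
    public boolean direct() {
        return metadata.hasAnnotation(MyAnnotation.class);
    }
}
```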

In the results you can see that the fallback option (no caching) is only very slightly slower (~0.5ns) than the normal direct call. This gives us assurance that even in edge cases where no memoization is available, performance won't suffer when using the Memoizer API.

The `AbstractMemoizer` implementation (with caching) takes a consistent <1.5ns, making it faster than the direct call to `hasAnnotation` in almost all cases, except when the field is not annotated at all.
@yawkat yawkat added this to the 4.10.0 milestone Aug 1, 2025
@yawkat yawkat added the type: improvement A minor improvement to an existing feature label Aug 1, 2025

@graemerocher (Contributor) left a comment

The PR would be easier to review and understand if it included examples of optimisations applied to the codebase. At the moment it is unclear how this will be used in practice.

```java
 */
AnnotationMetadata EMPTY_METADATA = new EmptyAnnotationMetadata();

MemoizerNamespace<AnnotationMetadata> MEMOIZER_NAMESPACE = MemoizerNamespace.create();
```
Contributor: add javadoc

Contributor: Mutable data structure in a static field?

Member Author (@yawkat): Yes, it's mutable; that's why it's important to only create a limited number of MemoizedReferences.

Contributor: the concern is that users could fiddle with these mutable static fields and create weird bugs.


```java
static {
    try {
        ITEMS_FIELD = MethodHandles.lookup().findVarHandle(AbstractMemoizer.class, "items", Object[].class);
```
Contributor: does this work on Graal?

```java
private MemoizedFlag() {
}

abstract boolean compute(M memoizer);
```
Contributor: javadoc

@dstepanov (Contributor)

IMHO this is overkill. From the user perspective, you don't know whether you should be using this or calling methods on the annotation metadata.

What is the use case we want to optimize? If it's a simple hasAnnotation or hasStereotype, we could optimize it by caching the last annotation checked, e.g. String lastHasAnnotationTrue / lastHasAnnotationFalse, or by adding something like a StringIntMap.
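
For comparison, here is a minimal sketch of that last-checked-annotation idea. It is a hypothetical illustration only, not code from this PR or from micronaut-core.

```java
import java.util.function.Predicate;

// Hypothetical sketch of the "cache the last checked annotation" alternative
// suggested above; not part of this PR or of micronaut-core.
final class LastCheckedAnnotationCache {
    private String lastHasAnnotationTrue;
    private String lastHasAnnotationFalse;

    boolean hasAnnotationCached(String annotationName, Predicate<String> mapBackedLookup) {
        if (annotationName.equals(lastHasAnnotationTrue)) {
            return true;  // same annotation as the last positive check
        }
        if (annotationName.equals(lastHasAnnotationFalse)) {
            return false; // same annotation as the last negative check
        }
        boolean present = mapBackedLookup.test(annotationName); // the existing Map-backed check
        if (present) {
            lastHasAnnotationTrue = annotationName;
        } else {
            lastHasAnnotationFalse = annotationName;
        }
        return present;
    }
}
```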

@yawkat yawkat marked this pull request as draft August 4, 2025 07:43
@yawkat yawkat marked this pull request as ready for review August 4, 2025 08:51
@yawkat (Member, Author) commented Aug 4, 2025

@dstepanov I don't believe other optimizations can work as well as this can. Even a very simple map takes some time for item access. Caching the last accessed annotation type might work for simple benchmarks, but validation doesn't only access a single annotation per metadata.

In comparison, the Memoizer will in the ideal case simply access a long field and do two bitwise comparisons. It is hard to beat.

A side benefit is that the memoizer can also be used to implement more complex caching such as this one. That commit wouldn't need the COW map; it could use MemoizedReference instead.

@yawkat (Member, Author) commented Aug 4, 2025

On further testing in validation, I had to modify the API of this PR (da3c1f9).

It turns out that even the simple `.getMemoized` call can lead to an interface table lookup when the call site acts on multiple different types, even if all of those types extend `AbstractMemoizer`. The new implementation uses an `instanceof AbstractMemoizer` check instead, which does not involve the itable lookup. This does decrease performance in the MemoBenchmark a bit, though:

```
Benchmark                       (annotated)  (type)  Mode  Cnt   Score   Error  Units
MemoBenchmark.direct                   true    Bare  avgt    5   5.636 ± 0.084  ns/op
MemoBenchmark.direct                   true  Extra1  avgt    5   9.666 ± 0.315  ns/op
MemoBenchmark.direct                   true  Extra2  avgt    5  10.034 ± 0.215  ns/op
MemoBenchmark.direct                  false    Bare  avgt    5   0.907 ± 0.040  ns/op
MemoBenchmark.direct                  false  Extra1  avgt    5  13.867 ± 0.575  ns/op
MemoBenchmark.direct                  false  Extra2  avgt    5  14.688 ± 0.243  ns/op
MemoBenchmark.memoized                 true    Bare  avgt    5   2.481 ± 0.067  ns/op
MemoBenchmark.memoized                 true  Extra1  avgt    5   2.477 ± 0.047  ns/op
MemoBenchmark.memoized                 true  Extra2  avgt    5   2.501 ± 0.118  ns/op
MemoBenchmark.memoized                false    Bare  avgt    5   2.697 ± 0.009  ns/op
MemoBenchmark.memoized                false  Extra1  avgt    5   2.711 ± 0.072  ns/op
MemoBenchmark.memoized                false  Extra2  avgt    5   2.723 ± 0.131  ns/op
MemoBenchmark.memoizedFallback         true    Bare  avgt    5   6.402 ± 0.189  ns/op
MemoBenchmark.memoizedFallback         true  Extra1  avgt    5  11.204 ± 0.645  ns/op
MemoBenchmark.memoizedFallback         true  Extra2  avgt    5  11.181 ± 0.638  ns/op
MemoBenchmark.memoizedFallback        false    Bare  avgt    5   1.128 ± 0.029  ns/op
MemoBenchmark.memoizedFallback        false  Extra1  avgt    5  14.541 ± 0.708  ns/op
MemoBenchmark.memoizedFallback        false  Extra2  avgt    5  15.458 ± 0.559  ns/op
```
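
As a self-contained illustration of why the `instanceof` check helps, the sketch below uses simplified stand-in types rather than the PR's actual classes: the call site branches on one concrete class, so it stays monomorphic instead of going through an interface-default (itable) dispatch.

```java
// Simplified stand-in types, not the PR's actual classes: illustrates branching
// on a concrete class instead of dispatching through an interface default.
interface MetadataLike {
    boolean recompute(); // stands in for the non-caching computation path
}

abstract class CachingBase implements MetadataLike {
    private Boolean cached;

    final boolean cachedValue() {
        if (cached == null) {
            cached = recompute(); // compute once, remember the result
        }
        return cached;
    }
}

final class Dispatch {
    static boolean getMemoized(MetadataLike metadata) {
        if (metadata instanceof CachingBase caching) {
            // concrete-class check: no interface-table (itable) lookup involved
            return caching.cachedValue();
        }
        return metadata.recompute(); // fallback mirrors the memoizedFallback case
    }
}
```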

yawkat added a commit to micronaut-projects/micronaut-validation that referenced this pull request Aug 4, 2025
Depends on micronaut-projects/micronaut-core#11970. Saves a further ~40% of runtime in ParameterBenchmark.

sonarqubecloud bot commented Aug 6, 2025

Quality Gate failed

Failed conditions
1 New Blocker Issues (required ≤ 0)

See analysis details on SonarQube Cloud


@dstepanov (Contributor)

Personally I'm against these changes:

  • It introduces a duplicate concept for how to check for something. It's confusing why you would need to have something memoized, and it's not performant by default.
  • I find the whole memoizing API complicated: namespace, references, flags, fallback.
  • I also don't like that it includes shared static state.
  • Considering that AnnotationMetadata is the backbone of Micronaut, I don't feel confident adding something like that there.

This might be a good idea but it should be a separate concept.

@graemerocher (Contributor)

I think I am in agreement with @dstepanov in that the API is confusing and the shared static state is a problem that could lead to hard-to-debug issues down the line. @dstepanov, can you investigate whether there is a way to achieve similar performance improvements with a simpler API?

@dstepanov (Contributor)

I think we might want to introduce a specific cache per method, connected to the proxy. Instead of having a ConcurrentHashMap in multiple interceptors, we could have one cache (method -> data), probably linked to the proxy.

The syntax would be something like `methodInvocationContext.getCached(RepositoryInterceptor.Key, this::init)`, where `RepositoryInterceptor.Key` would implement some key interface, in the same way as this PR does.

This would allow us to cache runtime data for data repositories, validation, configuration, etc.
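
A rough sketch of what such an API could look like; every type and method name here is hypothetical, since nothing like this exists in Micronaut yet.

```java
import java.util.function.Supplier;

// Hypothetical sketch of the proposed per-context cache API; none of these
// types or methods exist in Micronaut today.
interface CacheKey<T> {
    // marker type; interceptors would hold one static final key per cached value
}

interface CachingInvocationContext {
    // return the cached value for the key, computing and storing it on first use
    <T> T getCached(CacheKey<T> key, Supplier<T> init);
}

// Illustrative call site inside an interceptor (all names hypothetical):
//   private static final CacheKey<RepositoryState> KEY = new CacheKey<>() { };
//   RepositoryState state = context.getCached(KEY, this::initRepositoryState);
```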

@yawkat yawkat removed this from 4.10.0 Release Oct 1, 2025
@yawkat yawkat modified the milestones: 4.10.0, 5.0.0 Oct 1, 2025