@yawkat (Member) commented Aug 1, 2025

micronaut-validation makes extensive use of AnnotationMetadata APIs on the hot path, with these calls combined taking 200-300ns even for the simplest validation benchmarks. Some of these computations could be cached in simple Maps in micronaut-validation, but even the `hasAnnotation` calls add up. These calls are fairly well optimized in core, but each still takes ~10ns for the Map access.

This PR adds a new Memoizer API that can move the caching to an efficient data structure inside AnnotationMetadata itself. The API is inspired by FastThreadLocal. `metadata.hasAnnotation(MyAnnotation.class)` will be replaced by:

```java
private static final MemoizedFlag<AnnotationMetadata> HAS_MY_ANNOTATION =
        AnnotationMetadata.MEMOIZER_NAMESPACE.newFlag(m -> m.hasAnnotation(MyAnnotation.class));

metadata.getMemoized(HAS_MY_ANNOTATION);
```

Internally, creating a `MemoizedFlag` reserves a "slot" in a bit field that stores the memoized value on each AnnotationMetadata. Creating too many MemoizedFlags can lead to higher memory use for *every* AnnotationMetadata, so MemoizedFlags should be created sparingly and should always be `static`.

When accessed, the field value is computed lazily and stored in the bit field. Future calls will not have to call `hasAnnotation` anymore.
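
To make the mechanism above concrete, here is a minimal sketch of how a flag slot and the per-instance bit field could interact. All names and the layout are simplified assumptions for illustration, not the actual micronaut-core implementation, and thread-safety concerns are ignored.

```java
// Illustrative sketch only: names, layout and (lack of) thread-safety are
// simplified assumptions, not the actual micronaut-core implementation.
import java.util.function.Predicate;

final class MemoizedFlagSketch {
    private static int nextSlot = 0;

    final long computedBit; // set once the value has been computed for an instance
    final long valueBit;    // holds the memoized boolean value itself

    MemoizedFlagSketch() {
        // each flag permanently reserves two bits in every metadata instance
        computedBit = 1L << nextSlot++;
        valueBit = 1L << nextSlot++;
    }
}

abstract class AbstractMemoizerSketch {
    private long memoizedBits; // one bit field shared by all flags of this instance

    boolean getMemoized(MemoizedFlagSketch flag, Predicate<AbstractMemoizerSketch> compute) {
        long bits = memoizedBits;
        if ((bits & flag.computedBit) != 0) {
            // fast path: a single field read plus two bitwise checks
            return (bits & flag.valueBit) != 0;
        }
        boolean value = compute.test(this); // e.g. m -> m.hasAnnotation(...)
        memoizedBits = bits | flag.computedBit | (value ? flag.valueBit : 0L);
        return value;
    }
}
```

Once a value has been computed, a lookup is just a field read and two bitwise operations.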

For compatibility, the implementation is split into two parts. The Memoizer interface specifies the API, and has default implementations that fall back to computing the default value each time. AbstractMemoizer actually implements the storage using fields. It is only used where necessary, i.e. in DefaultAnnotationMetadata and the EmptyAnnotationMetadata. To make sure the memoized values don't become inconsistent when DefaultAnnotationMetadata is modified, the mutation methods clear the memoization cache.
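
A rough sketch of that split, again with assumed names: the interface-level default simply recomputes, while only the storage-backed base class holds the bit field and clears it on mutation.

```java
// Sketch of the compatibility split described above; all names are assumed.
import java.util.function.Predicate;

interface MemoizerSketchApi {
    // Interface default: no per-instance storage, so the value is recomputed
    // on every call (this is the behaviour measured as "memoizedFallback" below).
    default boolean getMemoizedFlag(Predicate<MemoizerSketchApi> flagComputation) {
        return flagComputation.test(this);
    }
}

abstract class MemoizerStorageSketch implements MemoizerSketchApi {
    private long memoizedBits; // actual storage exists only in classes like this

    // Mutation methods of a DefaultAnnotationMetadata-style class would call this
    // so previously memoized answers cannot become stale after a modification.
    protected final void clearMemoizationCache() {
        memoizedBits = 0L;
    }
}
```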

## Performance

Here are some JMH results:

```
Benchmark                       (annotated)  (type)  Mode  Cnt   Score   Error  Units
MemoBenchmark.direct                   true    Bare  avgt    5   5.691 ± 0.558  ns/op
MemoBenchmark.direct                   true  Extra1  avgt    5  10.485 ± 0.604  ns/op
MemoBenchmark.direct                   true  Extra2  avgt    5  10.907 ± 0.545  ns/op
MemoBenchmark.direct                  false    Bare  avgt    5   0.945 ± 0.078  ns/op
MemoBenchmark.direct                  false  Extra1  avgt    5  14.922 ± 1.386  ns/op
MemoBenchmark.direct                  false  Extra2  avgt    5  15.162 ± 1.475  ns/op
MemoBenchmark.memoized                 true    Bare  avgt    5   1.409 ± 0.027  ns/op
MemoBenchmark.memoized                 true  Extra1  avgt    5   1.443 ± 0.107  ns/op
MemoBenchmark.memoized                 true  Extra2  avgt    5   1.435 ± 0.112  ns/op
MemoBenchmark.memoized                false    Bare  avgt    5   1.432 ± 0.061  ns/op
MemoBenchmark.memoized                false  Extra1  avgt    5   1.398 ± 0.019  ns/op
MemoBenchmark.memoized                false  Extra2  avgt    5   1.351 ± 0.091  ns/op
MemoBenchmark.memoizedFallback         true    Bare  avgt    5   5.580 ± 0.070  ns/op
MemoBenchmark.memoizedFallback         true  Extra1  avgt    5  10.591 ± 0.170  ns/op
MemoBenchmark.memoizedFallback         true  Extra2  avgt    5  13.291 ± 0.253  ns/op
MemoBenchmark.memoizedFallback        false    Bare  avgt    5   1.115 ± 0.011  ns/op
MemoBenchmark.memoizedFallback        false  Extra1  avgt    5  13.521 ± 0.578  ns/op
MemoBenchmark.memoizedFallback        false  Extra2  avgt    5  14.191 ± 0.902  ns/op
```

The `direct` benchmark tests a direct call to `hasAnnotation`, the `memoized` benchmark tests the memoized version, and `memoizedFallback` tests the default non-caching implementation (i.e. using the Memoizer API but without AbstractMemoizer). The `annotated` parameter controls whether the field in question is annotated, and the `type` parameter compares versions with no additional annotations, one extra annotation, or two extra annotations. Extra annotations affect the efficiency of the direct `hasAnnotation` call, since the backing Maps become larger.
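
For orientation, a benchmark with these parameters could be shaped roughly like the sketch below. This is not the actual MemoBenchmark from the PR; `MyAnnotation` and the metadata setup are placeholders.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

import io.micronaut.core.annotation.AnnotationMetadata;

// Sketch only: the real MemoBenchmark in this PR may be structured differently.
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class MemoBenchmarkSketch {

    @Retention(RetentionPolicy.RUNTIME)
    @interface MyAnnotation { } // placeholder for the annotation being checked

    @Param({"true", "false"})
    public boolean annotated;   // is the target field annotated with MyAnnotation?

    @Param({"Bare", "Extra1", "Extra2"})
    public String type;         // zero, one or two extra annotations on the field

    AnnotationMetadata metadata; // assumed to be resolved in a @Setup method (omitted here)

    @Benchmark
    public boolean direct() {
        return metadata.hasAnnotation(MyAnnotation.class);
    }
}
```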

In the results you can see that the fallback option (no caching) is only very slightly slower (~0.5ns) than the normal direct call. This gives us assurance that even in edge cases where no memoization is available, performance won't suffer when using the Memoizer API.

The `AbstractMemoizer` implementation (with caching) takes a consistent <1.5ns, making it faster than the direct call to `hasAnnotation` in almost all cases, except when the field is not annotated at all.
@yawkat yawkat added this to the 4.10.0 milestone Aug 1, 2025
@yawkat yawkat added the type: improvement A minor improvement to an existing feature label Aug 1, 2025

@graemerocher (Contributor) left a comment

The PR would be easier to review and understand if it included examples of optimisations applied to the codebase. At the moment it is unclear how this will be used in practice.

```java
 */
AnnotationMetadata EMPTY_METADATA = new EmptyAnnotationMetadata();

MemoizerNamespace<AnnotationMetadata> MEMOIZER_NAMESPACE = MemoizerNamespace.create();
```
Contributor: add javadoc

Contributor: Mutable data structure in a static field?

Member Author (@yawkat): Yes, it's mutable; that's why it's important to only create a limited number of MemoizedReferences.

Contributor: the concern is that users could fiddle with these mutable static fields and create weird bugs.


```java
static {
    try {
        ITEMS_FIELD = MethodHandles.lookup().findVarHandle(AbstractMemoizer.class, "items", Object[].class);
```
Contributor: does this work on Graal?

```java
private MemoizedFlag() {
}

abstract boolean compute(M memoizer);
```
Contributor: javadoc

@dstepanov (Contributor)

IMHO this is overkill. From the user perspective, you don't know whether you should be using this or calling methods on the annotation metadata.

What is the use case we want to optimize? If it's a simple hasAnnotation or hasStereotype, we could optimize it by caching the last annotation checked, e.g. String lastHasAnnotationTrue / lastHasAnnotationFalse, or by adding something like a StringIntMap.
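
For comparison, here is a minimal sketch of that last-checked-annotation idea. It is a hypothetical illustration only, not code from this PR or from micronaut-core.

```java
import java.util.function.Predicate;

// Hypothetical sketch of the "cache the last checked annotation" alternative
// suggested above; not part of this PR or of micronaut-core.
final class LastCheckedAnnotationCache {
    private String lastHasAnnotationTrue;
    private String lastHasAnnotationFalse;

    boolean hasAnnotationCached(String annotationName, Predicate<String> mapBackedLookup) {
        if (annotationName.equals(lastHasAnnotationTrue)) {
            return true;  // same annotation as the last positive check
        }
        if (annotationName.equals(lastHasAnnotationFalse)) {
            return false; // same annotation as the last negative check
        }
        boolean present = mapBackedLookup.test(annotationName); // the existing Map-backed check
        if (present) {
            lastHasAnnotationTrue = annotationName;
        } else {
            lastHasAnnotationFalse = annotationName;
        }
        return present;
    }
}
```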

@yawkat yawkat marked this pull request as draft August 4, 2025 07:43
@yawkat yawkat marked this pull request as ready for review August 4, 2025 08:51
@yawkat (Member, Author) commented Aug 4, 2025

@dstepanov I don't believe other optimizations can work as well as this can. Even a very simple map takes some time for item access. Caching the last accessed annotation type might work for simple benchmarks, but validation doesn't only access a single annotation per metadata.

In comparison, the Memoizer will in the ideal case simply access a long field and do two bitwise comparisons. It is hard to beat.

A side benefit is that the memoizer can also be used to implement more complex caching such as this one. That commit wouldn't need the COW map; it could use MemoizedReference instead.

@yawkat (Member, Author) commented Aug 4, 2025

On further testing in validation, I had to modify the API of this PR (da3c1f9).

It turns out that even the simple `.getMemoized` call can lead to an interface table lookup when the call site acts on multiple different types, even if all of those types extend `AbstractMemoizer`. The new implementation uses an `instanceof AbstractMemoizer` check instead, which does not involve the itable lookup. This does decrease performance in the MemoBenchmark a bit, though:

```
Benchmark                       (annotated)  (type)  Mode  Cnt   Score   Error  Units
MemoBenchmark.direct                   true    Bare  avgt    5   5.636 ± 0.084  ns/op
MemoBenchmark.direct                   true  Extra1  avgt    5   9.666 ± 0.315  ns/op
MemoBenchmark.direct                   true  Extra2  avgt    5  10.034 ± 0.215  ns/op
MemoBenchmark.direct                  false    Bare  avgt    5   0.907 ± 0.040  ns/op
MemoBenchmark.direct                  false  Extra1  avgt    5  13.867 ± 0.575  ns/op
MemoBenchmark.direct                  false  Extra2  avgt    5  14.688 ± 0.243  ns/op
MemoBenchmark.memoized                 true    Bare  avgt    5   2.481 ± 0.067  ns/op
MemoBenchmark.memoized                 true  Extra1  avgt    5   2.477 ± 0.047  ns/op
MemoBenchmark.memoized                 true  Extra2  avgt    5   2.501 ± 0.118  ns/op
MemoBenchmark.memoized                false    Bare  avgt    5   2.697 ± 0.009  ns/op
MemoBenchmark.memoized                false  Extra1  avgt    5   2.711 ± 0.072  ns/op
MemoBenchmark.memoized                false  Extra2  avgt    5   2.723 ± 0.131  ns/op
MemoBenchmark.memoizedFallback         true    Bare  avgt    5   6.402 ± 0.189  ns/op
MemoBenchmark.memoizedFallback         true  Extra1  avgt    5  11.204 ± 0.645  ns/op
MemoBenchmark.memoizedFallback         true  Extra2  avgt    5  11.181 ± 0.638  ns/op
MemoBenchmark.memoizedFallback        false    Bare  avgt    5   1.128 ± 0.029  ns/op
MemoBenchmark.memoizedFallback        false  Extra1  avgt    5  14.541 ± 0.708  ns/op
MemoBenchmark.memoizedFallback        false  Extra2  avgt    5  15.458 ± 0.559  ns/op
```
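
As a self-contained illustration of why the `instanceof` check helps, the sketch below uses simplified stand-in types rather than the PR's actual classes: the call site branches on one concrete class, so it stays monomorphic instead of going through an interface-default (itable) dispatch.

```java
// Simplified stand-in types, not the PR's actual classes: illustrates branching
// on a concrete class instead of dispatching through an interface default.
interface MetadataLike {
    boolean recompute(); // stands in for the non-caching computation path
}

abstract class CachingBase implements MetadataLike {
    private Boolean cached;

    final boolean cachedValue() {
        if (cached == null) {
            cached = recompute(); // compute once, remember the result
        }
        return cached;
    }
}

final class Dispatch {
    static boolean getMemoized(MetadataLike metadata) {
        if (metadata instanceof CachingBase caching) {
            // concrete-class check: no interface-table (itable) lookup involved
            return caching.cachedValue();
        }
        return metadata.recompute(); // fallback mirrors the memoizedFallback case
    }
}
```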

yawkat added a commit to micronaut-projects/micronaut-validation that referenced this pull request Aug 4, 2025
Depends on micronaut-projects/micronaut-core#11970. Saves a further ~40% of runtime in ParameterBenchmark.

sonarqubecloud bot commented Aug 6, 2025

Quality Gate failed

Failed conditions
1 New Blocker Issues (required ≤ 0)

See analysis details on SonarQube Cloud


@dstepanov (Contributor)

Personally I'm against these changes:

  • It introduces a duplicate concept for how to check for something. It's confusing why you would need to have something memoized, and it's not performant by default.
  • I find the whole memoizing API complicated: namespace, references, flags, fallback.
  • I also don't like that it includes shared static state.
  • Considering that AnnotationMetadata is the backbone of Micronaut, I don't feel confident adding something like that there.

This might be a good idea but it should be a separate concept.

@graemerocher (Contributor)

I think I am in agreement with @dstepanov in that the API is confusing and the shared static state is a problem that could lead to hard-to-debug issues down the line. @dstepanov, can you investigate whether there is a way to achieve similar performance improvements with a simpler API?

@dstepanov (Contributor)

I think we might want to introduce a specific cache per method, connected to the proxy. Instead of having a ConcurrentHashMap in multiple interceptors, we could have one cache (method -> data), probably linked to the proxy.

The syntax would be something like `methodInvocationContext.getCached(RepositoryInterceptor.Key, this::init)`, where `RepositoryInterceptor.Key` would implement some key interface, in the same way as this PR does.

This would allow us to cache runtime data for data repositories, validation, configuration, etc.
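
A rough sketch of what such an API could look like; every type and method name here is hypothetical, since nothing like this exists in Micronaut yet.

```java
import java.util.function.Supplier;

// Hypothetical sketch of the proposed per-context cache API; none of these
// types or methods exist in Micronaut today.
interface CacheKey<T> {
    // marker type; interceptors would hold one static final key per cached value
}

interface CachingInvocationContext {
    // return the cached value for the key, computing and storing it on first use
    <T> T getCached(CacheKey<T> key, Supplier<T> init);
}

// Illustrative call site inside an interceptor (all names hypothetical):
//   private static final CacheKey<RepositoryState> KEY = new CacheKey<>() { };
//   RepositoryState state = context.getCached(KEY, this::initRepositoryState);
```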

@yawkat yawkat removed this from 4.10.0 Release Oct 1, 2025
@yawkat yawkat modified the milestones: 4.10.0, 5.0.0 Oct 1, 2025