# Introduce memoizer API for AnnotationMetadata #11970
base: 4.10.x
## Conversation
micronaut-validation makes extensive use of AnnotationMetadata APIs on the hot path, with these calls combined taking 200-300ns even for the simplest validation benchmarks. Some of these computations could be cached in simple Maps in micronaut-validation, but even the hasAnnotation calls add up. These calls are fairly well optimized in core, but still take ~10ns each, for Map access.

This PR adds a new Memoizer API that can move the caching to an efficient data structure inside AnnotationMetadata itself. The API is inspired by FastThreadLocal. `metadata.hasAnnotation(MyAnnotation.class)` will be replaced by:

```java
private static final MemoizedFlag<AnnotationMetadata> HAS_MY_ANNOTATION =
    AnnotationMetadata.MEMOIZER_NAMESPACE.newFlag(m -> m.hasAnnotation(MyAnnotation.class));

metadata.getMemoized(HAS_MY_ANNOTATION);
```

Internally, creating a `MemoizedFlag` reserves a "slot" in a bit field that stores the memoized value on each AnnotationMetadata. Creating too many MemoizedFlags can lead to higher memory use for *every* AnnotationMetadata, so MemoizedFlags should be created sparingly and should always be `static`.

When accessed, the field value is computed lazily and stored in the bit field. Future calls will not have to call `hasAnnotation` anymore.

For compatibility, the implementation is split into two parts. The Memoizer interface specifies the API, and has default implementations that fall back to computing the default value each time. AbstractMemoizer actually implements the storage using fields. It is only used where necessary, i.e. in DefaultAnnotationMetadata and EmptyAnnotationMetadata. To make sure the memoized values don't become inconsistent when DefaultAnnotationMetadata is modified, the mutation methods clear the memoization cache.
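The slot-in-a-bit-field scheme described above can be illustrated with a small sketch. This is a hypothetical, heavily simplified model, not the actual Micronaut implementation: `SimpleFlag` and `MemoizingHolder` are invented names, each flag here reserves two bits (a "computed" bit and a "value" bit) in a single long, and namespace management and thread safety are omitted.

```java
import java.util.function.Predicate;

// Simplified model of a MemoizedFlag: each flag reserves two bits per holder,
// a "computed" bit and a "value" bit. (Names are illustrative, not Micronaut's.)
final class SimpleFlag<T> {
    private final int slot;                 // index of this flag's bit pair
    private final Predicate<T> computation; // runs only on the first access

    SimpleFlag(int slot, Predicate<T> computation) {
        this.slot = slot;
        this.computation = computation;
    }

    long computedMask() { return 1L << (slot * 2); }
    long valueMask()    { return 1L << (slot * 2 + 1); }

    boolean compute(T target) { return computation.test(target); }
}

// Simplified model of the metadata side: all memoized flags of one target
// share a single long field.
final class MemoizingHolder<T> {
    private final T target;
    private long bits;

    MemoizingHolder(T target) { this.target = target; }

    boolean get(SimpleFlag<T> flag) {
        long b = bits;
        if ((b & flag.computedMask()) != 0) {
            // fast path: one field read and two bitwise tests
            return (b & flag.valueMask()) != 0;
        }
        boolean value = flag.compute(target); // slow path, runs once
        bits = b | flag.computedMask() | (value ? flag.valueMask() : 0L);
        return value;
    }
}

public class MemoSketch {
    public static void main(String[] args) {
        SimpleFlag<String> nonEmpty = new SimpleFlag<>(0, s -> !s.isEmpty());
        MemoizingHolder<String> holder = new MemoizingHolder<>("hello");
        System.out.println(holder.get(nonEmpty)); // computes and caches
        System.out.println(holder.get(nonEmpty)); // served from the bit field
    }
}
```

The fast path reads one long and performs two mask tests, which is the access pattern the PR description credits for beating a Map lookup.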
## Performance

Here are some JMH results:

```
Benchmark                       (annotated)  (type)  Mode  Cnt   Score   Error  Units
MemoBenchmark.direct                   true    Bare  avgt    5   5.691 ± 0.558  ns/op
MemoBenchmark.direct                   true  Extra1  avgt    5  10.485 ± 0.604  ns/op
MemoBenchmark.direct                   true  Extra2  avgt    5  10.907 ± 0.545  ns/op
MemoBenchmark.direct                  false    Bare  avgt    5   0.945 ± 0.078  ns/op
MemoBenchmark.direct                  false  Extra1  avgt    5  14.922 ± 1.386  ns/op
MemoBenchmark.direct                  false  Extra2  avgt    5  15.162 ± 1.475  ns/op
MemoBenchmark.memoized                 true    Bare  avgt    5   1.409 ± 0.027  ns/op
MemoBenchmark.memoized                 true  Extra1  avgt    5   1.443 ± 0.107  ns/op
MemoBenchmark.memoized                 true  Extra2  avgt    5   1.435 ± 0.112  ns/op
MemoBenchmark.memoized                false    Bare  avgt    5   1.432 ± 0.061  ns/op
MemoBenchmark.memoized                false  Extra1  avgt    5   1.398 ± 0.019  ns/op
MemoBenchmark.memoized                false  Extra2  avgt    5   1.351 ± 0.091  ns/op
MemoBenchmark.memoizedFallback         true    Bare  avgt    5   5.580 ± 0.070  ns/op
MemoBenchmark.memoizedFallback         true  Extra1  avgt    5  10.591 ± 0.170  ns/op
MemoBenchmark.memoizedFallback         true  Extra2  avgt    5  13.291 ± 0.253  ns/op
MemoBenchmark.memoizedFallback        false    Bare  avgt    5   1.115 ± 0.011  ns/op
MemoBenchmark.memoizedFallback        false  Extra1  avgt    5  13.521 ± 0.578  ns/op
MemoBenchmark.memoizedFallback        false  Extra2  avgt    5  14.191 ± 0.902  ns/op
```

The `direct` benchmark tests a direct call to `hasAnnotation`, the `memoized` benchmark tests the memoized version, and `memoizedFallback` tests the default non-caching implementation (i.e. using the Memoizer API but without AbstractMemoizer). The `annotated` parameter controls whether the field in question is annotated, and the `type` parameter compares versions with no additional annotations, one extra annotation, or two extra annotations. Extra annotations affect the efficiency of the direct `hasAnnotation` call, since the backing Maps become larger.

The results show that the fallback option (no caching) is only very slightly slower (~0.5ns) than the normal direct call. This gives us assurance that even in edge cases where no memoization is available, performance won't suffer when using the Memoizer API. The `AbstractMemoizer` implementation (with caching) takes a consistent <1.5ns, making it faster than the direct call to `hasAnnotation` in almost all cases, except the case where the field is not annotated at all.
The PR would be easier to review and understand if it included examples of optimisations applied to the codebase. At the moment it is unclear how this will be used in practice.
```java
 */
AnnotationMetadata EMPTY_METADATA = new EmptyAnnotationMetadata();

MemoizerNamespace<AnnotationMetadata> MEMOIZER_NAMESPACE = MemoizerNamespace.create();
```
add javadoc
Mutable data structure in a static field?
Yes, it's mutable; that's why it's important to only create a limited number of MemoizedReferences.
The concern is that users could fiddle with these mutable static fields and create weird bugs.
```java
static {
    try {
        ITEMS_FIELD = MethodHandles.lookup().findVarHandle(AbstractMemoizer.class, "items", Object[].class);
```
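For context on the VarHandle lookup in the diff excerpt above, here is a self-contained sketch of the same pattern with stand-in names (`VarHandleDemo` and its `items` field are local placeholders, not the real AbstractMemoizer internals). Regarding the GraalVM question: a `MethodHandles.lookup()` performed in a static initializer of the declaring class is the shape that native-image build-time analysis generally handles well, but this should be confirmed with a native-image test.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Illustrative stand-in for the pattern in the diff: resolve a VarHandle to
// an instance field once in a static initializer, then use it for
// volatile-mode access. Names here are placeholders, not Micronaut's.
public class VarHandleDemo {
    private Object[] items = new Object[4];

    private static final VarHandle ITEMS_FIELD;
    static {
        try {
            ITEMS_FIELD = MethodHandles.lookup()
                    .findVarHandle(VarHandleDemo.class, "items", Object[].class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    Object[] readVolatile() {
        // volatile-mode read of the items field through the handle
        return (Object[]) ITEMS_FIELD.getVolatile(this);
    }

    public static void main(String[] args) {
        VarHandleDemo demo = new VarHandleDemo();
        System.out.println(demo.readVolatile().length); // 4
    }
}
```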
does this work on Graal?
```java
private MemoizedFlag() {
}

abstract boolean compute(M memoizer);
```
javadoc
IMHO this is overkill. From the user's perspective, you don't know whether you should be using this or calling methods on the annotation metadata directly. What is the use-case we want to optimize? If it's a simple
@dstepanov I don't believe other optimizations can work as well as this one. Even a very simple map takes some time for item access. Caching the last accessed annotation type might work for simple benchmarks, but validation doesn't only access a single annotation per metadata. In comparison, the Memoizer will in the ideal case simply access a long field and do two bitwise comparisons. That is hard to beat. A side benefit is that the memoizer can also be used to implement more complex caching such as this one. That commit wouldn't need the COW map; it could use MemoizedReference instead.
On further testing in validation, I had to modify the API of this PR (da3c1f9). It turns out that even the simple
Depends on micronaut-projects/micronaut-core#11970. Saves a further ~40% of runtime in ParameterBenchmark.
Personally, I'm against these changes: this might be a good idea, but it should be a separate concept.
I think I am in agreement with @dstepanov in that the API is confusing, and the shared static state is a problem that could lead to hard-to-debug issues down the line. @dstepanov, can you investigate whether there is a way to achieve similar performance improvements with a simpler API?
I think we might want to introduce a specific cache per method, connected to the proxy. Instead of having [...], the syntax would be something like [...], where [...]. This will allow us to cache runtime data for data repositories, validation, configuration, etc.
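The per-method cache idea could look roughly like the following sketch. The concrete syntax in the comment above was lost during extraction, so everything here is hypothetical: the class name and string keys are invented, and a real implementation would more likely key on the intercepted `ExecutableMethod` than on a string.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of a per-method cache attached to a proxy. Each
// intercepted method gets its own cache slot, keyed here by method name
// for simplicity; names and structure are illustrative only.
public class PerMethodCache {
    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    // Compute a value for a given method once and reuse it afterwards,
    // e.g. pre-resolved validation or repository metadata.
    @SuppressWarnings("unchecked")
    <T> T computeIfAbsent(String methodKey, Function<String, T> loader) {
        return (T) cache.computeIfAbsent(methodKey, loader);
    }

    public static void main(String[] args) {
        PerMethodCache cache = new PerMethodCache();
        Integer first = cache.computeIfAbsent("findById", k -> k.length());
        Integer again = cache.computeIfAbsent("findById", k -> -1); // cached
        System.out.println(first + " " + again);
    }
}
```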