Add metric to check the current number of VT's #6009
Conversation
Signed-off-by: César Revert <[email protected]>
Thanks for the pull request.
```java
this.recordingStream.onEvent(END_EVENT, event -> activeCounter.decrement());

Gauge.builder(VT_ACTIVE_METRIC_NAME, activeCounter::doubleValue)
    .description("The number of active virtual threads")
```
Rather than a gauge, I wonder if it might be more insightful to have two counters, one for started, one for ended. The active could be derived by taking the difference. And this would allow tracking the rate of virtual threads being started/stopped. Thoughts? Maybe in some use cases we would have to worry about overflow (assuming cumulative counters)?
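A minimal sketch of the two-counter idea (the names and wiring here are illustrative, not the actual Micrometer API): two cumulative counters driven by the JFR start/end events, with the active count derived as their difference. Plain `AtomicLong`s stand in for Micrometer `Counter`s to keep the sketch self-contained.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: in Micrometer these would be two Counters and a Gauge.
class VirtualThreadCounts {

    private final AtomicLong started = new AtomicLong();
    private final AtomicLong ended = new AtomicLong();

    // Would be wired to recordingStream.onEvent("jdk.VirtualThreadStart", ...)
    void onStart() {
        started.incrementAndGet();
    }

    // Would be wired to recordingStream.onEvent("jdk.VirtualThreadEnd", ...)
    void onEnd() {
        ended.incrementAndGet();
    }

    // Active count derived from the two cumulative counters. Exposing both
    // counters also lets the backend compute start/stop rates.
    long active() {
        return started.get() - ended.get();
    }
}
```

One detail worth noting: with two's-complement longs, the difference `started - ended` stays correct even across wraparound, as long as the true active count itself fits in a long.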
It would be better to have both counters. However, these types of events (start/end) are numerous, so they would increase rapidly, and overflow could become a common scenario.
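To make the overflow concern concrete: a cumulative counter kept as a Java long wraps only after 2^63 increments, but when it does wrap, the sign flips, so any consumer treating the value as monotonic sees a discontinuity. A tiny illustration:

```java
public class CounterOverflowDemo {
    public static void main(String[] args) {
        long counter = Long.MAX_VALUE; // 9_223_372_036_854_775_807
        counter++;                     // wraps around to Long.MIN_VALUE
        System.out.println(counter < 0);
    }
}
```

In practice, 2^63 increments is a very high bar even at millions of virtual thread starts per second; the earlier practical limit is that backends storing counters as doubles lose integer precision past 2^53.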
I think for autoscaling, the important metric is whether the active (virtual) thread count is continually increasing. Looking at the rate of threads starting without the rate of threads finishing doesn't tell you much that matters for autoscaling: if the start rate is high but finishing is keeping up, there is no problem and no autoscaling is needed. Given that, maybe the rate of starting or stopping on its own isn't that interesting, and the gauge of active virtual threads is the most useful.
Yes @shakuzen, the above makes sense. But let's say we have the active virtual thread count: based on traffic, we would have to fine-tune our thresholds for active virtual threads accordingly.
Whereas if we calculate the percentage of virtual threads in use, like (active threads / total threads created) * 100, we can do a one-time setup: if the value exceeds a threshold, say 60%, autoscaling should kick in, which generally means active threads have increased (the rate of finishing threads has slowed down).
> Based on the traffic, we'll have to finetune our thresholds accordingly for active virtual threads
I don't think so. You might want to check for an absolute threshold of active virtual threads that is cause for concern, and you may want to check over multiple step intervals that the active virtual thread count is increasing significantly.
> Whereas if we calculate % of virtual threads being used like (active threads/total threads created)*100, we can do 1 time setup where if threshold is greater than 60%
Say normally you have between 10-100 active threads, but when there is an issue it climbs to 300, 500, 2000, etc active threads. If your application just started and there is some issue, checking the ratio of active threads to total created may catch the issue. But if the issue happens long after the application started, the ratio no longer catches it because the total threads created is monotonic; it increases over the lifetime of the application. For this reason, using such a ratio is not a good way to catch irregularities.
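A quick worked example with made-up numbers shows how the monotonic denominator hides a late-occurring problem:

```java
public class RatioDemo {
    public static void main(String[] args) {
        int problematicActive = 700; // an abnormally high active count (made-up)

        // Shortly after startup: only 1_000 threads ever created,
        // so the ratio easily trips a 60% alarm.
        double early = 100.0 * problematicActive / 1_000;
        System.out.printf("early: %.2f%%%n", early); // 70.00%

        // Weeks later: 50_000_000 threads ever created. Same problem,
        // same active count, but the ratio is now near zero.
        double late = 100.0 * problematicActive / 50_000_000;
        System.out.printf("late:  %.5f%%%n", late); // 0.00140%
    }
}
```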
Thanks @shakuzen. One final doubt, and apologies for dragging the conversation out.
> But if the issue happens long after the application started, the ratio no longer catches it because the total threads created is monotonic; it increases over the lifetime of the application. For this reason, using such a ratio is not a good way to catch irregularities.
Since virtual threads created will be a counter, won't the expression below help us?

```promql
sum(active_thread_count) / sum(rate(virtual_threads_created[1m]))
```

Taking the rate of a counter gives the number of threads created over the interval (1m here), right? So it should work even if the application runs for days or months. (Assuming active_thread_count is a gauge.)
Signed-off-by: César Revert <[email protected]>
I was wondering if we should make
What do you think?
The way I see it, if we want to create an abstract solution that's not only useful for listening to Virtual Threads JFR events, we should allow reusing the
Is it costly to have multiple RecordingStream instances? I wondered about the same thing in the original PR adding the virtual thread metrics, but I didn't look into it to get an answer. While being more efficient is a goal, the isolation of not sharing a stream (sharing may let one binder cause issues for another, depending on how things are exposed) feels more important if the only saving is allocating a few bytes for one more RecordingStream object per binder. Presumably different binders will be listening to different JFR events anyway.

As for the "enabled" booleans, I don't think we need these for the existing metrics - those should have near-negligible overhead, and users don't need to bind this MeterBinder if they don't want them. The reason it is important to be able to decide separately about enabling the start/stop virtual thread metrics is that the overhead can be more significant with high throughput on virtual threads. We have MeterFilter, which lets users disable metrics they don't want, but there may be instrumentation overhead involved even if the metric is disabled. We could avoid calling
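The opt-in idea above could look roughly like the following sketch. This is not the actual Micrometer implementation: the class name and the `detailedMetrics` flag are hypothetical, though `jdk.jfr.consumer.RecordingStream` and the `jdk.VirtualThread*` event names are real JDK 21 APIs. The point is that when the flag is off, the high-frequency events are never enabled on the stream, so the JVM never emits them and no instrumentation overhead is paid.

```java
import jdk.jfr.consumer.RecordingStream;

// Hypothetical sketch of an opt-in binder; not the actual Micrometer API.
public class VirtualThreadMetricsSketch implements AutoCloseable {

    private final RecordingStream recordingStream = new RecordingStream();

    VirtualThreadMetricsSketch(boolean detailedMetrics) {
        // Always-on, low-frequency event (pinning).
        recordingStream.enable("jdk.VirtualThreadPinned");
        recordingStream.onEvent("jdk.VirtualThreadPinned",
                event -> { /* record pinned duration */ });

        if (detailedMetrics) {
            // High-frequency start/end events are only enabled when opted in,
            // so no overhead is paid otherwise.
            recordingStream.enable("jdk.VirtualThreadStart");
            recordingStream.enable("jdk.VirtualThreadEnd");
            recordingStream.onEvent("jdk.VirtualThreadStart",
                    event -> { /* increment started counter */ });
            recordingStream.onEvent("jdk.VirtualThreadEnd",
                    event -> { /* increment ended counter */ });
        }
        recordingStream.startAsync();
    }

    @Override
    public void close() {
        recordingStream.close();
    }
}
```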
Each RecordingStream requires its own memory allocation and appears to need at least 3 threads ("JFR Periodic Tasks",
There are some valid points of feedback being discussed, but it feels like we're getting farther from delivering the original feature request rather than closer. That may be for the better in the long term, but I want to lay out the options to proceed as I see them:

1. Deliver the original feature (an additional opt-in meter for the current number of virtual threads) without delivering the other feature (a generic JFR event-based MeterBinder abstraction). However, if we intend to introduce the generic JFR MeterBinder later and apply it to VirtualThreadMetrics, we need to keep that in mind when considering API changes now.
2. Put this new meter feature on hold while we figure out the generic JFR MeterBinder.
3. Try to tackle both features at once.

There is also the release schedule to consider. A feature like a generic JFR MeterBinder would probably be better delivered in a milestone so there is more opportunity for feedback - right now our next feature release is a Release Candidate. So delivering a generic JFR MeterBinder is probably best delayed until Micrometer 1.16.

I think my preference is for 2, and if we do that we should focus, in a separate PR, on reviewing what the API for a generic JFR MeterBinder looks like before deciding how this PR ends up - unless we can deliver this PR in a way that would not cause trouble for adapting to the generic JFR MeterBinder later in a backward-compatible way. @ceremo what are your thoughts? Are you okay putting this on hold, or would you like to try to get this PR merged in time for 1.15 (without tackling a generic JFR MeterBinder feature)?
No problem keeping this on hold. Defining a new API for a generic JFR MeterBinder makes sense, so I would also choose the second alternative.
I've marked this as blocked and opened #6065, targeted at 1.16.0-M1.
Superseded by #6104, which is based on a new MBean in Java 24 rather than JFR events.
Related to #5950