-
Notifications
You must be signed in to change notification settings - Fork 1k
[Dynatrace] Add artificial zero percentile in Meter creation rather than a MeterFilter #4782
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Dynatrace] Add artificial zero percentile in Meter creation rather than a MeterFilter #4782
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good, I think. You can rebase now with the changes to LTT. Are you planning to add some tests?
} | ||
|
||
public void addMeterId(Meter.Id id) { | ||
metersWithArtificialZeroPercentile.add(id.getName() + ".percentile"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use a Set of Id
instead of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mainly moved the code around and didn't actually change much, but I guess we could use a set of Meter.Id
s instead. That would then allow you to explicitly turn on/off the export of the 0th percentile for one particular Meter.Id
. For example: meter.name
with Tags tag=value1
would export a 0th percentile, but meter.name
with tag=value2
would not. I am not sure if this is really that relevant - I assume you generally specify percentiles on the metric name level. I am happy to change it though, if you feel it's relevant.
Thanks for taking a look! I rebased on the latest branch and added tests. |
// For LongTaskTimer, the 0 percentile is not tracked as it doesn't clear the | ||
// "interpolatable line" threshold defined in DefaultLongTaskTimer#takeSnapshot(). | ||
// see shouldTrackPercentilesWhenDynatraceSummaryInstrumentsNotUsed for a test | ||
// that exports LongTaskTimer percentiles |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is referring to these three lines:
Lines 143 to 145 in e3329b8
List<Double> percentilesAboveInterpolatableLine = percentilesRequested.stream() | |
.filter(p -> p * (activeTasks.size() + 1) > activeTasks.size()) | |
.collect(Collectors.toList()); |
They drop the 0 percentile (
percentilesAboveInterpolatableLine
won't contain the 0
percentile anymore)
I'm not sure if this is a bug or working as intended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dug a little deeper, and I think the problem lies in the fact that the 0% HistogramGauge is never registered.
- The LongTaskTimer is registered, with the explicit zero percentile.
- The histogram gauges get registered. To do so, an initial snapshot is created:
Line 100 in e3329b8
HistogramSnapshot initialSnapshot = meter.takeSnapshot(); - At this point, the number of active tasks is 0. The LongTaskTimer is currently in the process of being registered. Thus, the calculation for
percentilesAboveInterpolatableLine
looks as follows:
Lines 143 to 145 in e3329b8
List<Double> percentilesAboveInterpolatableLine = percentilesRequested.stream() .filter(p -> p * (activeTasks.size() + 1) > activeTasks.size()) .collect(Collectors.toList());
Since the number ofactiveTasks
is 0, the calculation in thefilter
effectively evaluates top * (0 + 1) > 0
, which is equivalent top > 0
. This, of course, is a condition that the 0 percentile (wherep==0
) explicitly does not fulfil. The 0 percentile is dropped here. - Because the 0 percentile was dropped in the initial
takeSnapshot
, it will not be registered as a HistogramGauge because thepercentileValues
call has only percentiles > 0:Line 103 in e3329b8
ValueAtPercentile[] valueAtPercentiles = initialSnapshot.percentileValues(); - Since the 0th percentile gauge is never registered, it is also never exported.
Later, when the app is running and there are active tasks, this is caught in the if
here:
Line 151 in e3329b8
if (!percentilesRequested.isEmpty() || !buckets.isEmpty()) { |
However, on the initial
takeSnapshot
, Lines 161 to 165 in e3329b8
List<Double> youngestToOldestDurations = StreamSupport | |
.stream(((Iterable<SampleImpl>) activeTasks::descendingIterator).spliterator(), false) | |
.sequential() | |
.map(task -> task.duration(TimeUnit.NANOSECONDS)) | |
.collect(Collectors.toList()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do have at least one bug in this area of the code. See #3877. I'll take a closer look tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wanted to update here that I didn't get a chance to look into it more today due to some other work that needs to get done first. I'll come back to this once that's done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. I have just very recently seen another example of what seems like a race condition caused by the LongTask ending while the snapshot is being produced. I think Dynatrace should be covered for now with the changes in #4780 (most users will use the new DynatraceLongTaskTimer), but I agree that this should be revisited at some point. Thanks for the heads up!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'll take some more time to look into the LTT issue, I think. But the changes in this pull request don't change the status quo, I think. So nothing is worse related to the bug. I think we're fine to merge these changes and follow-up on #3877 separately. Let me know if I'm missing anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right, there is no change in the status quo, it only fixes a bug where the 0 percent quantile wasn't added automatically when explicitly specifying percentiles.
Hi @shakuzen, |
@shakuzen @jonatan-ivanov Do you think we could revive this one and get it into 1.14? Anything to be done from my side to do so? |
Sorry about the lack of response. Taking a look now. |
Thank you for taking a look! |
resolves #4750
See the issue for more context. This is mostly a refactoring of where the percentiles are registered (in the MeterFilter vs. in the creation of Meters)
CC @shakuzen