-
Notifications
You must be signed in to change notification settings - Fork 103
Add support for timestamp encoding in exemplars #276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
52866fd
to
0486185
Compare
Clippy complaints are from Rust v1.89 (they should probably be addressed separately): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the help.
In favor. Just one discussion point.
Missing changelog entry.
Agreed with clippy fixes in a separate pull request? Would you mind helping out here?
src/metrics/exemplar.rs
Outdated
inner.exemplar = label_set.map(|label_set| Exemplar { | ||
label_set, | ||
value: v.clone(), | ||
time: SystemTime::now(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How expensive is SystemTime::now
?
My intuition is, that it is orders of magnitude larger than a simple Prometheus Counter atomic increase.
If so, is that performance impact intuitive for users? Is it worth it for users not using exemplar timestamps?
If not, what do you think of a mechanism that makes these optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me do some benchmarking to have some numbers.
We can hide this behind a feature if it's too expensive, if you're okay with that approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added benchmarks and the following to the commit message. Let me know if you would prefer to have this to be feature gated still.
Benchmarking this against the baseline with no changes:
- AMD EPYC 7642:
histogram without exemplars
time: [12.389 ns 12.390 ns 12.392 ns]
change: [-0.0820% -0.0532% -0.0227%] (p = 0.00 < 0.05)
Change within noise threshold.
histogram with exemplars (no exemplar passed)
time: [28.469 ns 28.476 ns 28.483 ns]
change: [+1.9145% +1.9533% +1.9954%] (p = 0.00 < 0.05)
Performance has regressed.
histogram with exemplars (some exemplar passed)
time: [135.70 ns 135.83 ns 135.96 ns]
change: [+49.325% +49.740% +50.112%] (p = 0.00 < 0.05)
Performance has regressed.
- Apple M3 Pro:
histogram without exemplars
time: [3.1357 ns 3.1617 ns 3.1974 ns]
change: [+1.2045% +2.0927% +3.1167%] (p = 0.00 < 0.05)
Performance has regressed.
histogram with exemplars (no exemplar passed)
time: [5.8648 ns 5.8751 ns 5.8872 ns]
change: [+0.1479% +0.9875% +1.6873%] (p = 0.01 < 0.05)
Change within noise threshold.
histogram with exemplars (some exemplar passed)
time: [69.448 ns 69.790 ns 70.192 ns]
change: [+24.346% +24.897% +25.459%] (p = 0.00 < 0.05)
Performance has regressed.
The only real change would come in the third benchmark when exemplars are actually passed, changes in the other two are due to noise.
Exemplars are usually used for tracing, where there's sampling involved, so not every observation would incur a performance penalty, and only a small fraction would be affected. For cases where tracing is enabled for 100% of observations, the overhead if tracing itself would drown any changes here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing changelog entry.
I'll add it.
Agreed with clippy fixes in a separate pull request? Would you mind helping out here?
See #277
src/metrics/exemplar.rs
Outdated
inner.exemplar = label_set.map(|label_set| Exemplar { | ||
label_set, | ||
value: v.clone(), | ||
time: SystemTime::now(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me do some benchmarking to have some numbers.
We can hide this behind a feature if it's too expensive, if you're okay with that approach.
0486185
to
a769bf5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙏 thanks for the benchmark. Very helpful.
As an aside, I am surprised that histogram with exemplars (no exemplar passed)
doesn't equal histogram without exemplars
. Is the formatting different?
Instead of a feature flag, how about changing the CounterWithExemplar
and HistogramWithExemplar
methods slightly?
- pub fn observe(&self, v: f64, label_set: Option<S>) {
+ pub fn observe(&self, v: f64, label_set: Option<S>, timestamp: Option<SystemTime>) {
Benefits:
- Users that don't need the timestamp don't pay for it. Given that it is explicit in the signature, there are no surprises.
- Users that already have a current
SystemTime
around, can re-use it.
Thoughts?
a769bf5
to
7951bc8
Compare
When you observe a measurement for client_rust/src/metrics/exemplar.rs Lines 214 to 215 in 1d5824a
You can separate exemplar locking from the histogram locking, but it doesn't become much faster. Interestingly, it speeds up quite a bit if you insert a false panic that never triggers: pub fn observe(&self, v: f64, label_set: Option<S>) {
let bucket = self.inner.histogram.observe_and_bucket(v);
if let (Some(bucket), Some(label_set)) = (bucket, label_set) {
// if true {
// panic!("wtf");
// }
self.inner.exemplars.write().insert(
bucket,
Exemplar {
label_set,
value: v,
time: SystemTime::now(),
},
);
// self.observe_exemplar(bucket, v, label_set);
}
}
One needs to look at assembly to figure out what drives this difference. That's on Apple M3 Pro. On AMD it is faster even without the fake panic and the panic doesn't move the needle further:
It's definitely a candidate for another PR.
It's certainly an option, but given that it's the public interface it would be a breaking change. Adding timestamps implicitly avoids that. Allowing to pass a timestamp is nice when you have multiple histograms to update around the same time. There is some cost to it:
I updated the code. |
3937878
to
0e1bb0e
Compare
src/metrics/exemplar.rs
Outdated
pub struct Exemplar<S, V> { | ||
pub(crate) label_set: S, | ||
pub(crate) value: V, | ||
pub(crate) time: SystemTime, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the confusion. I suggest the following:
pub(crate) time: SystemTime, | |
pub(crate) time: Option<SystemTime>, |
Then in CounterWithExemplar::inc_by
:
pub fn inc_by(&self, v: N, label_set: Option<S>, timestamp: Option<SystemTime>) -> N {
And HistogramWithExemplar
:
pub fn observe(&self, v: f64, label_set: Option<S>, timestamp: Option<SystemTime>) {
let mut inner = self.inner.write();
let bucket = inner.histogram.observe_and_bucket(v);
if let (Some(bucket), Some(label_set)) = (bucket, label_set) {
inner.exemplars.insert(
bucket,
Exemplar {
label_set,
value: v,
- time: timestamp.unwrap_or_else(SystemTime::now),
+ time: timestamp,
},
);
}
That way users that need a timestamp can easily enable it, while users that don't don't pay for it.
Am I missing something?
Regarding breaking change, I think that is fine. Next version is a breaking change anyways. I doubt many folks use the exemplar API. I think the signature change is self-explanatory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I updated the code.
CHANGELOG.md
Outdated
- Exemplar timestamps, which are required for `convert_classic_histograms_to_nhcb: true` | ||
in Prometheus scraping. See [PR 276]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case we decide for the breaking change, better included under ### Changed
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved it there.
🙏 |
920562d
to
3a9225b
Compare
Signed-off-by: Ivan Babrou <[email protected]>
While timestamps are optional, they are required for native histograms: * https://github.com/prometheus/prometheus/blame/4aee718013/scrape/scrape.go#L1833 Old histograms can be converted to native histograms at the ingestion time, which in turn makes timestamps required for old histograms too. Benchmarking this against the baseline with no changes: * AMD EPYC 7642: ``` histogram without exemplars time: [12.389 ns 12.390 ns 12.392 ns] change: [-0.0820% -0.0532% -0.0227%] (p = 0.00 < 0.05) Change within noise threshold. histogram with exemplars (no exemplar passed) time: [28.469 ns 28.476 ns 28.483 ns] change: [+1.9145% +1.9533% +1.9954%] (p = 0.00 < 0.05) Performance has regressed. histogram with exemplars (some exemplar passed) time: [135.70 ns 135.83 ns 135.96 ns] change: [+49.325% +49.740% +50.112%] (p = 0.00 < 0.05) Performance has regressed. ``` * Apple M3 Pro: ``` histogram without exemplars time: [3.1357 ns 3.1617 ns 3.1974 ns] change: [+1.2045% +2.0927% +3.1167%] (p = 0.00 < 0.05) Performance has regressed. histogram with exemplars (no exemplar passed) time: [5.8648 ns 5.8751 ns 5.8872 ns] change: [+0.1479% +0.9875% +1.6873%] (p = 0.01 < 0.05) Change within noise threshold. histogram with exemplars (some exemplar passed) time: [69.448 ns 69.790 ns 70.192 ns] change: [+24.346% +24.897% +25.459%] (p = 0.00 < 0.05) Performance has regressed. ``` The only real change would come in the third benchmark when exemplars are actually passed, changes in the other two are due to noise. Exemplars are usually used for tracing, where there's sampling involved, so not every observation would incur a performance penalty, and only a small fraction would be affected. For cases where tracing is enabled for 100% of observations, the overhead if tracing itself would drown any changes here. Signed-off-by: Ivan Babrou <[email protected]>
3a9225b
to
b698bfa
Compare
This allows having exemplars without timestamps, which are cheaper. It also allows amortization of timestamping if multiple measurements share time. Signed-off-by: Ivan Babrou <[email protected]>
b698bfa
to
9435495
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for bearing with me!
@mxinden, when do you expect to have a release? |
Signed-off-by: Ivan Babrou <[email protected]> Signed-off-by: Max Leonard Inden <[email protected]>
While timestamps are optional, they are required for native histograms:
Old histograms can be converted to native histograms at the ingestion time, which in turn makes timestamps required for old histograms too.