@tduncan tduncan commented Jun 6, 2025

First (rough) draft for defining how agents should profile traces selected for snapshotting. I'm very unfamiliar with writing this type of documentation so any help steering it where it needs to go is greatly appreciated!

@tduncan tduncan requested review from a team as code owners June 6, 2025 22:55
When a trace is profiled agents MUST add the span attribute `splunk.snapshot.profiling`
with a value of `true` to the entry span.

Agents SHOULD take an initial stack trace sample when starting to profile a trace.
Contributor

As discussed before, the initial sample will not contain user code. The stack trace is likely to be identical for all requests. What use does it have?

Contributor

In the case of Node.js, these stack traces will never be correlated to the given trace ID, as we're only collecting stack traces that are sampled during an active span.


When a language runtime supports threading, stacks MUST be sampled only for
trace ids selected for snapshotting. The samples for profiled threads SHOULD be
taken instantaneously and MAY be taken at separate times.
Contributor

> The samples for profiled threads SHOULD be taken instantaneously and MAY be taken at separate times.

I don't follow the meaning of this sentence. Also, taking a stack trace is not an instantaneous operation by any means. In Java, for example, a stack trace is taken at a safepoint, which means that all threads are suspended (apparently recent VMs are able to suspend only a single thread, but I don't know whether that is used when taking a stack trace).
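
For illustration only, here is a minimal Python sketch of one way to read "stacks MUST be sampled only for trace ids selected for snapshotting". The registry and the names `start_profiling`, `stop_profiling`, and `sample_once` are hypothetical and do not come from the specification or any agent. It relies on CPython's `sys._current_frames()`, which returns the current frame of every thread in one call; the per-thread stacks are then extracted one after another, so they are not taken at exactly the same instant.

```python
import sys
import threading
import time
import traceback

# Hypothetical registry: thread ident -> trace ID selected for snapshotting.
_profiled_threads = {}
_registry_lock = threading.Lock()


def start_profiling(trace_id, thread_id=None):
    """Mark a thread (default: the calling thread) as working on a selected trace."""
    if thread_id is None:
        thread_id = threading.get_ident()
    with _registry_lock:
        _profiled_threads[thread_id] = trace_id


def stop_profiling(thread_id=None):
    """Remove a thread (default: the calling thread) from the registry."""
    if thread_id is None:
        thread_id = threading.get_ident()
    with _registry_lock:
        _profiled_threads.pop(thread_id, None)


def sample_once():
    """Return one stack sample per registered thread; all other threads are skipped."""
    with _registry_lock:
        selected = dict(_profiled_threads)
    frames = sys._current_frames()  # CPython-specific: thread ident -> current frame
    samples = []
    for thread_id, trace_id in selected.items():
        frame = frames.get(thread_id)
        if frame is None:
            continue  # the thread has already exited
        stack = traceback.extract_stack(frame)
        samples.append({
            "trace_id": trace_id,
            "thread_id": thread_id,
            "timestamp_ns": time.time_ns(),
            "stack": [(f.filename, f.name, f.lineno) for f in stack],
        })
    return samples
```

A sampler thread would call `sample_once()` on every sampling interval. How other runtimes pause threads to capture a stack (for example at a JVM safepoint) is outside what this sketch shows.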

It is RECOMMENDED to export stack traces in batches to take advantage of the pprof
data format.

Agents SHOULD attempt to export any remaining stack traces during the Agent shutdown phase.
Contributor

Not sure whether this requirement makes sense; idk whether it can be easily implemented for all languages.
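
As a point of reference for one language only, the following Python sketch shows batching with a best-effort flush at interpreter exit. `StackTraceBuffer`, `install_shutdown_flush`, and the `export` callable are hypothetical names for this example; the real export would serialize the batch (for example as pprof) and send it. `atexit` handlers run only on normal interpreter termination, so this does not show that the requirement is equally easy in every runtime.

```python
import atexit
import threading


class StackTraceBuffer:
    """Collects samples and hands them to `export` in batches."""

    def __init__(self, export, max_batch_size=100):
        self._export = export  # hypothetical callable that receives a list of samples
        self._max_batch_size = max_batch_size
        self._pending = []
        self._lock = threading.Lock()

    def add(self, sample):
        """Buffer one sample; export a batch once the size limit is reached."""
        with self._lock:
            self._pending.append(sample)
            if len(self._pending) < self._max_batch_size:
                return
            batch, self._pending = self._pending, []
        self._export(batch)  # export outside the lock

    def flush(self):
        """Export whatever is still buffered; does nothing when empty."""
        with self._lock:
            batch, self._pending = self._pending, []
        if batch:
            self._export(batch)


def install_shutdown_flush(buffer):
    """Best effort: flush remaining samples when the interpreter exits normally."""
    atexit.register(buffer.flush)
```

Exporting outside the lock keeps a slow export from blocking threads that are still adding samples.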

Comment on lines +117 to +119
The logs containing profiling data MUST be sent via OTLP. Instrumentation
libraries SHOULD reuse persistent OTLP connections from other signals (traces,
metrics).
Contributor

Although it is copied from https://github.com/signalfx/gdi-specification/blob/main/specification/behaviors.md#call-stack-ingest, I wanted to point out that I suspect this is not true for the Java implementation.


**Status**: [Experimental](../README.md#versioning-and-status-of-the-specification)

Unless stated otherwise Agents MUST follow the `Profiling `ResourceLogs` Message`
Contributor

> Profiling ResourceLogs Message

Is this a typo?

Comment on lines +290 to +291
The span attribute `splunk.snapshot.profiling` with a value of `true` indicates that
a trace within a service has been profiled.
Contributor

Maybe point out that the attribute should be set on the local root span?
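
To make the "local root span" suggestion concrete, here is a small Python sketch. The `Span` class is a hypothetical stand-in and is not the OpenTelemetry API. It assumes that a span with no in-process parent is the local root, and it stores the attribute value as a boolean, which is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Hypothetical stand-in for an in-process span (not the OpenTelemetry API)."""
    name: str
    parent: Optional["Span"] = None  # None: no parent within this service
    attributes: dict = field(default_factory=dict)


def local_root(span):
    """Walk up the in-process parent chain to the service's entry span."""
    while span.parent is not None:
        span = span.parent
    return span


def mark_profiled(span):
    """Set the snapshot attribute on the local root of the given span and return it."""
    root = local_root(span)
    root.attributes["splunk.snapshot.profiling"] = True
    return root
```

With this shape, a profiler that starts from any active span still marks only the entry span, and child spans are left untouched.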

| `SPLUNK_PROFILER_MEMORY_ENABLED` | false | Whether memory profiling is enabled. [2] [6] |
| `SPLUNK_REALM` | `none` | Which realm to send exported data. [3] |
| `SPLUNK_TRACE_RESPONSE_HEADER_ENABLED` | true | Whether `Server-Timing` header is added to HTTP responses. [4] |
| `SPLUNK_SNAPSHOT_PROFILER_ENABLED` | false | Whether Trace Snapshot CPU profiling is enabled. [2] [5] |
Contributor

This will be covered/superseded by #353.

Comment on lines +287 to +288
| `splunk.snapshot.profiler.enabled` | string | Enable or Disable trace snapshot profiling | `true` or `false` | `false` |
| `splunk.snapshot.profiler.sampling.interval` | string | Interval in which to take trace stack trace samples | Any valid duration `string` | `10ms` |
Contributor

Superseded in #353.
