[DRAFT] Describe Trace Snapshot Profiling #343
base: main
| When a trace is profiled agents MUST add the span attribute `splunk.snapshot.profiling` | ||
| with a value of `true` to the entry span. | ||
| Agents SHOULD take an initial stack trace sample when starting to profile a trace. |
As discussed before, the initial sample will not contain user code. The stack trace is likely to be identical for all requests. What use does it have?
In the case of Node.js, these stack traces will never be correlated with the given trace ID, as we only collect stack traces that are sampled during an active span.
| When a language runtime supports threading, stacks MUST be sampled only for | ||
| trace ids selected for snapshotting. The samples for profiled threads SHOULD be | ||
| taken instantaneously and MAY be taken at separate times. |
The samples for profiled threads SHOULD be taken instantaneously and MAY be taken at separate times.
I don't follow the meaning of this sentence. Also, taking a stack trace is not an instantaneous operation by any means. In Java, for example, a stack trace is taken at a safepoint, which means that all threads are suspended (apparently recent VMs are able to suspend only a single thread, but I don't know whether that is used when taking a stack trace).
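One way to read the requirement under discussion (sample only the threads working on traces selected for snapshotting) is sketched below in Python. The registry and function names are hypothetical and not taken from any agent. `sys._current_frames()` is a CPython standard library call, and since capturing stacks is not instantaneous, the sketch only illustrates the selection logic, not timing guarantees.

```python
import sys
import threading
import traceback

# Hypothetical registry: thread id -> trace id, holding only threads that are
# currently working on a trace selected for snapshotting.
_profiled_threads = {}
_lock = threading.Lock()


def start_profiling(trace_id, thread_id=None):
    """Register a thread (default: the calling thread) as profiled."""
    if thread_id is None:
        thread_id = threading.get_ident()
    with _lock:
        _profiled_threads[thread_id] = trace_id


def stop_profiling(thread_id=None):
    """Remove a thread (default: the calling thread) from the registry."""
    if thread_id is None:
        thread_id = threading.get_ident()
    with _lock:
        _profiled_threads.pop(thread_id, None)


def sample_profiled_threads():
    """Return {trace_id: [stack, ...]} for registered threads only."""
    with _lock:
        selected = dict(_profiled_threads)
    frames = sys._current_frames()  # topmost frame of every live thread
    samples = {}
    for thread_id, trace_id in selected.items():
        frame = frames.get(thread_id)
        if frame is not None:  # thread may have exited since registration
            samples.setdefault(trace_id, []).append(traceback.extract_stack(frame))
    return samples
```

Threads that were never registered are skipped entirely, so no stack is recorded for traces that were not selected.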
| It is RECOMMENDED to export stack traces in batches to take advantage of the pprof | ||
| data format. | ||
| Agents SHOULD attempt to export any remaining stack traces during the Agent shutdown phase. |
Not sure whether this requirement makes sense; idk whether it can be easily implemented for all languages.
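For illustration, the batching and shutdown-export behavior quoted above could look roughly like the Python sketch below. `StackTraceBuffer`, its `export` callable, and the batch size are all hypothetical; the pprof encoding and the real exporter are out of scope here, and the sketch is not thread-safe.

```python
class StackTraceBuffer:
    """Collects stack traces and hands them to an exporter in batches."""

    def __init__(self, export, max_batch_size=100):
        self._export = export  # hypothetical callable taking a list of stack traces
        self._max_batch_size = max_batch_size
        self._pending = []

    def add(self, stack_trace):
        """Buffer one stack trace; export once a full batch has accumulated."""
        self._pending.append(stack_trace)
        if len(self._pending) >= self._max_batch_size:
            self.flush()

    def flush(self):
        """Export whatever is buffered as a single batch."""
        if not self._pending:
            return
        batch, self._pending = self._pending, []
        self._export(batch)

    def shutdown(self):
        """Best-effort export of remaining stack traces during agent shutdown."""
        try:
            self.flush()
        except Exception:
            pass  # the requirement is a SHOULD, so failures are swallowed here
```

In a Python agent, `shutdown` could be registered with `atexit.register(buffer.shutdown)`; whether an equivalent hook exists in every language runtime is exactly the open question raised in the comment.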
| The logs containing profiling data MUST be sent via OTLP. Instrumentation | ||
| libraries SHOULD reuse persistent OTLP connections from other signals (traces, | ||
| metrics). |
Although it is copied from https://github.com/signalfx/gdi-specification/blob/main/specification/behaviors.md#call-stack-ingest, I wanted to point out that I suspect this is not true for the Java implementation.
| **Status**: [Experimental](../README.md#versioning-and-status-of-the-specification) | ||
| Unless stated otherwise Agents MUST follow the `Profiling `ResourceLogs` Message` |
ProfilingResourceLogsMessage
is this a typo?
| The span attribute `splunk.snapshot.profiling` with a value of `true` indicates that | ||
| a trace within a service has been profiled. |
Maybe point out that the attribute should be set on the local root span?
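A minimal Python sketch of that suggestion follows, assuming "entry span" means the local root span (a span with no parent inside the current process). The `Span` class is a hypothetical stand-in rather than a real SDK type; only the attribute name comes from the spec text.

```python
from dataclasses import dataclass, field
from typing import Optional

SNAPSHOT_ATTRIBUTE = "splunk.snapshot.profiling"  # attribute name from the spec text


@dataclass
class Span:
    """Hypothetical stand-in for an SDK span; not a real OpenTelemetry class."""
    name: str
    local_parent: Optional["Span"] = None  # None: no parent in this process
    attributes: dict = field(default_factory=dict)


def is_local_root(span):
    """Assumption: the entry span is the span without a parent in this process."""
    return span.local_parent is None


def mark_if_profiled(span, trace_selected):
    """Set the attribute only on the local root span of a profiled trace."""
    if trace_selected and is_local_root(span):
        span.attributes[SNAPSHOT_ATTRIBUTE] = True
```

Child spans are left untouched, so a backend would see the marker at most once per service within a trace.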
Co-authored-by: Lauri Tulmin <[email protected]>
| | `SPLUNK_PROFILER_MEMORY_ENABLED` | false | Whether memory profiling is enabled. [2] [6] | | ||
| | `SPLUNK_REALM` | `none` | Which realm to send exported data. [3] | | ||
| | `SPLUNK_TRACE_RESPONSE_HEADER_ENABLED` | true | Whether `Server-Timing` header is added to HTTP responses. [4] | | ||
| | `SPLUNK_SNAPSHOT_PROFILER_ENABLED` | false | Whether Trace Snapshot CPU profiling is enabled. [2] [5] | |
This will be covered/superseded by #353 .
| | `splunk.snapshot.profiler.enabled` | string | Enable or Disable trace snapshot profiling | `true` or `false` | `false` | | ||
| | `splunk.snapshot.profiler.sampling.interval` | string | Interval in which to take trace stack trace samples | Any valid duration `string` | `10ms` | |
Superseded in #353.
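As the diff stands before #353, the two settings could be read roughly as in the Python sketch below. The property names and defaults come from the table rows above; the duration grammar (an integer followed by `ms`, `s`, or `m`) is an assumption, since the diff only says "any valid duration string", and the function names are hypothetical.

```python
import re

# Assumed duration grammar: integer followed by a unit; not defined by the spec text.
_UNIT_TO_MS = {"ms": 1, "s": 1000, "m": 60000}
_DURATION = re.compile(r"^(\d+)(ms|s|m)$")


def parse_duration_ms(text):
    """Parse strings such as '10ms' or '2s' into whole milliseconds."""
    match = _DURATION.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_TO_MS[match.group(2)]


def load_snapshot_settings(properties):
    """Return (enabled, sampling_interval_ms) using the defaults from the table."""
    raw_enabled = properties.get("splunk.snapshot.profiler.enabled", "false")
    raw_interval = properties.get("splunk.snapshot.profiler.sampling.interval", "10ms")
    enabled = raw_enabled.strip().lower() == "true"
    return enabled, parse_duration_ms(raw_interval)
```

With no properties set, this yields a disabled profiler and a 10 millisecond interval, matching the defaults listed in the table.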
First (rough) draft for defining how agents should profile traces selected for snapshotting. I'm very unfamiliar with writing this type of documentation, so any help steering it where it needs to go is greatly appreciated!