66 changes: 66 additions & 0 deletions specification/behaviors.md
@@ -51,3 +51,69 @@

See [integration_context.md](integration_context.md) for specifics about
exchanging additional context between AppD and splunk-otel based agents.

## Trace Snapshot Profiling

**Status**: [Experimental](../README.md#versioning-and-status-of-the-specification)

This section describes the behavior for Splunk instrumentation libraries
that contain trace snapshot profiling features.

### Instrumentation Source

Agents MUST set the `profiling.instrumentation.source` value to `snapshot`.

### Starting Trace Profiler

The OpenTelemetry Baggage entry for `splunk.trace.snapshot.volume` MUST be used
to decide whether to profile a trace. A value of `highest` is the signal to
begin profiling, whereas a value of `off` is an explicit signal to not profile.
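
As a rough illustration of the baggage rule above, the decision could be sketched as follows. This is only a sketch under stated assumptions: baggage is modeled as a plain mapping rather than the OpenTelemetry Baggage API, the function name `should_profile` is hypothetical, the profiling signal is assumed to be spelled `highest`, and returning `None` for a missing or unrecognized value is an assumption, since the text does not define that case.

```python
from typing import Mapping, Optional

# Baggage key taken from the specification text above.
VOLUME_KEY = "splunk.trace.snapshot.volume"


def should_profile(baggage: Mapping[str, str]) -> Optional[bool]:
    """Hypothetical helper: True to profile, False to skip, None if undecided.

    `baggage` stands in for the OpenTelemetry Baggage entries of the
    current context, modeled here as a plain mapping (an assumption).
    """
    value = baggage.get(VOLUME_KEY)
    if value == "highest":
        return True   # signal to begin profiling
    if value == "off":
        return False  # explicit signal to not profile
    # Missing or unrecognized value: not covered by the text above.
    return None
```

For example, `should_profile({"splunk.trace.snapshot.volume": "off"})` evaluates to `False`.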

When profiling a trace, the profiler SHOULD be started when an entry span is
detected. An entry span is defined as either the root span of the trace or
any other span within a trace whose parent span is remote.
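
The entry span definition above can be expressed as a small predicate. The sketch below is illustrative only: `ParentContext` is a hypothetical stand-in for a span's parent span context and is not a type from any OpenTelemetry SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParentContext:
    """Hypothetical stand-in for the parent span context of a span."""
    span_id: str
    is_remote: bool  # True when the parent was propagated from another service


def is_entry_span(parent: Optional[ParentContext]) -> bool:
    """Return True for a root span or a span whose parent is remote."""
    if parent is None:
        return True  # no parent: this is the root span of the trace
    return parent.is_remote
```

A span with a local parent (`is_remote=False`) is not an entry span under this definition, so the profiler would not be started for it.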

When a trace is profiled, agents MUST add the span attribute
`splunk.snapshot.profiling` with a value of `true` to the entry span.

Agents SHOULD take an initial stack trace sample when starting to profile a trace.

Contributor commented:

As discussed before the initial sample will not contain user code. The stack trace is likely to be identical for all requests. What use does it have?

Contributor commented:

In the case with Node.js, these stack traces will never be correlated to the given trace ID, as we're only collecting stacktraces that are sampled during an active span.


### Trace Profiling

An instrumentation library that has trace snapshot profiling capabilities MUST
be able to sample call stacks for specific trace ids at a fixed interval.

When a language runtime supports threading, stacks MUST be sampled only for
trace ids selected for snapshotting. The samples for profiled threads SHOULD be
taken instantaneously and MAY be taken at separate times.

Contributor commented:

> The samples for profiled threads SHOULD be taken instantaneously and MAY be taken at separate times.

I don't follow the meaning of this sentence. Also, taking a stack trace is not an instantaneous operation by any means. In Java, for example, a stack trace is taken at a safepoint, which means that all threads are suspended (apparently recent VMs are able to suspend only a single thread, but I don't know whether that is used when taking a stack trace).


Agents MUST sample threads associated with the entry span for the duration of
the span's life.
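
To make the sampling requirements more concrete, here is a minimal sketch of a fixed-interval sampler that only inspects threads registered for a snapshotted trace. It is an illustration under stated assumptions, not a description of any existing agent: `SnapshotRegistry`, `StackSample`, `sample_once`, and `run_sampler` are hypothetical names, and the sketch relies on CPython's `sys._current_frames()`, which other runtimes will not have.

```python
import sys
import threading
import traceback
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StackSample:
    trace_id: str
    thread_id: int
    frames: List[str]


class SnapshotRegistry:
    """Hypothetical map of thread id -> trace id selected for snapshotting."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._trace_by_thread: Dict[int, str] = {}

    def register(self, thread_id: int, trace_id: str) -> None:
        with self._lock:
            self._trace_by_thread[thread_id] = trace_id

    def unregister(self, thread_id: int) -> None:
        with self._lock:
            self._trace_by_thread.pop(thread_id, None)

    def snapshot(self) -> Dict[int, str]:
        with self._lock:
            return dict(self._trace_by_thread)


def sample_once(registry: SnapshotRegistry) -> List[StackSample]:
    """Sample call stacks only for threads tied to a snapshotted trace."""
    frames_by_thread = sys._current_frames()  # CPython-specific
    samples = []
    for thread_id, trace_id in registry.snapshot().items():
        frame = frames_by_thread.get(thread_id)
        if frame is None:
            continue  # the thread has already exited
        stack = traceback.extract_stack(frame)
        frames = [f"{fs.name} ({fs.filename}:{fs.lineno})" for fs in stack]
        samples.append(StackSample(trace_id, thread_id, frames))
    return samples


def run_sampler(registry: SnapshotRegistry, interval_seconds: float,
                stop: threading.Event, sink: List[StackSample]) -> None:
    """Take samples at a fixed interval until `stop` is set."""
    while not stop.wait(interval_seconds):
        sink.extend(sample_once(registry))
```

In this sketch an agent would call `register` when an entry span starts on a thread and `unregister` when it ends, with `run_sampler` running on a dedicated background thread.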

### Call Stack Span Association

Agents SHOULD keep track of the current span for each profiled thread. Agents
are RECOMMENDED to use the OpenTelemetry Context for determining when the current
span changes.

When available, agents MUST use the span id from the profiled thread's current span
as the span id.
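
One way to keep track of the current span per profiled thread is a small tracker that is updated whenever a span is made current or stops being current. The sketch below is hypothetical: `CurrentSpanTracker` and its methods are illustrative names, and the hook points that would call them (for example, a wrapper around the OpenTelemetry Context storage) are assumed rather than shown.

```python
import threading
from typing import Dict, List, Optional


class CurrentSpanTracker:
    """Hypothetical per-thread record of the currently active span id."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._stacks: Dict[int, List[str]] = {}

    def span_activated(self, thread_id: int, span_id: str) -> None:
        # Called when a span becomes current on the given thread.
        with self._lock:
            self._stacks.setdefault(thread_id, []).append(span_id)

    def span_deactivated(self, thread_id: int) -> None:
        # Called when the current span on the given thread is closed.
        with self._lock:
            stack = self._stacks.get(thread_id)
            if stack:
                stack.pop()
                if not stack:
                    del self._stacks[thread_id]

    def current_span_id(self, thread_id: int) -> Optional[str]:
        # Span id to attach to a stack sample, or None when unavailable.
        with self._lock:
            stack = self._stacks.get(thread_id)
            return stack[-1] if stack else None
```

When a stack sample is taken for a thread, the sampler would look up `current_span_id(thread_id)` and attach the result if it is not `None`.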

### Stopping Trace Profiler

Trace profiling MUST be stopped when the entry span of a service ends.

Agents SHOULD take a final stack trace sample when stopping profiling
for a trace.

### Exporting Stack Traces

It is RECOMMENDED to export stack traces in batches to take advantage of the pprof
data format.

Agents SHOULD attempt to export any remaining stack traces during the Agent shutdown phase.

Contributor commented:

Not sure whether this requirement makes sense, idk whether it can be easily implemented for all languages.


### Call Stack Ingest

Call stacks MUST be ingested as [OpenTelemetry
Logs](https://github.com/open-telemetry/opentelemetry-specification/tree/main/specification/logs).
The logs containing profiling data MUST be sent via OTLP. Instrumentation
libraries SHOULD reuse persistent OTLP connections from other signals (traces,
metrics).

Contributor commented on lines +117 to +119:

Although it is copied from https://github.com/signalfx/gdi-specification/blob/main/specification/behaviors.md#call-stack-ingest, I wanted to point out that I suspect this is not true for the Java implementation.

1 change: 1 addition & 0 deletions specification/configuration.md
@@ -173,6 +173,7 @@ instance using the following environment variables:
| `SPLUNK_PROFILER_MEMORY_ENABLED` | false | Whether memory profiling is enabled. [2] [6] |
| `SPLUNK_REALM` | `none` | Which realm to send exported data. [3] |
| `SPLUNK_TRACE_RESPONSE_HEADER_ENABLED` | true | Whether `Server-Timing` header is added to HTTP responses. [4] |
| `SPLUNK_SNAPSHOT_PROFILER_ENABLED` | false | Whether Trace Snapshot CPU profiling is enabled. [2] [5] |

Contributor commented:

This will be covered/superseded by #353.


- [1]: Not user required if another system performs the authentication. For
example, instrumentation libraries SHOULD send data to a locally running
17 changes: 17 additions & 0 deletions specification/semantic_conventions.md
@@ -272,3 +272,20 @@ For each `cpu` sample:
in milliseconds if this sample represents a periodic event
- label `thread.state` of type `string` OPTIONALLY can be set to describe
the state of the thread

## Trace Snapshot Profiling

**Status**: [Experimental](../README.md#versioning-and-status-of-the-specification)

Unless stated otherwise Agents MUST follow the `Profiling `ResourceLogs` Message`
semantic conventions.

Contributor commented:

> Profiling ResourceLogs Message

Is this a typo?

Trace Snapshot Profiling Configuration Options

| Name | Type | Description | Valid Values | Default |
|----------------------------------------------|--------|-----------------------------------------------------|------------------------------| ------- |
| `splunk.snapshot.profiler.enabled` | string | Enable or Disable trace snapshot profiling | `true` or `false` | `false` |
| `splunk.snapshot.profiler.sampling.interval` | string | Interval at which to take stack trace samples       | Any valid duration `string`  | `10ms`  |

Contributor commented on lines +287 to +288:

Superseded in #353.


The span attribute `splunk.snapshot.profiling` with a value of `true` indicates that
a trace within a service has been profiled.

Contributor commented on lines +290 to +291:

Maybe point out that the attribute should be set on the local root span?