@tduncan tduncan commented Feb 19, 2025

The profiling backend ingestion process needs to be able to distinguish between callstacks collected by the existing continuous "Always On" profiler and those collected by the upcoming trace snapshot profiler. This PR defines an attribute to be included in a log message body that will identify which profiling process was responsible for collecting and reporting the exported callstacks.

See #336 for previous discussion.
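A minimal sketch of how such an attribute might be written by an agent and read by the ingestion process. The attribute key comes from the title of #336; the two values, the helper names, and the fallback for records that lack the attribute are assumptions made for illustration, not the behavior defined by this PR.

```python
from typing import Dict, Mapping

# Attribute key taken from the title of #336.
SOURCE_ATTRIBUTE = "profiling.instrumentation.source"

# Hypothetical values; the thread does not state the actual strings.
SOURCE_CONTINUOUS = "continuous"
SOURCE_SNAPSHOT = "snapshot"


def build_log_attributes(source: str) -> Dict[str, str]:
    """Attributes an agent might attach to a log record carrying callstacks."""
    return {SOURCE_ATTRIBUTE: source}


def classify_source(attributes: Mapping[str, str]) -> str:
    """Return which profiler produced the callstacks in a log record.

    Assumption: a record without the attribute was sent by an agent that
    predates it, so it is treated as continuous profiler output.
    """
    return attributes.get(SOURCE_ATTRIBUTE, SOURCE_CONTINUOUS)
```

Under these assumptions, `classify_source(build_log_attributes(SOURCE_SNAPSHOT))` yields the snapshot value, while an empty attribute map falls back to the continuous value.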

@Kielek Kielek left a comment

Two things still need to be handled:

  1. What happens when the attribute is missing: #336 (comment)
  2. Future considerations for the OTLP protocol: #336 (comment)

tduncan commented May 30, 2025

Quoting from conversation started in #336 (comment)

>>> I am not against this change as a short-term solution, but we need to start thinking about how to handle the OTLP profiling protocol. I hope it will go live this year.
>>> In that case you will not be able to expect such attributes. For known libraries, you can probably utilize
>>>
>>> InstrumentationScope
>>> For all generic cases:
>>> SampleType
>>> PeriodType

>> SampleType and PeriodType would seem to describe the data itself, whereas what we need is information about the context from which the callstacks were collected, as the values for both SampleType and PeriodType would be identical. The shape of the profiling data does not vary between the existing continuous profiling solution and the trace snapshot extension we're porting into the Java agent.
>>
>> I was hoping to use InstrumentationScope, but conversations with individuals more familiar with the GDI process made it clear that, at least for now, the InstrumentationScope information will not be available for us to rely on.

> For our internal profiling format, we can put in any data we want. The problem will be with the OTLP profiling protocol. From the very beginning we should be ready to understand different sets of data. The most important thing coming from the OTel world is ebpf-profiling. What is more, the .NET data will certainly be sent in the OTel-compliant way. The data source is working; we only need to add the exporter.
>
> Avoiding any Cisco/Splunk-specific attributes should be our goal. If needed, we can start working on a semantic convention for profiling, but we need a good proposal for it.

@Kielek, I agree and ideally we'd use InstrumentationScope but my understanding is our GDI pipeline is the limiting factor here, not the agent or backend. Can you think of another viable option for differentiating the source of the profiling stack traces?

@tduncan tduncan requested a review from Kielek May 30, 2025 21:17
laurit commented Jun 12, 2025

Two things still need to be handled:

  1. What happens when the attribute is missing: Define profiling.instrumentation.source Log Message Attribute #336 (comment)

This seems to be addressed.

  2. Future considerations for the OTLP protocol: Define profiling.instrumentation.source Log Message Attribute #336 (comment)

I think we should deal with this when we have an OTLP profiling ingest and an implementation that is able to send data to it. To me the most natural way seems to be using the sample type https://github.com/open-telemetry/opentelemetry-proto/blob/2bd940b2b77c1ab57c27166af21384906da7bb2b/opentelemetry/proto/profiles/v1development/profiles.proto#L192 If that doesn't work, we could also set an attribute on the profile.

tduncan commented Jun 12, 2025

  2. Future considerations for the OTLP protocol: Define profiling.instrumentation.source Log Message Attribute #336 (comment)

I think we should deal with this when we have an OTLP profiling ingest and an implementation that is able to send data to it. To me the most natural way seems to be using the sample type https://github.com/open-telemetry/opentelemetry-proto/blob/2bd940b2b77c1ab57c27166af21384906da7bb2b/opentelemetry/proto/profiles/v1development/profiles.proto#L192 If that doesn't work, we could also set an attribute on the profile.

Sample type seems to be an awkward fit here. The comments for that field suggest it's intended for the type of profiling data (cpu vs. memory) with an associated unit. We'd need an additional dimension in the ValueType message, since continuous profiling supports both cpu and memory.

How far into the future is this concern? How realistic would it be to have the GDI backend forward the instrumentation scope so it's accessible later in the pipeline?
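The concern about sample type can be illustrated with a small sketch. The `ValueType` class below is a simplified stand-in for a (type, unit) pair and not the real protobuf message, and the specific sample types are hypothetical. It only shows that when two profilers describe their data identically, the sample type alone cannot say which one produced a profile.

```python
from typing import NamedTuple


class ValueType(NamedTuple):
    """Simplified stand-in for a profile sample type: a (type, unit) pair."""
    type: str
    unit: str


# Hypothetical sample types. Both profilers describe CPU samples identically,
# and the continuous profiler additionally reports memory samples.
CONTINUOUS_CPU = ValueType("cpu", "nanoseconds")
SNAPSHOT_CPU = ValueType("cpu", "nanoseconds")
CONTINUOUS_MEMORY = ValueType("memory", "bytes")


def can_tell_apart(a: ValueType, b: ValueType) -> bool:
    """True only if the sample type alone distinguishes the two profiles."""
    return a != b
```

In this sketch the pair distinguishes cpu from memory data, but not continuous cpu data from snapshot cpu data, which is the extra dimension the comment refers to.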

laurit commented Jun 18, 2025

Sample type seems to be an awkward fit here. The comments for that field suggest it's intended for the type of profiling data (cpu vs. memory) with an associated unit. We'd need an additional dimension in the ValueType message, since continuous profiling supports both cpu and memory.

The comment says

  // For a cpu profile this might be:
  //   [["cpu","nanoseconds"]] or [["wall","seconds"]] or [["syscall","count"]]

I presume that at some point the specification will define a set of well-known values that represent a wall time cpu profile, a cpu time profile, etc. Compliant backends can make sense of these samples and do something with them. Custom profilers, like our snapshot profiler, that aren't specified in OTel will use custom values, and backends that don't recognize them can drop the samples.
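The behavior described above could look roughly like the following on the backend side. This is a sketch under assumptions: the recognized pairs are copied from the proto comment quoted earlier, the `("snapshot", "count")` pair is a made-up custom value, and the dictionary layout of a sample is purely illustrative.

```python
from typing import Dict, FrozenSet, List, Tuple

SampleType = Tuple[str, str]  # (type, unit)

# Pairs copied from the proto comment; whether the specification will
# standardize exactly these values is a presumption, not a fact.
RECOGNIZED_SAMPLE_TYPES: FrozenSet[SampleType] = frozenset(
    {("cpu", "nanoseconds"), ("wall", "seconds"), ("syscall", "count")}
)


def keep_recognized(
    samples: List[Dict],
    recognized: FrozenSet[SampleType] = RECOGNIZED_SAMPLE_TYPES,
) -> List[Dict]:
    """Keep samples whose sample type the backend understands; drop the rest."""
    return [s for s in samples if s["sample_type"] in recognized]


incoming = [
    {"sample_type": ("cpu", "nanoseconds"), "stack": ["main", "work"]},
    # Hypothetical custom value a snapshot profiler might use.
    {"sample_type": ("snapshot", "count"), "stack": ["main", "handle"]},
]
kept = keep_recognized(incoming)
```

A backend that does understand the custom value would pass a larger `recognized` set and keep both samples.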
