@charlesmyu
Contributor

@charlesmyu charlesmyu commented Oct 16, 2025

What Does This Do

  • Instruments the fromSparkPlan function to:
    • Parse the plan parameter into a map of String properties
    • Update the meta field of returned SparkPlanInfo with those properties
  • Creates a Spark21XPlanUtils class with an extractPlanProduct method that parses a SparkPlan object and returns the properties as a <String, String> map
  • Creates an AbstractSparkPlanUtils class with a parsePlanProduct method that parses the various Objects extracted by extractPlanProduct and returns a comprehensible string representation
    • This is extended by Spark21XPlanUtils
    • Updates the toJson function in SparkSQLUtils to write a JSON object if possible, otherwise just write a string
  • Updates tests to reflect these changes & additions
  • Gates this feature with two feature flags:
    • dd.data.jobs.experimental_features.enabled: meant to gate all experimental features before we GA; we should leave this on by default for all internal users
    • dd.data.jobs.parse_spark_plan.enabled: meant to gate this feature specifically

Motivation

The SparkPlan houses additional details about its execution that are useful for operators to visualize. Extract these into spans so they can be ingested.

Additional Notes

This PR leverages the existing meta field in the SparkPlanInfo class. This should be safe, as we don't overwrite the field if any data exists, and it is currently only used for ScanExec node details. Furthermore, since this class appears to be primarily intended as an abstraction for informational purposes, any faulty updates to the object shouldn't result in any breaking issues.
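To make the "don't overwrite" rule concrete, here is a minimal sketch using plain Java maps. It is not the PR's code: MetaMergeSketch and mergeMeta are hypothetical names, and the real advice operates on SparkPlanInfo rather than on bare maps.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class MetaMergeSketch {
  // Keeps the existing metadata untouched when it already holds entries
  // (e.g. scan node details); otherwise uses the parsed plan properties.
  static Map<String, String> mergeMeta(Map<String, String> existing, Map<String, String> parsed) {
    if (existing != null && !existing.isEmpty()) {
      return existing;
    }
    return new HashMap<>(parsed);
  }
}
```

With this shape, a node that already carries metadata is returned as-is, so anything that reads the existing fields sees no change.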

Also note that we use the Product API to obtain the key names (using productElementName); however, this was only made available in Scala 2.13. As a result, the Scala 2.12 instrumentation uses arbitrary _dd.unknown_key.X names for the keys, so the values can at least be extracted.
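A rough sketch of the key-naming fallback, written without the Scala library so it runs on its own. The names ProductKeysSketch and extract are hypothetical, the nameOf function merely stands in for productElementName, and using the element index as the X suffix is an assumption rather than something stated in this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical helper, for illustration only.
final class ProductKeysSketch {
  // Builds a key -> value map from positional elements. `nameOf` stands in for
  // productElementName (Scala 2.13); pass null to mimic Scala 2.12, where
  // element names are not available.
  static Map<String, String> extract(Object[] elements, IntFunction<String> nameOf) {
    Map<String, String> props = new LinkedHashMap<>();
    for (int i = 0; i < elements.length; i++) {
      if (elements[i] == null) {
        continue; // null values are filtered out
      }
      // Assumption: the fallback key suffix is the element index.
      String key = nameOf != null ? nameOf.apply(i) : "_dd.unknown_key." + i;
      props.put(key, String.valueOf(elements[i]));
    }
    return props;
  }
}
```

Either way the values survive; only the key names differ between the two Scala versions.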

Worth mentioning that this PR does not introduce traversal of the physical plan itself into the tracer - this is left to Spark itself. This is because the recursive fromSparkPlan method is instrumented, meaning as each node is built the tracer is invoked to parse it, and we expressly filter out any potential QueryPlan nodes when performing the parsing.

Contributor Checklist

Jira ticket: DJM-974

@charlesmyu charlesmyu added the "type: enhancement" and "inst: apache spark" labels Oct 16, 2025
@datadog-official

datadog-official bot commented Oct 16, 2025

🎯 Code Coverage
Patch Coverage: 0.00%
Total Coverage: 59.88% (+0.11%)

View detailed report

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: ef18062

@pr-commenter

pr-commenter bot commented Oct 16, 2025

Benchmarks

Startup

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master charles.yu/djm-974/extract-spark-plan-product
git_commit_date 1761587776 1761592445
git_commit_sha b733cda ef18062
release_version 1.55.0-SNAPSHOT~b733cdaf4e 1.55.0-SNAPSHOT~ef18062877
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1761594398 1761594398
ci_job_id 1200644270 1200644270
ci_pipeline_id 80422075 80422075
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-1-omrh1xla 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-1-omrh1xla 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
module Agent Agent
parent None None

Summary

Found 1 performance improvements and 1 performance regressions! Performance is the same for 51 metrics, 12 unstable metrics.

scenario | Δ mean execution_time | candidate mean execution_time | baseline mean execution_time
scenario:startup:petclinic:profiling:Flare Poller | better [-314.901µs; -87.317µs] or [-7.317%; -2.029%] | 4.102ms | 4.303ms
scenario:startup:petclinic:tracing:Remote Config | worse [+15.082µs; +42.192µs] or [+2.254%; +6.306%] | 697.697µs | 669.059µs
Startup time reports for petclinic
gantt
    title petclinic - global startup overhead: candidate=1.55.0-SNAPSHOT~ef18062877, baseline=1.55.0-SNAPSHOT~b733cdaf4e

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.032 s) : 0, 1031774
Total [baseline] (10.726 s) : 0, 10726027
Agent [candidate] (1.014 s) : 0, 1014336
Total [candidate] (10.687 s) : 0, 10686893
section appsec
Agent [baseline] (1.202 s) : 0, 1202484
Total [baseline] (10.915 s) : 0, 10914976
Agent [candidate] (1.192 s) : 0, 1192260
Total [candidate] (11.03 s) : 0, 11030405
section iast
Agent [baseline] (1.169 s) : 0, 1168628
Total [baseline] (11.226 s) : 0, 11226264
Agent [candidate] (1.151 s) : 0, 1151428
Total [candidate] (11.011 s) : 0, 11010860
section profiling
Agent [baseline] (1.183 s) : 0, 1183168
Total [baseline] (10.934 s) : 0, 10933993
Agent [candidate] (1.165 s) : 0, 1164675
Total [candidate] (11.078 s) : 0, 11078204
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.032 s -
Agent appsec 1.202 s 170.71 ms (16.5%)
Agent iast 1.169 s 136.854 ms (13.3%)
Agent profiling 1.183 s 151.394 ms (14.7%)
Total tracing 10.726 s -
Total appsec 10.915 s 188.949 ms (1.8%)
Total iast 11.226 s 500.237 ms (4.7%)
Total profiling 10.934 s 207.966 ms (1.9%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.014 s -
Agent appsec 1.192 s 177.925 ms (17.5%)
Agent iast 1.151 s 137.092 ms (13.5%)
Agent profiling 1.165 s 150.339 ms (14.8%)
Total tracing 10.687 s -
Total appsec 11.03 s 343.511 ms (3.2%)
Total iast 11.011 s 323.966 ms (3.0%)
Total profiling 11.078 s 391.31 ms (3.7%)
gantt
    title petclinic - break down per module: candidate=1.55.0-SNAPSHOT~ef18062877, baseline=1.55.0-SNAPSHOT~b733cdaf4e

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.459 ms) : 0, 1459
crashtracking [candidate] (1.473 ms) : 0, 1473
BytebuddyAgent [baseline] (703.438 ms) : 0, 703438
BytebuddyAgent [candidate] (692.696 ms) : 0, 692696
GlobalTracer [baseline] (245.243 ms) : 0, 245243
GlobalTracer [candidate] (241.781 ms) : 0, 241781
AppSec [baseline] (32.528 ms) : 0, 32528
AppSec [candidate] (32.304 ms) : 0, 32304
Debugger [baseline] (6.414 ms) : 0, 6414
Debugger [candidate] (6.391 ms) : 0, 6391
Remote Config [baseline] (669.059 µs) : 0, 669
Remote Config [candidate] (697.697 µs) : 0, 698
Telemetry [baseline] (14.322 ms) : 0, 14322
Telemetry [candidate] (9.279 ms) : 0, 9279
Flare Poller [baseline] (6.491 ms) : 0, 6491
Flare Poller [candidate] (8.628 ms) : 0, 8628
section appsec
crashtracking [baseline] (1.45 ms) : 0, 1450
crashtracking [candidate] (1.462 ms) : 0, 1462
BytebuddyAgent [baseline] (726.0 ms) : 0, 726000
BytebuddyAgent [candidate] (716.126 ms) : 0, 716126
GlobalTracer [baseline] (235.925 ms) : 0, 235925
GlobalTracer [candidate] (234.597 ms) : 0, 234597
AppSec [baseline] (174.082 ms) : 0, 174082
AppSec [candidate] (175.238 ms) : 0, 175238
Debugger [baseline] (5.893 ms) : 0, 5893
Debugger [candidate] (6.108 ms) : 0, 6108
Remote Config [baseline] (630.172 µs) : 0, 630
Remote Config [candidate] (622.401 µs) : 0, 622
Telemetry [baseline] (8.378 ms) : 0, 8378
Telemetry [candidate] (8.459 ms) : 0, 8459
Flare Poller [baseline] (3.938 ms) : 0, 3938
Flare Poller [candidate] (3.9 ms) : 0, 3900
IAST [baseline] (25.068 ms) : 0, 25068
IAST [candidate] (24.671 ms) : 0, 24671
section iast
crashtracking [baseline] (1.494 ms) : 0, 1494
crashtracking [candidate] (1.453 ms) : 0, 1453
BytebuddyAgent [baseline] (828.975 ms) : 0, 828975
BytebuddyAgent [candidate] (815.275 ms) : 0, 815275
GlobalTracer [baseline] (234.958 ms) : 0, 234958
GlobalTracer [candidate] (231.694 ms) : 0, 231694
AppSec [baseline] (30.204 ms) : 0, 30204
AppSec [candidate] (35.266 ms) : 0, 35266
Debugger [baseline] (6.203 ms) : 0, 6203
Debugger [candidate] (6.158 ms) : 0, 6158
Remote Config [baseline] (602.278 µs) : 0, 602
Remote Config [candidate] (603.838 µs) : 0, 604
Telemetry [baseline] (8.548 ms) : 0, 8548
Telemetry [candidate] (8.654 ms) : 0, 8654
Flare Poller [baseline] (4.215 ms) : 0, 4215
Flare Poller [candidate] (4.219 ms) : 0, 4219
IAST [baseline] (31.95 ms) : 0, 31950
IAST [candidate] (26.516 ms) : 0, 26516
section profiling
crashtracking [baseline] (1.469 ms) : 0, 1469
crashtracking [candidate] (1.442 ms) : 0, 1442
BytebuddyAgent [baseline] (732.499 ms) : 0, 732499
BytebuddyAgent [candidate] (724.027 ms) : 0, 724027
GlobalTracer [baseline] (221.845 ms) : 0, 221845
GlobalTracer [candidate] (218.398 ms) : 0, 218398
AppSec [baseline] (33.282 ms) : 0, 33282
AppSec [candidate] (32.347 ms) : 0, 32347
Debugger [baseline] (9.831 ms) : 0, 9831
Debugger [candidate] (7.352 ms) : 0, 7352
Remote Config [baseline] (1.495 ms) : 0, 1495
Remote Config [candidate] (724.946 µs) : 0, 725
Telemetry [baseline] (11.524 ms) : 0, 11524
Telemetry [candidate] (15.363 ms) : 0, 15363
Flare Poller [baseline] (4.303 ms) : 0, 4303
Flare Poller [candidate] (4.102 ms) : 0, 4102
ProfilingAgent [baseline] (111.151 ms) : 0, 111151
ProfilingAgent [candidate] (107.651 ms) : 0, 107651
Profiling [baseline] (111.807 ms) : 0, 111807
Profiling [candidate] (108.629 ms) : 0, 108629
Startup time reports for insecure-bank
gantt
    title insecure-bank - global startup overhead: candidate=1.55.0-SNAPSHOT~ef18062877, baseline=1.55.0-SNAPSHOT~b733cdaf4e

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.038 s) : 0, 1038089
Total [baseline] (8.669 s) : 0, 8669239
Agent [candidate] (1.018 s) : 0, 1017927
Total [candidate] (8.659 s) : 0, 8658700
section iast
Agent [baseline] (1.163 s) : 0, 1162866
Total [baseline] (9.366 s) : 0, 9366449
Agent [candidate] (1.149 s) : 0, 1149108
Total [candidate] (9.3 s) : 0, 9299574
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.038 s -
Agent iast 1.163 s 124.777 ms (12.0%)
Total tracing 8.669 s -
Total iast 9.366 s 697.211 ms (8.0%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.018 s -
Agent iast 1.149 s 131.182 ms (12.9%)
Total tracing 8.659 s -
Total iast 9.3 s 640.874 ms (7.4%)
gantt
    title insecure-bank - break down per module: candidate=1.55.0-SNAPSHOT~ef18062877, baseline=1.55.0-SNAPSHOT~b733cdaf4e

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.494 ms) : 0, 1494
crashtracking [candidate] (1.45 ms) : 0, 1450
BytebuddyAgent [baseline] (707.837 ms) : 0, 707837
BytebuddyAgent [candidate] (694.688 ms) : 0, 694688
GlobalTracer [baseline] (246.777 ms) : 0, 246777
GlobalTracer [candidate] (242.552 ms) : 0, 242552
AppSec [baseline] (32.614 ms) : 0, 32614
AppSec [candidate] (32.462 ms) : 0, 32462
Debugger [baseline] (6.519 ms) : 0, 6519
Debugger [candidate] (6.428 ms) : 0, 6428
Remote Config [baseline] (681.449 µs) : 0, 681
Remote Config [candidate] (693.675 µs) : 0, 694
Telemetry [baseline] (12.956 ms) : 0, 12956
Telemetry [candidate] (9.283 ms) : 0, 9283
Flare Poller [baseline] (7.979 ms) : 0, 7979
Flare Poller [candidate] (9.279 ms) : 0, 9279
section iast
crashtracking [baseline] (1.476 ms) : 0, 1476
crashtracking [candidate] (1.473 ms) : 0, 1473
BytebuddyAgent [baseline] (824.723 ms) : 0, 824723
BytebuddyAgent [candidate] (813.417 ms) : 0, 813417
GlobalTracer [baseline] (234.172 ms) : 0, 234172
GlobalTracer [candidate] (230.982 ms) : 0, 230982
AppSec [baseline] (28.921 ms) : 0, 28921
AppSec [candidate] (35.485 ms) : 0, 35485
Debugger [baseline] (6.181 ms) : 0, 6181
Debugger [candidate] (6.139 ms) : 0, 6139
Remote Config [baseline] (599.294 µs) : 0, 599
Remote Config [candidate] (612.825 µs) : 0, 613
Telemetry [baseline] (8.523 ms) : 0, 8523
Telemetry [candidate] (8.705 ms) : 0, 8705
Flare Poller [baseline] (4.129 ms) : 0, 4129
Flare Poller [candidate] (4.328 ms) : 0, 4328
IAST [baseline] (32.841 ms) : 0, 32841
IAST [candidate] (26.558 ms) : 0, 26558

Load

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master charles.yu/djm-974/extract-spark-plan-product
git_commit_date 1761587776 1761592445
git_commit_sha b733cda ef18062
release_version 1.55.0-SNAPSHOT~b733cdaf4e 1.55.0-SNAPSHOT~ef18062877
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1761594067 1761594067
ci_job_id 1200644271 1200644271
ci_pipeline_id 80422075 80422075
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-0-i689ha5n 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-0-i689ha5n 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 4 performance improvements and 5 performance regressions! Performance is the same for 3 metrics, 12 unstable metrics.

scenario | Δ mean http_req_duration | Δ mean throughput | candidate mean http_req_duration | candidate mean throughput | baseline mean http_req_duration | baseline mean throughput
scenario:load:insecure-bank:no_agent:high_load | better [-220.946µs; -116.168µs] or [-5.011%; -2.635%] | unstable [-77.823op/s; +158.823op/s] or [-7.474%; +15.254%] | 4.241ms | 1081.688op/s | 4.409ms | 1041.188op/s
scenario:load:insecure-bank:tracing:high_load | better [-512.031µs; -268.810µs] or [-6.403%; -3.361%] | unstable [-43.961op/s; +102.648op/s] or [-7.586%; +17.714%] | 7.607ms | 608.812op/s | 7.997ms | 579.469op/s
scenario:load:insecure-bank:iast_GLOBAL:high_load | worse [+0.814ms; +1.259ms] or [+7.729%; +11.954%] | unstable [-90.482op/s; +11.920op/s] or [-20.500%; +2.701%] | 11.566ms | 402.094op/s | 10.530ms | 441.375op/s
scenario:load:insecure-bank:profiling:high_load | better [-534.912µs; -216.361µs] or [-5.765%; -2.332%] | unstable [-43.686op/s; +85.311op/s] or [-8.731%; +17.051%] | 8.902ms | 521.156op/s | 9.278ms | 500.344op/s
scenario:load:insecure-bank:iast:high_load | better [-597.518µs; -250.286µs] or [-6.060%; -2.538%] | unstable [-37.835op/s; +79.460op/s] or [-8.030%; +16.864%] | 9.436ms | 492.000op/s | 9.860ms | 471.188op/s
scenario:load:petclinic:no_agent:high_load | worse [+1.106ms; +1.760ms] or [+3.055%; +4.859%] | unstable [-13.746op/s; +3.996op/s] or [-10.648%; +3.095%] | 37.654ms | 124.213op/s | 36.221ms | 129.088op/s
scenario:load:petclinic:appsec:high_load | worse [+1.373ms; +2.282ms] or [+2.910%; +4.839%] | unstable [-10.851op/s; +3.476op/s] or [-10.935%; +3.503%] | 48.990ms | 95.550op/s | 47.162ms | 99.237op/s
scenario:load:petclinic:tracing:high_load | worse [+1.417ms; +2.214ms] or [+3.294%; +5.147%] | unstable [-11.959op/s; +3.234op/s] or [-10.996%; +2.974%] | 44.829ms | 104.400op/s | 43.014ms | 108.763op/s
scenario:load:petclinic:iast:high_load | worse [+1.407ms; +2.256ms] or [+3.161%; +5.070%] | unstable [-11.338op/s; +3.063op/s] or [-10.782%; +2.913%] | 46.330ms | 101.025op/s | 44.499ms | 105.162op/s
Request duration reports for insecure-bank
gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.55.0-SNAPSHOT~ef18062877, baseline=1.55.0-SNAPSHOT~b733cdaf4e
    dateFormat X
    axisFormat %s
section baseline
no_agent (4.409 ms) : 4358, 4460
.   : milestone, 4409,
iast (9.86 ms) : 9695, 10024
.   : milestone, 9860,
iast_FULL (14.124 ms) : 13841, 14407
.   : milestone, 14124,
iast_GLOBAL (10.53 ms) : 10328, 10732
.   : milestone, 10530,
profiling (9.278 ms) : 9130, 9426
.   : milestone, 9278,
tracing (7.997 ms) : 7879, 8116
.   : milestone, 7997,
section candidate
no_agent (4.241 ms) : 4194, 4287
.   : milestone, 4241,
iast (9.436 ms) : 9278, 9593
.   : milestone, 9436,
iast_FULL (14.198 ms) : 13911, 14485
.   : milestone, 14198,
iast_GLOBAL (11.566 ms) : 11355, 11778
.   : milestone, 11566,
profiling (8.902 ms) : 8754, 9050
.   : milestone, 8902,
tracing (7.607 ms) : 7500, 7714
.   : milestone, 7607,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 4.409 ms [4.358 ms, 4.46 ms] -
iast 9.86 ms [9.695 ms, 10.024 ms] 5.45 ms (123.6%)
iast_FULL 14.124 ms [13.841 ms, 14.407 ms] 9.715 ms (220.3%)
iast_GLOBAL 10.53 ms [10.328 ms, 10.732 ms] 6.121 ms (138.8%)
profiling 9.278 ms [9.13 ms, 9.426 ms] 4.869 ms (110.4%)
tracing 7.997 ms [7.879 ms, 8.116 ms] 3.588 ms (81.4%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 4.241 ms [4.194 ms, 4.287 ms] -
iast 9.436 ms [9.278 ms, 9.593 ms] 5.195 ms (122.5%)
iast_FULL 14.198 ms [13.911 ms, 14.485 ms] 9.957 ms (234.8%)
iast_GLOBAL 11.566 ms [11.355 ms, 11.778 ms] 7.326 ms (172.7%)
profiling 8.902 ms [8.754 ms, 9.05 ms] 4.662 ms (109.9%)
tracing 7.607 ms [7.5 ms, 7.714 ms] 3.366 ms (79.4%)
Request duration reports for petclinic
gantt
    title petclinic - request duration [CI 0.99] : candidate=1.55.0-SNAPSHOT~ef18062877, baseline=1.55.0-SNAPSHOT~b733cdaf4e
    dateFormat X
    axisFormat %s
section baseline
no_agent (36.221 ms) : 35925, 36517
.   : milestone, 36221,
appsec (47.162 ms) : 46746, 47578
.   : milestone, 47162,
code_origins (44.42 ms) : 44028, 44811
.   : milestone, 44420,
iast (44.499 ms) : 44117, 44880
.   : milestone, 44499,
profiling (49.007 ms) : 48522, 49492
.   : milestone, 49007,
tracing (43.014 ms) : 42646, 43382
.   : milestone, 43014,
section candidate
no_agent (37.654 ms) : 37343, 37965
.   : milestone, 37654,
appsec (48.99 ms) : 48560, 49419
.   : milestone, 48990,
code_origins (44.434 ms) : 44063, 44806
.   : milestone, 44434,
iast (46.33 ms) : 45923, 46738
.   : milestone, 46330,
profiling (48.577 ms) : 48098, 49056
.   : milestone, 48577,
tracing (44.829 ms) : 44457, 45202
.   : milestone, 44829,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 36.221 ms [35.925 ms, 36.517 ms] -
appsec 47.162 ms [46.746 ms, 47.578 ms] 10.941 ms (30.2%)
code_origins 44.42 ms [44.028 ms, 44.811 ms] 8.199 ms (22.6%)
iast 44.499 ms [44.117 ms, 44.88 ms] 8.278 ms (22.9%)
profiling 49.007 ms [48.522 ms, 49.492 ms] 12.786 ms (35.3%)
tracing 43.014 ms [42.646 ms, 43.382 ms] 6.793 ms (18.8%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 37.654 ms [37.343 ms, 37.965 ms] -
appsec 48.99 ms [48.56 ms, 49.419 ms] 11.336 ms (30.1%)
code_origins 44.434 ms [44.063 ms, 44.806 ms] 6.78 ms (18.0%)
iast 46.33 ms [45.923 ms, 46.738 ms] 8.676 ms (23.0%)
profiling 48.577 ms [48.098 ms, 49.056 ms] 10.923 ms (29.0%)
tracing 44.829 ms [44.457 ms, 45.202 ms] 7.175 ms (19.1%)

Dacapo

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master charles.yu/djm-974/extract-spark-plan-product
git_commit_date 1761587776 1761592445
git_commit_sha b733cda ef18062
release_version 1.55.0-SNAPSHOT~b733cdaf4e 1.55.0-SNAPSHOT~ef18062877
See matching parameters
Baseline Candidate
application biojava biojava
ci_job_date 1761594583 1761594583
ci_job_id 1200644272 1200644272
ci_pipeline_id 80422075 80422075
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-1-spd6830d 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-1-spd6830d 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics.

Execution time for tomcat
gantt
    title tomcat - execution time [CI 0.99] : candidate=1.55.0-SNAPSHOT~ef18062877, baseline=1.55.0-SNAPSHOT~b733cdaf4e
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.472 ms) : 1461, 1484
.   : milestone, 1472,
appsec (3.708 ms) : 3487, 3930
.   : milestone, 3708,
iast (2.191 ms) : 2128, 2255
.   : milestone, 2191,
iast_GLOBAL (2.239 ms) : 2175, 2303
.   : milestone, 2239,
profiling (2.078 ms) : 2024, 2131
.   : milestone, 2078,
tracing (2.024 ms) : 1974, 2073
.   : milestone, 2024,
section candidate
no_agent (1.473 ms) : 1462, 1485
.   : milestone, 1473,
appsec (3.699 ms) : 3481, 3918
.   : milestone, 3699,
iast (2.207 ms) : 2144, 2271
.   : milestone, 2207,
iast_GLOBAL (2.265 ms) : 2200, 2330
.   : milestone, 2265,
profiling (2.054 ms) : 2003, 2106
.   : milestone, 2054,
tracing (2.022 ms) : 1973, 2072
.   : milestone, 2022,
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.472 ms [1.461 ms, 1.484 ms] -
appsec 3.708 ms [3.487 ms, 3.93 ms] 2.236 ms (151.9%)
iast 2.191 ms [2.128 ms, 2.255 ms] 719.251 µs (48.9%)
iast_GLOBAL 2.239 ms [2.175 ms, 2.303 ms] 767.135 µs (52.1%)
profiling 2.078 ms [2.024 ms, 2.131 ms] 605.705 µs (41.1%)
tracing 2.024 ms [1.974 ms, 2.073 ms] 551.589 µs (37.5%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.473 ms [1.462 ms, 1.485 ms] -
appsec 3.699 ms [3.481 ms, 3.918 ms] 2.226 ms (151.1%)
iast 2.207 ms [2.144 ms, 2.271 ms] 733.944 µs (49.8%)
iast_GLOBAL 2.265 ms [2.2 ms, 2.33 ms] 791.58 µs (53.7%)
profiling 2.054 ms [2.003 ms, 2.106 ms] 580.937 µs (39.4%)
tracing 2.022 ms [1.973 ms, 2.072 ms] 549.295 µs (37.3%)
Execution time for biojava
gantt
    title biojava - execution time [CI 0.99] : candidate=1.55.0-SNAPSHOT~ef18062877, baseline=1.55.0-SNAPSHOT~b733cdaf4e
    dateFormat X
    axisFormat %s
section baseline
no_agent (15.228 s) : 15228000, 15228000
.   : milestone, 15228000,
appsec (15.075 s) : 15075000, 15075000
.   : milestone, 15075000,
iast (18.555 s) : 18555000, 18555000
.   : milestone, 18555000,
iast_GLOBAL (18.179 s) : 18179000, 18179000
.   : milestone, 18179000,
profiling (15.352 s) : 15352000, 15352000
.   : milestone, 15352000,
tracing (15.107 s) : 15107000, 15107000
.   : milestone, 15107000,
section candidate
no_agent (15.01 s) : 15010000, 15010000
.   : milestone, 15010000,
appsec (15.112 s) : 15112000, 15112000
.   : milestone, 15112000,
iast (18.913 s) : 18913000, 18913000
.   : milestone, 18913000,
iast_GLOBAL (17.586 s) : 17586000, 17586000
.   : milestone, 17586000,
profiling (15.379 s) : 15379000, 15379000
.   : milestone, 15379000,
tracing (15.293 s) : 15293000, 15293000
.   : milestone, 15293000,
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 15.228 s [15.228 s, 15.228 s] -
appsec 15.075 s [15.075 s, 15.075 s] -153.0 ms (-1.0%)
iast 18.555 s [18.555 s, 18.555 s] 3.327 s (21.8%)
iast_GLOBAL 18.179 s [18.179 s, 18.179 s] 2.951 s (19.4%)
profiling 15.352 s [15.352 s, 15.352 s] 124.0 ms (0.8%)
tracing 15.107 s [15.107 s, 15.107 s] -121.0 ms (-0.8%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 15.01 s [15.01 s, 15.01 s] -
appsec 15.112 s [15.112 s, 15.112 s] 102.0 ms (0.7%)
iast 18.913 s [18.913 s, 18.913 s] 3.903 s (26.0%)
iast_GLOBAL 17.586 s [17.586 s, 17.586 s] 2.576 s (17.2%)
profiling 15.379 s [15.379 s, 15.379 s] 369.0 ms (2.5%)
tracing 15.293 s [15.293 s, 15.293 s] 283.0 ms (1.9%)

@charlesmyu charlesmyu force-pushed the charles.yu/djm-974/extract-spark-plan-product branch 3 times, most recently from dc41615 to d9d6213 Compare October 16, 2025 22:43
Comment on lines 28 to 29
// Should really only return valid JSON types (Array, Map, String, Boolean, Number, null)
public Object parsePlanProduct(Object value) {
Contributor Author

@charlesmyu charlesmyu Oct 16, 2025

I don't love that this method returns an Object instead of something definite like a JSON node (or even just a String). The end goal is to allow any JSON object (other than null, which we filter out) to be serialized into a string using writeObjectToString, and this seemed like the most straightforward way to achieve that. There's probably some more idiomatic way I'm missing - happy to hear about it if anyone has ideas!

@charlesmyu charlesmyu force-pushed the charles.yu/djm-974/extract-spark-plan-product branch from d9d6213 to 54ab1ad Compare October 16, 2025 22:47
@charlesmyu charlesmyu force-pushed the charles.yu/djm-974/extract-spark-plan-product branch from 54ab1ad to 0279fff Compare October 17, 2025 04:53
public static void exit(
@Advice.Return(readOnly = false) SparkPlanInfo planInfo,
@Advice.Argument(0) SparkPlan plan) {
if (planInfo.metadata().size() == 0) {
Contributor Author

By using the existing metadata on the DataSourceScanExec nodes, we open ourselves to a bit of inconsistency in the JSON parsing:

"meta": {
    "Format": "Parquet",
    "Batched": true,
    ...,
    "DataFilters": "[CASE WHEN PULocationID#28 IN (236,132,161) THEN true ELSE isnotnull(PULocationID#28) END]"
},

Specifically the lists are not quoted & escaped, which means when we read out the field it's treated as a string rather than a JSON native array. Ideally we would parse this ourselves and upsert it so we can control that formatting, but obviously there's a risk of the parsing going wrong and impacting something that actually uses the field. Leaning slightly towards keeping the formatting as-is in favour of not touching existing fields but happy to hear any other thoughts on this...

// An extension of how Spark translates `SparkPlan`s to `SparkPlanInfo`, see here:
// https://github.com/apache/spark/blob/v3.5.0/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlanInfo.scala#L54
public class Spark213PlanUtils extends AbstractSparkPlanUtils {
public Map<String, String> extractPlanProduct(TreeNode plan) {
Contributor

In the OpenLineage connector we had a special facet for storing the serialized LogicalPlan of the query. This was the most problematic feature we ever had, because the plan can contain anything. For example, if a user creates an in-memory dataframe of a few gigabytes, it becomes a node in the logical plan. The OpenLineage connector tried to serialize it and brought down the whole Spark driver.

This PR seems to be doing the same thing for the physical plan. I think we shouldn't serialize an object when we don't know what's inside.

Contributor Author

@charlesmyu charlesmyu Oct 17, 2025

Chatted about this over a call, summarizing for posterity:

  • Worth clarifying that this function does not traverse the tree itself; we leave that up to Spark because we instrument the recursive fromSparkPlan method
  • We should avoid serializing anything we don't know about arbitrarily, especially using toString(). Since we are taking the full product of the TreeNode we could get some enormous structure (e.g. improbable, but maybe an array of all the data) and toString() would then attempt to serialize all of that data
    • Instead we should lean solely on simpleString() which is safe by default and default to not serializing otherwise. We could then only serialize other TreeNodes and leave out any unknown or unexpected data structures
    • With this change it would even be safe to parse the child QueryPlan nodes because it would no longer output the long physical plan, and instead print the one line string
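The points above can be sketched roughly as follows. This is not the PR's implementation: SafePlanValueSketch, its Node interface (a stand-in for Spark's TreeNode) and both limits are assumptions made for illustration. The idea is to pass through JSON-friendly primitives, summarize nodes with their one-line simpleString, recurse into collections with caps, and return null for anything unrecognized instead of calling toString() on it.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical helper, for illustration only.
final class SafePlanValueSketch {
  // Stand-in for a Spark TreeNode; only its one-line summary is used.
  interface Node {
    String simpleString();
  }

  static final int MAX_DEPTH = 4;  // assumed recursion limit
  static final int MAX_ITEMS = 50; // assumed collection size limit

  static Object parse(Object value, int depth) {
    if (value == null || depth > MAX_DEPTH) {
      return null;
    }
    if (value instanceof String || value instanceof Number || value instanceof Boolean) {
      return value;
    }
    if (value instanceof Node) {
      return ((Node) value).simpleString();
    }
    if (value instanceof Collection) {
      List<Object> out = new ArrayList<>();
      for (Object item : (Collection<?>) value) {
        if (out.size() >= MAX_ITEMS) {
          break; // stop early instead of walking an arbitrarily large collection
        }
        Object parsed = parse(item, depth + 1);
        if (parsed != null) {
          out.add(parsed);
        }
      }
      return out;
    }
    // Unknown type: deliberately no toString(), since it could serialize huge data.
    return null;
  }
}
```

The trade-off is that unfamiliar structures are silently dropped, which loses some detail but keeps the cost of parsing a node bounded.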

Contributor Author

Comment on lines 177 to 185
// Parse any nested objects to a standard form that ignores the keys
// Right now we do this by just asserting the key set and none of the values
static Object parseNestedMetaObject(Object value) {
if (value instanceof Map) {
return value.keySet()
} else {
return value
}
}
Contributor Author

This was driving me nuts - there must be a better way to accomplish this without a ton of additional code... The issue is that for the Spark32 suite of tests, the expectations for the meta fields use named keys, but when we run the tests using Scala 2.12 we expect those to all show up as _dd.unknown_key.*. I added a (not great) way around that in assertSQLPlanEquals, which worked fine until we started getting nested maps that can have unknown keys. e.g.:

"meta": {
  "_dd.unparsed" : "any",
  "outputPartitioning" : {
    "HashPartitioning" : {
      "numPartitions" : 2,
      "expressions" : [ "string_col#28" ]
    }
  },
  "shuffleOrigin" : "ENSURE_REQUIREMENTS"
},

Where the numPartitions and expressions keys would show up as _dd.unknown_key.* in Scala 2.12. Initially I went for a recursive approach but that ended up feeling very bloated, so I abandoned it in favour of a subpar keyset check (i.e. only check that HashPartitioning exists in the map).

No false impressions that this is any good - let me know if there's a better way I'm missing, if just the key check is okay (only applies to the test suite running Scala 2.12/Spark 3.2.0, the other two suites compare everything as expected), or if we just have to put up with the recursive approach...

Contributor Author

@charlesmyu charlesmyu Oct 20, 2025

Changed the approach to compare lists of values instead of whatever I had put before - a bit cleaner and simpler to follow. Has its own downsides (e.g. not perfect comparisons as some stable keys are eliminated, and the containsAll comparison can be fooled) but at least it attempts to compare values and is much easier to maintain. Given it's on an older version of Scala that will no longer be supported for new Spark versions, I think this should probably be fine.
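For readers following along, a value-based comparison of this kind might look roughly like the sketch below. It is written in Java rather than the test suite's Groovy, it is not the code from the linked commit, and MetaValuesSketch, leafValues and sameValues are hypothetical names. Keys are discarded and only leaf values are compared with containsAll, which is exactly why such a check can be fooled by values that appear under the wrong key.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class MetaValuesSketch {
  // Flattens nested maps and collections into their leaf values, discarding keys.
  static List<Object> leafValues(Object value) {
    List<Object> out = new ArrayList<>();
    collect(value, out);
    return out;
  }

  private static void collect(Object value, List<Object> out) {
    if (value instanceof Map) {
      for (Object v : ((Map<?, ?>) value).values()) {
        collect(v, out);
      }
    } else if (value instanceof Collection) {
      for (Object v : (Collection<?>) value) {
        collect(v, out);
      }
    } else {
      out.add(value);
    }
  }

  // True when every expected leaf value also appears somewhere in the actual structure.
  static boolean sameValues(Object expected, Object actual) {
    return leafValues(actual).containsAll(leafValues(expected));
  }
}
```

Because keys are ignored, an expectation written with named keys can still match output that uses _dd.unknown_key.* names.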

c68b356 (#9783)

Contributor

@pawel-big-lebowski pawel-big-lebowski left a comment

Went through the first round of reading and left some comments.
Please let me know what you think about it.

@charlesmyu charlesmyu marked this pull request as ready for review October 21, 2025 21:20
@charlesmyu charlesmyu requested a review from a team as a code owner October 21, 2025 21:20
@charlesmyu charlesmyu requested a review from a team as a code owner October 21, 2025 21:20
@charlesmyu charlesmyu requested a review from mhlidd October 21, 2025 21:20
Contributor

@pawel-big-lebowski pawel-big-lebowski left a comment

Overall this looks good to me. My primary concern is naturally to make sure this won't cause problems on any Spark version or with any physical plan the job is processing.

I think the PR does well to achieve this:

  • feature is going to be rolled out first to users which explicitly turn it on,
  • it serializes only known nodes (serializing unknown nodes is a common pitfall)
  • serializer is limited on recursion depth and max collection sizes
  • code introduced depends in a minimal way on Spark classes and methods, making it resilient to future updates on Spark side.

Few minor comments added. Happy to approve the PR once they're resolved.

assert res.toString() == "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]"
}

def "unknown objects should return null"() {
Contributor

Thanks for creating a test for this. I think this is really important.

// in Spark v3+, the signature of `simpleString` includes an int parameter for `maxFields`
return TreeNode.class
.getDeclaredMethod("simpleString", new Class[] {int.class})
.invoke(value, MAX_LENGTH)
Contributor

Pls make sure this doesn't throw a NullPointerException in case getDeclaredMethod returns null.

Contributor Author

Yes, I think that's true! Based on the signature of getDeclaredMethod it looks like we should expect NoSuchMethodException in that case:

public Method getDeclaredMethod(String name, Class<?>... parameterTypes) throws NoSuchMethodException, SecurityException

I've added NullPointerException to the catch just in case, though. 5527ad0
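
As a small check of the behaviour described by that signature, the sketch below looks up a method that exists and one that does not. `Sample` is a hypothetical stand-in for a Spark class such as `TreeNode`; only `Class.getDeclaredMethod` itself is real API.

```java
import java.lang.reflect.Method;

// Hypothetical illustration; Sample stands in for a Spark class such as TreeNode.
final class DeclaredMethodLookup {
  static final class Sample {
    String simpleString() {
      return "sample";
    }
  }

  // Returns the method if it is declared, or null when the lookup fails.
  static Method findOrNull(Class<?> type, String name, Class<?>... paramTypes) {
    try {
      return type.getDeclaredMethod(name, paramTypes);
    } catch (NoSuchMethodException e) {
      // A missing overload surfaces here as an exception, not as a null return.
      return null;
    }
  }

  public static void main(String[] args) {
    System.out.println(findOrNull(Sample.class, "simpleString") != null);
    System.out.println(findOrNull(Sample.class, "simpleString", int.class) != null);
  }
}
```

The first lookup succeeds and the second goes through the catch block, so a null only appears if the caller chooses to map the exception to one.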

Contributor Author

Just kidding, the spotbugs job did not like that - reverted that change. I'm fairly confident based on the signature & impl that we should only get NoSuchMethodException, though, and not NullPointerException. Let me know if we'd still like to do a more explicit null check.

18e51d5

Contributor

If invoke returns null, our code will call toString on null, causing a NullPointerException.
Let me know if this is possible or never going to happen.

Contributor Author

Ah, understood - you're right, I was looking at the wrong call. Updated to be an explicit cast. de336b9 (#9783)
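
For reference, the difference between the two forms looks roughly like this. It is a sketch with hypothetical names (`Sample`, `describe`), not the code in the commit: casting a null result of `Method.invoke` is harmless, while calling `toString()` on it throws.

```java
import java.lang.reflect.Method;

// Hypothetical illustration of cast vs. toString() on a reflective call result.
final class NullInvokeResult {
  static final class Sample {
    public String describe() {
      return null; // simulates a method that legitimately returns null
    }
  }

  private static Object call(Object target) {
    try {
      Method m = target.getClass().getDeclaredMethod("describe");
      return m.invoke(target);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  // Casting null to String is legal and simply yields null.
  static String viaCast(Object target) {
    return (String) call(target);
  }

  // Dereferencing the result throws NullPointerException when it is null.
  static String viaToString(Object target) {
    return call(target).toString();
  }

  public static void main(String[] args) {
    System.out.println(viaCast(new Sample()));
  }
}
```

With the cast, a null simply propagates to the caller, which can then decide how to handle it.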

Contributor

@PerfectSlayer PerfectSlayer left a comment

Left first comments. I will continue the review later and let @mhlidd do the full review 😉

Comment on lines 110 to 111
args.$plus$plus(
JavaConverters.mapAsScalaMap(planUtils.extractFormattedProduct(plan))),
Contributor

❔ question: Quick question: is this for creating a copy of the map?

Contributor Author

To be frank, I was struggling with this a lot - I was trying to convert scala.collection.mutable.Map to scala.collection.immutable.Map, but I didn't quite know how to do that given all the Scala implicits involved. Updated now to use toMap instead (figured out how to get the <:< implicit sorted properly). Let me know if this looks better!

160558e

public class Spark212PlanSerializer extends AbstractSparkPlanSerializer {
@Override
public String getKey(int idx, TreeNode node) {
return String.format("_dd.unknown_key.%d", idx);
Contributor

❔ question: Would String format be less expensive?

Contributor Author

I assume you meant concatenation here 😄 updated!

d4c8264
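
A minimal side-by-side of the two ways to build the key, assuming the same `_dd.unknown_key.` prefix as in the snippet above. The class and method names are hypothetical.

```java
// Hypothetical comparison of the two key-building styles discussed above.
final class KeyFormatting {
  // Parses the format string on every call; %d is also locale-sensitive
  // and may render localized digits under some default locales.
  static String keyWithFormat(int idx) {
    return String.format("_dd.unknown_key.%d", idx);
  }

  // Plain concatenation: no format parsing, and the output does not depend on locale.
  static String keyWithConcat(int idx) {
    return "_dd.unknown_key." + idx;
  }

  public static void main(String[] args) {
    System.out.println(keyWithConcat(3));
  }
}
```

Concatenation is generally the cheaper option for a fixed prefix plus an index, since it skips the work of parsing a format string.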

private final String SPARK_PKG_NAME = "org.apache.spark";

private final Set<String> SAFE_PARSE_TRAVERSE =
new HashSet<>(Arrays.asList(SPARK_PKG_NAME + ".sql.catalyst.plans.physical.Partitioning"));
Contributor

🎯 suggestion: Could we use Collections.singleton() rather than Arrays.asList() to avoid array allocation if we only need a Collection<String>?

Contributor Author

Updated! Out of curiosity, are there better ways to declare the other multi-object sets as well, or is a HashSet as good as it gets?

d4c8264
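
On the multi-element question, one possible shape is sketched below. The `SINGLE` field mirrors the suggestion above; the `MULTI` entries are made-up placeholder class names, and the whole class is hypothetical. On Java 8, wrapping a `HashSet` with `Collections.unmodifiableSet` is a common way to get a read-only constant; if the module can target Java 9 or later (an assumption), `Set.of(...)` would be a shorter alternative.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of constant sets; MULTI entries are placeholders.
final class SafeTypeSets {
  static final String SPARK_PKG_NAME = "org.apache.spark";

  // Single element: immutable, no backing array or hash table needed.
  static final Set<String> SINGLE =
      Collections.singleton(SPARK_PKG_NAME + ".sql.catalyst.plans.physical.Partitioning");

  // Several elements on Java 8: build once, then expose a read-only view.
  static final Set<String> MULTI =
      Collections.unmodifiableSet(
          new HashSet<>(Arrays.asList("example.TypeA", "example.TypeB")));

  public static void main(String[] args) {
    System.out.println(SINGLE.size() + " " + MULTI.size());
  }
}
```

Both constants reject modification at runtime, which also guards against accidental mutation of shared state.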

planInfo.simpleString(),
planInfo.children(),
HashMap.from(
JavaConverters.asScala(planUtils.extractFormattedProduct(plan)).toList()),
Contributor

Suggested change
JavaConverters.asScala(planUtils.extractFormattedProduct(plan)).toList()),
JavaConverters.asScala(planUtils.extractFormattedProduct(plan))),

Do we need to convert to a List first before converting to a HashMap?

Contributor Author

Good point, updated

d4c8264

Comment on lines 110 to 111
args.$plus$plus(
JavaConverters.mapAsScalaMap(planUtils.extractFormattedProduct(plan))),
Contributor

Do we need args here? It seems like it would always be an empty map, so there isn't a need to concatenate.

Contributor Author

Same comment from above:

To be frank, I was struggling with this a lot - I was trying to convert scala.collection.mutable.Map to scala.collection.immutable.Map, but I didn't quite know how to do that given all the Scala implicits involved. Updated now to use toMap instead (figured out how to get the <:< implicit sorted properly). Let me know if this looks better!

160558e

protected static assertStringSQLPlanSubset(String expectedString, String actualString) {
System.err.println("Checking if expected $expectedString SQL plan is a super set of $actualString")

protected static assertStringSQLPlanSubset(String expectedString, String actualString, String name) {
Contributor

Is it possible to create a Util class that stores all Spark assertions? This way the test classes can be separated from the assertion definitions they use.

Contributor Author

I think I know what you mean, but would you have an example of a util class that's similar to what you're looking for? My assumption is this would be useful if we decide to swap out the assertion framework used?

Comment on lines +83 to +86
public static final String DATA_JOBS_PARSE_SPARK_PLAN_ENABLED =
"data.jobs.parse_spark_plan.enabled";
public static final String DATA_JOBS_EXPERIMENTAL_FEATURES_ENABLED =
"data.jobs.experimental_features.enabled";
Contributor

Can these be added to metadata/supported-configurations.json and documented in the Feature Parity Dashboard? I added some docs about this recently that can be referenced.

Contributor Author

@charlesmyu charlesmyu Oct 27, 2025

Added (link), thanks for mentioning it. I forgot about the env vars - would it be correct to assume that the env var name (e.g. DD_DATA_JOBS_PARSE_SPARK_PLAN_ENABLED) is inferred by the tracer when mapping it to the actual config in Config.java? Just curious since we don't explicitly define the env var keys anywhere else.

d4c8264
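
The naming pattern in question can be written out as a small sketch. This only reproduces the convention visible in the example name above (prefix `DD_`, dots to underscores, upper case); whether the tracer derives env var names this way is exactly the open question, so the helper below is hypothetical and not taken from Config.java.

```java
import java.util.Locale;

// Hypothetical helper showing the naming pattern only; not the tracer's implementation.
final class EnvVarNameSketch {
  static String toEnvVarName(String configKey) {
    // Locale.ROOT keeps upper-casing stable regardless of the default locale.
    return "DD_" + configKey.replace('.', '_').toUpperCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    System.out.println(toEnvVarName("data.jobs.parse_spark_plan.enabled"));
    System.out.println(toEnvVarName("data.jobs.experimental_features.enabled"));
  }
}
```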

@charlesmyu charlesmyu requested a review from a team as a code owner October 27, 2025 19:09
@charlesmyu charlesmyu requested review from dougqh and removed request for a team October 27, 2025 19:09

Labels

inst: apache spark Apache Spark instrumentation type: enhancement Enhancements and improvements

4 participants