TimDiekmann
Member

@TimDiekmann TimDiekmann commented Sep 10, 2025

🌟 What is the purpose of this PR?

This implements CPU time profiling (using pprof) and wall time profiling (using tracing spans). Either profiler can be hooked in by enabling it in the telemetry config.

🔍 What does this change?

  • Implement wall time profiling by introducing a new layer. We create our own layer instead of using tracing-flame for three reasons:
    • We want to profile idle time as well. This is needed to see the actual timings of asynchronous calls such as DB calls, which is one of the main reasons we want this in the first place. It also matches the behavior of tracing-opentelemetry, so the resulting flame graph will look similar to the traces
    • tracing-flame only supports an io::Write interface, but we want to collect the data asynchronously in a dedicated thread using channels. While we could use write and flush to send messages, channels are easier to work with, in particular because they allow us to use dedicated errors
    • tracing-flame creates a folded row for each enter/exit of a span, resulting in a huge amount of data. A single row per span is enough for flame graphs (rows per enter/exit would only be needed for flame charts, and for that functionality we have proper tracing)
  • Implement CPU time profiling by using Pyroscope's pprof implementation

Both implementations can be disabled independently.
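
To illustrate the channel-based collection described above, here is a minimal, std-only sketch. It is not the code from this PR: `SpanClosed`, `spawn_collector`, and `render_folded` are hypothetical names, and the real layer would be driven by tracing span events rather than manual `send` calls. It only shows the idea of sending one message per closed span to a dedicated thread, which sums the wall time per unique stack and emits a single folded row for each. Whether a row should carry inclusive or self time is left open here.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Message sent once when a span closes (hypothetical type, not from the PR).
struct SpanClosed {
    /// Span names from root to leaf, e.g. `["request", "db_query"]`.
    stack: Vec<&'static str>,
    /// Wall time between span creation and close, idle time included.
    wall_time: Duration,
}

/// Spawns a dedicated collector thread that sums wall time per unique stack.
fn spawn_collector() -> (Sender<SpanClosed>, JoinHandle<BTreeMap<String, Duration>>) {
    let (sender, receiver) = mpsc::channel::<SpanClosed>();
    let handle = thread::spawn(move || {
        let mut totals = BTreeMap::new();
        // The loop ends once every sender has been dropped.
        for message in receiver {
            let key = message.stack.join(";");
            *totals.entry(key).or_insert(Duration::ZERO) += message.wall_time;
        }
        totals
    });
    (sender, handle)
}

/// Renders the totals in folded-stack format: `a;b;c <microseconds>`.
fn render_folded(totals: &BTreeMap<String, Duration>) -> String {
    totals
        .iter()
        .map(|(stack, time)| format!("{} {}\n", stack, time.as_micros()))
        .collect()
}

fn main() {
    let (sender, handle) = spawn_collector();

    // Two closes of the same stack end up in one aggregated row.
    for millis in [5, 7] {
        sender
            .send(SpanClosed {
                stack: vec!["request", "db_query"],
                wall_time: Duration::from_millis(millis),
            })
            .expect("collector thread should be running");
    }
    sender
        .send(SpanClosed {
            stack: vec!["request"],
            wall_time: Duration::from_millis(20),
        })
        .expect("collector thread should be running");

    // Dropping the last sender lets the collector finish and return its totals.
    drop(sender);
    let totals = handle.join().expect("collector thread should not panic");
    print!("{}", render_folded(&totals));
}
```

Because `send` returns a `Result`, a failed hand-off surfaces as a typed error instead of an `io::Error`, which is the advantage over an io::Write sink mentioned in the list above.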

Pre-Merge Checklist 🚀

🚢 Has this modified a publishable library?

This PR:

  • does not modify any publishable blocks or libraries, or modifications do not need publishing

📜 Does this require a change to the docs?

The changes in this PR:

  • are internal and do not require a docs change

🕸️ Does this require a change to the Turbo Graph?

The changes in this PR:

  • do not affect the execution graph

⚠️ Known issues

This is currently disabled in production because:

  • Wall time profiling results in timeouts in the app (probably related to using std::thread instead of tokio::task)
  • CPU profiling does not start. Also, proper CPU profiling in the graph would need tags per endpoint, which we don't have yet

@TimDiekmann TimDiekmann self-assigned this Sep 10, 2025
@github-actions github-actions bot added area/deps Relates to third-party dependencies (area) area/infra Relates to version control, CI, CD or IaC (area) area/libs Relates to first-party libraries/crates/packages (area) type/eng > backend Owned by the @backend team area/tests New or updated tests area/infra > terraform labels Sep 10, 2025

codecov bot commented Sep 10, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 54.70%. Comparing base (df279d6) to head (00ff4c7).
⚠️ Report is 12 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7789      +/-   ##
==========================================
- Coverage   54.71%   54.70%   -0.01%     
==========================================
  Files        1085     1085              
  Lines       96195    96207      +12     
  Branches     4547     4553       +6     
==========================================
  Hits        52632    52632              
- Misses      42976    42988      +12     
  Partials      587      587              
Flag Coverage Δ
apps.hash-ai-worker-ts 1.32% <ø> (ø)
apps.hash-api 0.00% <ø> (ø)
local.harpc-client 50.93% <ø> (ø)
local.hash-backend-utils 3.69% <ø> (ø)
local.hash-graph-sdk 0.00% <ø> (ø)
local.hash-isomorphic-utils 0.00% <ø> (ø)
rust.antsi 0.00% <ø> (ø)
rust.error-stack 88.77% <ø> (ø)
rust.harpc-codec 84.22% <ø> (ø)
rust.harpc-net 96.10% <ø> (ø)
rust.harpc-tower 66.80% <ø> (ø)
rust.harpc-types 0.00% <ø> (ø)
rust.harpc-wire-protocol 92.23% <ø> (ø)
rust.hash-codec 72.52% <ø> (ø)
rust.hash-graph-api 3.17% <ø> (ø)
rust.hash-graph-postgres-store 20.06% <ø> (ø)
rust.hash-graph-store 32.93% <ø> (ø)
rust.hash-graph-temporal-versioning 48.22% <ø> (ø)
rust.hash-graph-validation 83.29% <ø> (ø)
rust.hashql-ast 86.45% <ø> (ø)
rust.hashql-core 82.26% <ø> (ø)
rust.hashql-diagnostics 50.24% <ø> (ø)
rust.hashql-eval 71.85% <ø> (ø)
rust.hashql-hir 86.25% <ø> (ø)
rust.hashql-syntax-jexpr 94.20% <ø> (ø)
rust.sarif 97.93% <ø> (ø)


@TimDiekmann TimDiekmann marked this pull request as ready for review September 10, 2025 12:15
@TimDiekmann TimDiekmann marked this pull request as draft September 10, 2025 14:24
@TimDiekmann
Member Author

Benchmarks don't finish; something is going on that is not reproducible locally. Converting to draft.
