Skip to content

Conversation

colin-ho
Copy link
Contributor

@colin-ho colin-ho commented Oct 3, 2025

Changes Made

Related Issues

Checklist

  • Documented in API Docs (if applicable)
  • Documented in User Guide (if applicable)
  • If adding a new documentation page, doc is added to docs/mkdocs.yml navigation
  • Documentation builds and is formatted properly

Base automatically changed from slade/dashboard-subscriber to main October 3, 2025 18:16
@github-actions github-actions bot added the perf label Oct 3, 2025
@colin-ho colin-ho requested a review from srilman October 3, 2025 18:51
@colin-ho colin-ho marked this pull request as ready for review October 3, 2025 18:55
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Summary

This PR implements performance optimizations to reduce unnecessary copying and memory allocations in the runtime statistics system. The changes focus on two main areas: making `QueryID` parameters pass-by-reference throughout the subscriber system, and optimizing statistics data structures.

The core optimization involves changing the Subscriber trait interface to accept &QueryID references instead of owned QueryID values, eliminating atomic reference counting overhead from Arc<str> clones on every method call. This change cascades through all subscriber implementations (Debug, Dashboard, Python) and the context notification system.

The second major change replaces StatSnapshotView with StatSnapshotSend/StatSnapshotRecv types that use SmallVec for inline storage of the typical 3 statistics (CPU time, rows in, rows emitted), avoiding heap allocations for common cases. The Stat enum also gains the Copy trait since it contains only primitive types, further reducing cloning overhead.

The dashboard engine is simplified by removing intermediate serialization structures and directly using the optimized statistics types. The Python FFI layer also benefits from changing string parameters from owned String to borrowed &str references.

These optimizations target the telemetry and monitoring system which is called frequently during query execution, making the changes particularly beneficial for workloads with active dashboard monitoring or multiple subscribers.

Changed Files
Filename Score Overview
src/daft-local-execution/src/runtime_stats/subscribers/query.rs 5/5 Optimized query stats subscriber to pass query_id by reference and eliminate intermediate vector creation
src/daft-context/src/subscribers/debug.rs 5/5 Updated DebugSubscriber to use QueryID references and StatSnapshotSend type
src/daft-dashboard/src/state.rs 4/5 Replaced HashMap with StatSnapshotRecv for more efficient operator stats storage
src/daft-dashboard/src/engine.rs 4/5 Refactored dashboard engine to use StatSnapshotRecv and eliminate intermediate serialization structures
src/common/metrics/src/lib.rs 4/5 Added Copy trait to Stat enum and removed StatSnapshotView struct
src/daft-context/src/subscribers/mod.rs 4/5 Changed Subscriber trait to use QueryID references and StatSnapshotSend type
src/daft-context/src/subscribers/dashboard.rs 5/5 Optimized dashboard subscriber to use QueryID references and simplified stats serialization
src/daft-context/src/python.rs 5/5 Changed Python FFI methods to use string references instead of owned strings
src/daft-context/src/lib.rs 5/5 Updated context notification methods to pass QueryID by reference to subscribers
src/daft-context/src/subscribers/python.rs 5/5 Updated Python subscriber to use QueryID references and iter() instead of into_iter() for stats

Confidence score: 4/5

  • This PR is safe to merge with low risk as it focuses on performance optimizations without changing core functionality
  • Score reflects well-structured optimizations that maintain API compatibility while reducing memory overhead
  • Pay close attention to src/common/metrics/src/lib.rs for the trait changes and type removals that affect multiple components

Sequence Diagram

sequenceDiagram
    participant User
    participant DaftContext
    participant Subscriber
    participant Dashboard
    participant QueryExecution
    participant RuntimeStats
    
    User->>DaftContext: "Start Query"
    DaftContext->>Subscriber: "notify_query_start(query_id, plan)"
    Subscriber->>Dashboard: "POST /query/{id}/start"
    Dashboard-->>Subscriber: "200 OK"
    
    DaftContext->>Subscriber: "notify_optimization_start(query_id)"
    Subscriber->>Dashboard: "POST /query/{id}/plan_start"
    Dashboard-->>Subscriber: "200 OK"
    
    DaftContext->>Subscriber: "notify_optimization_end(query_id, optimized_plan)"
    Subscriber->>Dashboard: "POST /query/{id}/plan_end"
    Dashboard-->>Subscriber: "200 OK"
    
    DaftContext->>Subscriber: "on_exec_start(query_id, node_infos)"
    Subscriber->>Dashboard: "POST /query/{id}/exec/start"
    Dashboard-->>Subscriber: "200 OK"
    
    QueryExecution->>RuntimeStats: "initialize_node(node_id)"
    RuntimeStats->>Subscriber: "on_exec_operator_start(query_id, node_id)"
    Subscriber->>Dashboard: "POST /query/{id}/exec/{op_id}/start"
    Dashboard-->>Subscriber: "200 OK"
    
    QueryExecution->>RuntimeStats: "emit stats"
    RuntimeStats->>Subscriber: "on_exec_emit_stats(query_id, stats)"
    Subscriber->>Dashboard: "POST /query/{id}/exec/emit_stats"
    Dashboard-->>Subscriber: "200 OK"
    
    QueryExecution->>RuntimeStats: "finalize_node(node_id)"
    RuntimeStats->>Subscriber: "on_exec_operator_end(query_id, node_id)"
    Subscriber->>Dashboard: "POST /query/{id}/exec/{op_id}/end"
    Dashboard-->>Subscriber: "200 OK"
    
    DaftContext->>Subscriber: "on_exec_end(query_id)"
    Subscriber->>Dashboard: "POST /query/{id}/exec/end"
    Dashboard-->>Subscriber: "200 OK"
    
    DaftContext->>Subscriber: "notify_result_out(query_id, result)"
    Note over Subscriber: "Collects preview rows"
    
    DaftContext->>Subscriber: "notify_query_end(query_id)"
    Subscriber->>Dashboard: "POST /query/{id}/end"
    Dashboard-->>Subscriber: "200 OK"
    Dashboard-->>User: "Query Complete"
Loading

Additional Comments (2)

  1. src/daft-context/src/python.rs, line 80 (link)

    style: Consider changing this parameter to &str for consistency with other notification methods

  2. src/daft-context/src/python.rs, line 95 (link)

    style: Consider changing this parameter to &str for consistency with other notification methods

10 files reviewed, 3 comments

Edit Code Review Agent Settings | Greptile

Copy link

codecov bot commented Oct 3, 2025

Codecov Report

❌ Patch coverage is 44.89796% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.22%. Comparing base (0bb0713) to head (070da71).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
src/daft-context/src/subscribers/dashboard.rs 0.00% 12 Missing ⚠️
src/daft-context/src/subscribers/debug.rs 0.00% 12 Missing ⚠️
src/daft-dashboard/src/engine.rs 0.00% 3 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5329      +/-   ##
==========================================
- Coverage   75.37%   75.22%   -0.16%     
==========================================
  Files         983      986       +3     
  Lines      123738   123902     +164     
==========================================
- Hits        93270    93202      -68     
- Misses      30468    30700     +232     
Files with missing lines Coverage Δ
src/common/metrics/src/lib.rs 39.28% <ø> (+1.44%) ⬆️
src/daft-context/src/lib.rs 95.23% <100.00%> (+0.32%) ⬆️
src/daft-context/src/python.rs 83.51% <100.00%> (+1.16%) ⬆️
src/daft-context/src/subscribers/mod.rs 44.44% <ø> (-10.11%) ⬇️
src/daft-context/src/subscribers/python.rs 92.66% <100.00%> (-3.22%) ⬇️
src/daft-dashboard/src/state.rs 57.69% <ø> (ø)
...l-execution/src/runtime_stats/subscribers/query.rs 86.66% <100.00%> (-0.84%) ⬇️
src/daft-dashboard/src/engine.rs 7.79% <0.00%> (ø)
src/daft-context/src/subscribers/dashboard.rs 2.50% <0.00%> (ø)
src/daft-context/src/subscribers/debug.rs 0.00% <0.00%> (ø)

... and 6 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants