@rescrv rescrv commented Oct 16, 2025

Description of changes

Add comprehensive gRPC API for heap management operations:

  • Push: manually add schedules to the heap
  • Peek: query heap items with filtering by UUID and time
  • Prune: remove completed tasks with configurable limits
  • PruneBucket: prune a specific time bucket
  • ListBuckets: enumerate time buckets in the heap
  • Summary: retrieve heap statistics

Extend heapservice.proto with message types for all operations,
including Triggerable, Schedule, HeapItem, Limits, PruneStats,
and FilterCriteria definitions.

Implement conversion functions between proto messages and s3heap
types with proper error handling for UUID parsing and timestamp
conversion. Add comprehensive unit tests for all conversions.
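The conversion pattern described above can be sketched with a std-only stand-in (plain `u32` parsing in place of the crate-specific UUID and timestamp parsing); `ConversionError` mirrors the error type named in the PR, but `id_from_proto` is an illustrative name, not taken from the code:

```rust
// Hedged sketch of the map_err conversion pattern: parse the wire
// representation, wrap any parse failure in a domain error with context.
#[derive(Debug, PartialEq)]
struct ConversionError(String);

fn id_from_proto(raw: &str) -> Result<u32, ConversionError> {
    raw.parse::<u32>()
        .map_err(|e| ConversionError(format!("invalid id: {}", e)))
}
```

The real conversions in the PR apply the same shape to `Uuid::parse_str` and `DateTime::from_timestamp`.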

Update HeapTender to include HeapReader and HeapPruner alongside
existing HeapWriter to support read and pruning operations.

Add time_cut_off field to Limits struct to enable time-based
filtering. Implement list_buckets() method on HeapReader with
configurable limit (default 1000).

Fix Limits to use Clone instead of Copy since it now contains
non-Copy DateTime field.
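A std-only illustration of the Clone-vs-Copy point above: a struct can no longer derive `Copy` once any field is non-Copy. Here a `String` stands in for the DateTime field added to `Limits`; the field names are illustrative, not the PR's actual definition:

```rust
// Deriving Copy here would fail to compile, because String is not Copy;
// Clone still works, so callers clone Limits explicitly instead.
#[derive(Clone)]
struct Limits {
    max_items: usize,
    time_cut_off: String, // stand-in for the non-Copy DateTime field
}
```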

🤖 Generated with [Claude Code](https://claude.ai/code)

Test plan

Tests added => CI

Migration plan

N/A

Observability plan

N/A

Documentation Changes

N/A

Co-authored-by: Claude [email protected]

@github-actions

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of unexpectedly high quality? (Readability, Modularity, Intuitiveness)

@rescrv rescrv requested a review from tanujnay112 October 16, 2025 20:35
Comment on lines 59 to 95
/// Convert proto Triggerable to s3heap Triggerable.
pub fn triggerable_from_proto(
    proto: chroma_proto::Triggerable,
) -> Result<Triggerable, ConversionError> {
    let partitioning_uuid = Uuid::parse_str(&proto.partitioning_uuid)
        .map_err(|e| ConversionError(format!("invalid partitioning_uuid: {}", e)))?;
    let scheduling_uuid = Uuid::parse_str(&proto.scheduling_uuid)
        .map_err(|e| ConversionError(format!("invalid scheduling_uuid: {}", e)))?;
    Ok(Triggerable {
        partitioning: partitioning_uuid.into(),
        scheduling: scheduling_uuid.into(),
    })
}

/// Convert s3heap Triggerable to proto Triggerable.
pub fn triggerable_to_proto(triggerable: Triggerable) -> chroma_proto::Triggerable {
    chroma_proto::Triggerable {
        partitioning_uuid: triggerable.partitioning.to_string(),
        scheduling_uuid: triggerable.scheduling.to_string(),
    }
}

/// Convert proto Schedule to s3heap Schedule.
pub fn schedule_from_proto(proto: chroma_proto::Schedule) -> Result<Schedule, ConversionError> {
    let triggerable = proto
        .triggerable
        .ok_or_else(|| ConversionError("missing triggerable".to_string()))
        .and_then(triggerable_from_proto)?;
    let next_scheduled = proto
        .next_scheduled
        .ok_or_else(|| ConversionError("missing next_scheduled".to_string()))?;
    let next_scheduled = DateTime::from_timestamp(
        next_scheduled.seconds,
        next_scheduled.nanos.try_into().unwrap_or(0),
    )
    .ok_or_else(|| ConversionError("invalid next_scheduled timestamp".to_string()))?;
    let nonce = Uuid::parse_str(&proto.nonce)
[CriticalError]

Same timestamp parsing issue appears multiple times throughout the code. The pattern `timestamp.nanos.try_into().unwrap_or(0)` silently converts invalid nanosecond values to 0, which could cause:

  1. Scheduling tasks at wrong times
  2. Pruning wrong time buckets
  3. Inconsistent time comparisons

This affects lines 59, 95, 110, 608, 168, 176 in the conversions and service methods.
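One way to address this comment is to validate nanoseconds strictly instead of defaulting to 0. The sketch below is illustrative, not the PR's fix; `validate_nanos` is a hypothetical helper, and the valid range [0, 999_999_999] comes from the protobuf Timestamp specification:

```rust
// Reject out-of-range nanosecond values instead of silently mapping them
// to 0 the way try_into().unwrap_or(0) does.
#[derive(Debug, PartialEq)]
struct ConversionError(String);

fn validate_nanos(nanos: i32) -> Result<u32, ConversionError> {
    if (0..=999_999_999).contains(&nanos) {
        Ok(nanos as u32)
    } else {
        Err(ConversionError(format!("invalid nanos: {}", nanos)))
    }
}
```

The validated value can then be passed to `DateTime::from_timestamp`, turning a silent time shift into an explicit conversion error.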

File: rust/s3heap-service/src/lib.rs
Line: 95

@rescrv rescrv changed the base branch from main to rescrv/heap-time-limit October 16, 2025 21:25
@rescrv rescrv force-pushed the rescrv/heap-endpoints branch from 49a8ece to 89dbc51 Compare October 16, 2025 21:29
propel-code-bot bot commented Oct 16, 2025

Add gRPC APIs and Structural Enhancements for Heap Operations

This PR introduces comprehensive gRPC endpoints to manage heap operations in the s3heap service layer. It overhauls the gRPC protobuf (heapservice.proto), implements new server methods on the Rust backend, extends key heap structures, and adds robust type conversions and unit tests for all gRPC message translations. Several internal and test-related changes were made to facilitate these endpoints, including deeper proto/type error handling and interface adjustments across heap reading, writing, and pruning subsystems.

Key Changes

• Expanded heapservice.proto with definitions for Triggerable, Schedule, HeapItem, Limits, PruneStats, FilterCriteria, and all corresponding request/response messages for Push, Peek, Prune, PruneBucket, ListBuckets, and Summary operations.
• Fully implemented new gRPC endpoints in HeapTenderService (push, peek, prune, prune_bucket, list_buckets, summary), each with error mapping, input validation, and async logic leveraging the extended heap backend.
• Added conversion utilities between protobuf types and s3heap domain types with detailed error handling (including UUID and timestamp parsing).
• Introduced a time_cut_off field to the Limits struct, and made Limits Clone (not Copy), supporting filtering and safe heap traversal.
• Added a list_buckets() method on HeapReader for determining heap structure with an optional limit parameter.
• Updated HeapTender and related test harnesses to include HeapReader and HeapPruner fields for complete CRUD lifecycle in heap management.
• Expanded both backend and tests to validate all new conversions, limits, and operational flows, and updated the integration/unit tests for the changed interfaces.
• Added prost-types as an explicit dependency for timestamp handling.

Affected Areas

rust/s3heap-service/src/lib.rs
idl/chromadb/proto/heapservice.proto
rust/s3heap/src/lib.rs
rust/s3heap/tests/test_unit_tests.rs
rust/s3heap-service/tests/test_k8s_integration_00_heap_tender.rs
Cargo.lock
rust/s3heap-service/Cargo.toml
rust/worker/src/compactor/tasks.rs

This summary was automatically generated by @propel-code-bot

Comment on lines +580 to +584
let limits: s3heap::Limits = request
    .into_inner()
    .limits
    .ok_or_else(|| Status::invalid_argument("missing limits"))
    .and_then(|l| {

[CriticalError]

Integer overflow in type conversion: `timestamp.nanos.try_into().unwrap_or(0)` could cause data corruption if `nanos` is negative (which is valid for timestamps before 1970). The `try_into()` will fail for negative i32 values when converting to u32, silently defaulting to 0 nanoseconds.

// Fix: Handle negative nanos properly
let nanos = if timestamp.nanos < 0 {
    0u32
} else {
    timestamp.nanos as u32
};
let bucket = DateTime::from_timestamp(timestamp.seconds, nanos)

File: rust/s3heap-service/src/lib.rs
Line: 584

Comment on lines +143 to +150
pub fn prune_stats_to_proto(stats: PruneStats) -> chroma_proto::PruneStats {
    chroma_proto::PruneStats {
        items_pruned: stats.items_pruned as u32,
        items_retained: stats.items_retained as u32,
        buckets_deleted: stats.buckets_deleted as u32,
        buckets_updated: stats.buckets_updated as u32,
    }
}

[BestPractice]

The `as u32` casts for `PruneStats` fields can cause silent truncation on 64-bit systems if the number of items exceeds `u32::MAX`. This could lead to clients receiving incorrect statistics without any error. To prevent this data loss, it's safer to use `try_into()` and return a `Result`.

Suggested change:

pub fn prune_stats_to_proto(stats: PruneStats) -> Result<chroma_proto::PruneStats, ConversionError> {
    Ok(chroma_proto::PruneStats {
        items_pruned: stats.items_pruned.try_into().map_err(|e| ConversionError(format!("items_pruned overflow: {}", e)))?,
        items_retained: stats.items_retained.try_into().map_err(|e| ConversionError(format!("items_retained overflow: {}", e)))?,
        buckets_deleted: stats.buckets_deleted.try_into().map_err(|e| ConversionError(format!("buckets_deleted overflow: {}", e)))?,
        buckets_updated: stats.buckets_updated.try_into().map_err(|e| ConversionError(format!("buckets_updated overflow: {}", e)))?,
    })
}

You will also need to update the call sites in `prune` and `prune_bucket` to handle the new `Result` type.



File: rust/s3heap-service/src/lib.rs
Line: 150
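The truncation-vs-checked-conversion contrast in this comment can be shown with std alone; the function names below are illustrative, not from the PR:

```rust
use std::num::TryFromIntError;

// Lossy: `as` keeps only the low 32 bits, so values above u32::MAX wrap
// silently, which is exactly the failure mode the comment describes.
fn to_u32_lossy(n: u64) -> u32 {
    n as u32
}

// Checked: `try_into` surfaces the overflow as an error the caller must
// handle, instead of returning a wrong count.
fn to_u32_checked(n: u64) -> Result<u32, TryFromIntError> {
    n.try_into()
}
```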

@tanujnay112 tanujnay112 merged commit d50c8ee into rescrv/heap-time-limit Oct 21, 2025
115 of 117 checks passed
