Contributor

@codetheweb codetheweb commented Oct 18, 2025

Description of changes

Adds retries to the Rust client for all GET requests and for any request that receives a 429.

Test plan

How are these changes tested?

  • Tests pass locally with pytest for Python, yarn test for JS, cargo test for Rust

Migration plan

Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?

Observability plan

What is the plan to instrument and monitor this change?

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

Contributor Author

codetheweb commented Oct 18, 2025

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

@codetheweb codetheweb force-pushed the feat-chroma-rust-client-metrics branch from 8d165c0 to f3c8a82 Compare October 18, 2025 02:47
@codetheweb codetheweb force-pushed the feat-chroma-rust-client-retries branch 3 times, most recently from 78b6eb1 to 263fa95 Compare October 18, 2025 02:50
@codetheweb codetheweb marked this pull request as ready for review October 18, 2025 02:50
Contributor

propel-code-bot bot commented Oct 18, 2025

Introduce Retry Logic for Rust Client GET Requests and 429s

This pull request adds configurable retry logic to the Rust ChromaClient. It retries with exponential backoff on all GET requests and on any HTTP request that receives a 429 Too Many Requests response. The retry policy is user-adjustable via the new ChromaRetryOptions field in ChromaClientOptions. Non-GET requests are retried only on 429 responses, not on 5xx or other errors, so that writes, which may not be idempotent, are not replayed on such errors. Tests using httpmock are included to verify both positive and negative retry scenarios.

Key Changes

• Introduced ChromaRetryOptions to ChromaClientOptions enabling user configuration of retry behavior
• Integrated backon crate's ExponentialBuilder for exponential backoff policies in chroma_client.rs
• Updated ChromaClient to retry all GET requests and non-GET requests on 429 responses, with detailed retry notification via tracing
• Wired up retry metric increments in metrics.rs and extended metrics to track retries
• Added explicit tests to ensure: (a) GET requests are retried on errors, (b) non-GET requests are retried on 429s, (c) retry count and result correctness
• Updated Cargo.toml and Cargo.lock with httpmock and related dependencies; minor tokio and smallvec upgrades
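
To make the backoff behavior in the summary concrete, here is a minimal std-only sketch of an exponential delay schedule. It does not use the backon crate and does not reproduce ChromaRetryOptions: the struct name `RetryOptions` and every field except `max_retries` (which is mentioned later in this conversation) are hypothetical, chosen only for illustration.

```rust
use std::time::Duration;

/// Hypothetical retry settings for illustration only.
/// Only `max_retries` is a name that appears in this PR conversation.
#[derive(Debug, Clone)]
pub struct RetryOptions {
    /// Number of retries after the first attempt; 0 disables retries.
    pub max_retries: usize,
    /// Delay before the first retry (assumed field).
    pub min_delay: Duration,
    /// Upper bound on any single delay (assumed field).
    pub max_delay: Duration,
    /// Multiplier applied to the delay after each retry (assumed field).
    pub factor: u32,
}

impl RetryOptions {
    /// Returns the delay to wait before each retry:
    /// min_delay * factor^n for the n-th retry, capped at max_delay.
    pub fn delays(&self) -> Vec<Duration> {
        let mut out = Vec::with_capacity(self.max_retries);
        let mut current = self.min_delay;
        for _ in 0..self.max_retries {
            out.push(current.min(self.max_delay));
            // saturating_mul avoids overflow for large factors or many retries.
            current = current.saturating_mul(self.factor);
        }
        out
    }
}
```

With `max_retries: 4`, a 100 ms minimum, a 500 ms cap, and a factor of 2, this schedule waits 100 ms, 200 ms, 400 ms, and then 500 ms. Real backoff implementations often add jitter on top, which this sketch leaves out.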

Affected Areas

rust/chroma/src/client/chroma_client.rs
rust/chroma/src/client/options.rs
rust/chroma/src/client/metrics.rs
rust/chroma/Cargo.toml
Cargo.lock

This summary was automatically generated by @propel-code-bot

}

#[cfg(test)]
mod tests {
Contributor

[TestCoverage]

This is a great addition! The retry logic looks solid and the tests for the success cases are well-written.

To make the test suite even more robust, consider adding a test case to verify that non-idempotent methods (like POST) are not retried on server errors (like 500), as per your retry logic. This ensures the retry mechanism isn't overly aggressive and follows HTTP idempotency best practices.

The suggested test follows common retry library testing patterns seen in crates like tokio-retry and backoff, which emphasize testing both positive and negative retry scenarios. This is particularly important for HTTP clients where retry behavior should respect HTTP method semantics:

    #[tokio::test]
    #[test_log::test]
    async fn test_does_not_retry_non_get_on_500() {
        // Test implementation...
        assert_eq!(mock.calls(), 1); // Ensures only one attempt, no retries
    }
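
As a rough illustration of what such a negative test would check, here is a std-only sketch that models the retry condition described in this PR (retry on 429, or on any error for GET) and counts attempts. It does not use ChromaClient, httpmock, or backon. `Method`, `should_retry`, and `send_with_retries` are hypothetical names, and the backoff delays are omitted to keep the sketch short.

```rust
/// Hypothetical stand-in for an HTTP method; not the type used by the client.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Method {
    Get,
    Post,
}

/// Models the condition described in this PR: retry on 429,
/// or on any error at all when the request is a GET.
/// `status` is None for failures with no HTTP status (e.g. connection errors).
pub fn should_retry(method: Method, status: Option<u16>) -> bool {
    status == Some(429) || method == Method::Get
}

/// Calls `send` until it succeeds, fails with a non-retryable error,
/// or the retry budget is used up. Returns the final result and the
/// total number of attempts. Delays between attempts are omitted.
pub fn send_with_retries<F>(
    method: Method,
    max_retries: usize,
    mut send: F,
) -> (Result<(), Option<u16>>, usize)
where
    F: FnMut() -> Result<(), Option<u16>>,
{
    let mut attempts = 0;
    loop {
        attempts += 1;
        match send() {
            Ok(()) => return (Ok(()), attempts),
            // Retry only while budget remains and the error qualifies.
            Err(status) if attempts <= max_retries && should_retry(method, status) => continue,
            Err(status) => return (Err(status), attempts),
        }
    }
}
```

Under this model, a POST that always returns 500 is attempted exactly once, which is the property the suggested test asserts with `mock.calls()`. A GET that always returns 500 is attempted `1 + max_retries` times, and setting `max_retries` to 0 yields a single attempt for every method.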

File: rust/chroma/src/client/chroma_client.rs
Line: 286

@codetheweb codetheweb mentioned this pull request Oct 18, 2025
1 task
@codetheweb codetheweb force-pushed the feat-chroma-rust-client-retries branch from 263fa95 to 251cad6 Compare October 18, 2025 03:07
@rescrv rescrv self-requested a review October 18, 2025 04:17
};

let response = attempt
    .retry(&self.retry_policy)
Contributor

Can the caller opt out of this? It's dangerous and incorrect for a client to retry non-reads.

Contributor Author

Sorry, do you mean a user, or something inside the crate calling send? Users can set max_retries to 0. However, this only retries GET requests and 429s, both of which are always safe to retry in our current system.

@codetheweb codetheweb force-pushed the feat-chroma-rust-client-metrics branch from f3c8a82 to bd81ebe Compare October 19, 2025 17:29
@codetheweb codetheweb force-pushed the feat-chroma-rust-client-retries branch from 251cad6 to 7cde601 Compare October 19, 2025 17:29
@codetheweb codetheweb force-pushed the feat-chroma-rust-client-metrics branch 2 times, most recently from ef08e67 to 16aa07e Compare October 19, 2025 17:30
@codetheweb codetheweb force-pushed the feat-chroma-rust-client-retries branch from 7cde601 to d994a3e Compare October 19, 2025 17:32
@codetheweb codetheweb force-pushed the feat-chroma-rust-client-metrics branch from 16aa07e to 37666f3 Compare October 19, 2025 17:32
@codetheweb codetheweb changed the base branch from feat-chroma-rust-client-metrics to graphite-base/5641 October 19, 2025 18:03
@codetheweb codetheweb force-pushed the feat-chroma-rust-client-retries branch from d994a3e to 4b08584 Compare October 19, 2025 18:04
@graphite-app graphite-app bot changed the base branch from graphite-base/5641 to main October 19, 2025 18:04
graphite-app bot commented Oct 19, 2025

Merge activity

  • Oct 19, 6:04 PM UTC: Graphite rebased this pull request, because this pull request is set to merge when ready.
  • Oct 19, 6:31 PM UTC: @codetheweb merged this pull request with Graphite.

Comment on lines +264 to +269
    .when(|err| {
        err.status()
            .map(|status| status == StatusCode::TOO_MANY_REQUESTS)
            .unwrap_or_default()
            || method == Method::GET
    })
Contributor

[BestPractice]

The current retry logic for GET requests is quite broad. It will retry on any error, including 4xx client errors like 404 Not Found or 401 Unauthorized, which are typically not transient and are unlikely to succeed on a retry.

To make the client more robust, it would be better to limit retries for GET requests to:

  • Network errors (connection failures, timeouts)
  • 5xx server errors (500, 502, 503, 504) which indicate transient server issues
  • Specific 4xx codes that may be transient: 408 (Request Timeout), 429 (Too Many Requests)

This avoids wasting time and resources on retries for unrecoverable client-side errors like 401 (Unauthorized), 403 (Forbidden), 404 (Not Found), etc.

Here's a suggested change to make the retry condition more specific:
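
The following is not the reviewer's original snippet. It is a minimal std-only sketch of a narrower condition along the lines listed above. The `Failure` enum and both function names are hypothetical, and treating the whole 500..=599 range as transient is an assumption (the comment names 500, 502, 503, and 504 specifically).

```rust
/// Hypothetical classification of a failed request.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Failure {
    /// No HTTP response was received (connection failure, timeout).
    Network,
    /// The server answered with this HTTP status code.
    Status(u16),
}

/// True for failures that may plausibly succeed on a later attempt:
/// network errors, 408, 429, and (by assumption) any 5xx status.
pub fn is_transient(failure: Failure) -> bool {
    match failure {
        Failure::Network => true,
        Failure::Status(code) => matches!(code, 408 | 429 | 500..=599),
    }
}

/// Narrower retry condition: any request is retried on 429,
/// and GET requests are additionally retried on other transient failures.
/// Non-transient client errors such as 401, 403, and 404 are never retried.
pub fn should_retry_narrow(is_get: bool, failure: Failure) -> bool {
    match failure {
        Failure::Status(429) => true,
        _ => is_get && is_transient(failure),
    }
}
```

Compared with the condition in the diff, the only behavioral difference in this sketch is for GET requests: a 404 or 401 now fails immediately instead of consuming the full retry budget. Non-GET requests are still retried on 429 only.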

File: rust/chroma/src/client/chroma_client.rs
Line: 269

@codetheweb codetheweb merged commit 18c5938 into main Oct 19, 2025
70 of 75 checks passed