-
Notifications
You must be signed in to change notification settings - Fork 1.9k
[ENH]: add retries to Rust client #5641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
8d165c0
to
f3c8a82
Compare
78b6eb1
to
263fa95
Compare
Introduce Retry Logic for Rust Client GET Requests and 429s This pull request adds configurable retry logic to the Rust Key Changes• Introduced Affected Areas• This summary was automatically generated by @propel-code-bot |
} | ||
|
||
#[cfg(test)] | ||
mod tests { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[TestCoverage]
This is a great addition! The retry logic looks solid and the tests for the success cases are well-written.
To make the test suite even more robust, consider adding a test case to verify that non-idempotent methods (like POST) are not retried on server errors (like 500), as per your retry logic. This ensures the retry mechanism isn't overly aggressive and follows HTTP idempotency best practices.
The suggested test follows common retry library testing patterns seen in crates like tokio-retry
and backoff
, which emphasize testing both positive and negative retry scenarios. This is particularly important for HTTP clients where retry behavior should respect HTTP method semantics:
#[tokio::test]
#[test_log::test]
async fn test_does_not_retry_non_get_on_500() {
// Test implementation...
assert_eq!(mock.calls(), 1); // Ensures only one attempt, no retries
}
Context for Agents
[**TestCoverage**]
This is a great addition! The retry logic looks solid and the tests for the success cases are well-written.
To make the test suite even more robust, consider adding a test case to verify that non-idempotent methods (like POST) are *not* retried on server errors (like 500), as per your retry logic. This ensures the retry mechanism isn't overly aggressive and follows HTTP idempotency best practices.
The suggested test follows common retry library testing patterns seen in crates like `tokio-retry` and `backoff`, which emphasize testing both positive and negative retry scenarios. This is particularly important for HTTP clients where retry behavior should respect HTTP method semantics:
```rust
#[tokio::test]
#[test_log::test]
async fn test_does_not_retry_non_get_on_500() {
// Test implementation...
assert_eq!(mock.calls(), 1); // Ensures only one attempt, no retries
}
```
File: rust/chroma/src/client/chroma_client.rs
Line: 286
263fa95
to
251cad6
Compare
}; | ||
|
||
let response = attempt | ||
.retry(&self.retry_policy) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the caller opt out of this? It's dangerous and incorrect for a client to retry non-reads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry do you mean a user or something inside the crate calling send? Users can set max_retries
to 0. However this specifically only retries GET requests and 429s, both of which are always safe to retry in our current system.
f3c8a82
to
bd81ebe
Compare
251cad6
to
7cde601
Compare
ef08e67
to
16aa07e
Compare
7cde601
to
d994a3e
Compare
16aa07e
to
37666f3
Compare
37666f3
to
f0232ca
Compare
d994a3e
to
4b08584
Compare
Merge activity
|
.when(|err| { | ||
err.status() | ||
.map(|status| status == StatusCode::TOO_MANY_REQUESTS) | ||
.unwrap_or_default() | ||
|| method == Method::GET | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[BestPractice]
The current retry logic for GET requests is quite broad. It will retry on any error, including 4xx client errors like 404 Not Found or 401 Unauthorized, which are typically not transient and are unlikely to succeed on a retry.
To make the client more robust, it would be better to limit retries for GET requests to:
- Network errors (connection failures, timeouts)
- 5xx server errors (500, 502, 503, 504) which indicate transient server issues
- Specific 4xx codes that may be transient: 408 (Request Timeout), 429 (Too Many Requests)
This avoids wasting time and resources on retries for unrecoverable client-side errors like 401 (Unauthorized), 403 (Forbidden), 404 (Not Found), etc.
Here's a suggested change to make the retry condition more specific:
Context for Agents
[**BestPractice**]
The current retry logic for GET requests is quite broad. It will retry on any error, including 4xx client errors like 404 Not Found or 401 Unauthorized, which are typically not transient and are unlikely to succeed on a retry.
To make the client more robust, it would be better to limit retries for GET requests to:
- Network errors (connection failures, timeouts)
- 5xx server errors (500, 502, 503, 504) which indicate transient server issues
- Specific 4xx codes that may be transient: 408 (Request Timeout), 429 (Too Many Requests)
This avoids wasting time and resources on retries for unrecoverable client-side errors like 401 (Unauthorized), 403 (Forbidden), 404 (Not Found), etc.
Here's a suggested change to make the retry condition more specific:
File: rust/chroma/src/client/chroma_client.rs
Line: 269
Description of changes
Retries on all GET requests and 429s.
Test plan
How are these changes tested?
pytest
for python,yarn test
for js,cargo test
for rustMigration plan
Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?
Observability plan
What is the plan to instrument and monitor this change?
Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?