feat: improved retry handling #186

Open
wants to merge 27 commits into
base: main

jimmyjames
Contributor

@jimmyjames jimmyjames commented Jul 24, 2025

Retry-After header support with enhanced retry strategy

Summary

This PR implements RFC 9110-compliant retry behavior: support for the Retry-After header, exponential backoff with jitter, network error recovery, and strict validation of the retry configuration.

Fixes #155

Key Features

🔄 RFC 9110 Compliant Retry-After Header Support

  • Parses both integer seconds and HTTP-date formats (RFC 1123)
  • Accepts only retry delays between 1 and 1800 seconds (30 minutes maximum)
  • Prioritizes server-specified delays over exponential backoff
  • Gracefully handles invalid header formats by falling back to exponential backoff
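
The parsing rules in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's RetryAfterHeaderParser: the class name `RetryAfterSketch`, the `parse` signature, and the explicit `now` parameter (added so the result is deterministic) are all assumptions made for the example.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical illustration; not the PR's RetryAfterHeaderParser.
final class RetryAfterSketch {
    static final long MIN_SECONDS = 1;
    static final long MAX_SECONDS = 1800; // 30 minutes

    /** Returns the delay requested by a Retry-After value, or empty if the value is unusable. */
    static Optional<Duration> parse(String headerValue, Instant now) {
        if (headerValue == null || headerValue.trim().isEmpty()) {
            return Optional.empty();
        }
        String value = headerValue.trim();
        try {
            // Form 1: delay in whole seconds, e.g. "Retry-After: 120"
            return inRange(Long.parseLong(value));
        } catch (NumberFormatException notAnInteger) {
            // Not an integer; try the HTTP-date form below.
        }
        try {
            // Form 2: HTTP-date, e.g. "Retry-After: Thu, 24 Jul 2025 12:00:30 GMT"
            Instant retryAt = ZonedDateTime.parse(value, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant();
            return inRange(Duration.between(now, retryAt).getSeconds());
        } catch (DateTimeParseException notADate) {
            return Optional.empty(); // the caller would fall back to exponential backoff
        }
    }

    // Accept only delays inside the 1-1800 second window.
    private static Optional<Duration> inRange(long seconds) {
        return (seconds >= MIN_SECONDS && seconds <= MAX_SECONDS)
                ? Optional.of(Duration.ofSeconds(seconds))
                : Optional.empty();
    }
}
```

An empty result covers malformed values, dates in the past, and delays outside the accepted window alike, so the caller needs only one fallback path.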

📈 Enhanced Exponential Backoff with Jitter

  • Formula: 2^retryCount * minimumRetryDelay with random jitter
  • Prevents thundering herd problems
  • Maximum delay capped at 120 seconds
  • Default minimum retry delay: 100ms
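
As a rough sketch of the formula and cap listed above, assuming the jitter picks a value between the base delay and twice the base delay (still bounded by the cap): `BackoffSketch` and its method names are hypothetical and are not the PR's ExponentialBackoff class. The `Random` is passed in so the behavior can be tested.

```java
import java.time.Duration;
import java.util.Random;

// Hypothetical illustration; not the PR's ExponentialBackoff.
final class BackoffSketch {
    static final Duration MAX_DELAY = Duration.ofSeconds(120);

    /** Base delay before jitter: 2^retryCount * minimumRetryDelay, capped at MAX_DELAY. */
    static Duration baseDelay(int retryCount, Duration minimumRetryDelay) {
        if (retryCount < 0) {
            return Duration.ZERO;
        }
        long maxMs = MAX_DELAY.toMillis();
        long minMs = minimumRetryDelay.toMillis();
        long factor = 1L << Math.min(retryCount, 30); // 2^retryCount, shift bounded to avoid overflow
        // Compare before multiplying so the product can never overflow.
        long baseMs = (minMs > maxMs / factor) ? maxMs : minMs * factor;
        return Duration.ofMillis(baseMs);
    }

    /** Jittered delay: uniformly chosen in [base, min(2 * base, MAX_DELAY)]. */
    static Duration delayWithJitter(int retryCount, Duration minimumRetryDelay, Random random) {
        long baseMs = baseDelay(retryCount, minimumRetryDelay).toMillis();
        long upperMs = Math.min(baseMs * 2, MAX_DELAY.toMillis());
        long range = upperMs - baseMs;
        long jitterMs = range > 0 ? (long) (random.nextDouble() * (range + 1)) : 0;
        return Duration.ofMillis(baseMs + jitterMs);
    }
}
```

Randomizing each delay spreads out clients that failed at the same moment, which is what keeps them from retrying in lockstep.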

🛡️ Unified Retry Strategy

  • Rate limiting (429): Always retried regardless of HTTP method
  • Server errors (5xx): Retried for ALL operations (except 501 Not Implemented)
    • Honors Retry-After header when present
    • Falls back to exponential backoff when Retry-After not available
  • Network errors: Comprehensive retry support for connection timeouts, DNS failures, and network connectivity issues
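
The status-code rules above reduce to a small decision function. The following is only a sketch under that reading of the list; `RetryDecisionSketch` and its methods are hypothetical names, and network errors (which surface as exceptions rather than status codes) are not modeled here.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical illustration; not the PR's RetryStrategy.
final class RetryDecisionSketch {
    /** 429 is always retried; 5xx is retried except 501 Not Implemented; nothing else is. */
    static boolean shouldRetry(int statusCode) {
        if (statusCode == 429) {
            return true;
        }
        return statusCode >= 500 && statusCode <= 599 && statusCode != 501;
    }

    /** A server-specified Retry-After delay takes priority over the computed backoff delay. */
    static Duration chooseDelay(Optional<Duration> retryAfter, Duration backoffDelay) {
        return retryAfter.orElse(backoffDelay);
    }
}
```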

🔧 Configuration Enhancements

  • Maximum allowable retries enforced at 15 (with validation)
  • Default retries remain 3 for backward compatibility
  • Strict input validation for minimumRetryDelay configuration
  • Retry-After header value exposed in FgaError objects

Breaking Changes ⚠️

1. Strict Configuration Validation

Previous: minimumRetryDelay accepted null values and negative durations
New: Throws IllegalArgumentException for null or negative values

// Before: these calls were accepted (with runtime fallbacks)
new Configuration().minimumRetryDelay(null);
new Configuration().minimumRetryDelay(Duration.ofMillis(-100));

// After: the same calls throw IllegalArgumentException
new Configuration().minimumRetryDelay(null);                    // ❌ IllegalArgumentException
new Configuration().minimumRetryDelay(Duration.ofMillis(-100)); // ❌ IllegalArgumentException

// Correct usage:
var config = new Configuration().minimumRetryDelay(Duration.ofMillis(100)); // ✅ Valid

2. Maximum Retry Validation

Previous: No validation on maxRetries value
New: Throws IllegalArgumentException if maxRetries > 15

// Before: this call was accepted (dangerously)
new Configuration().maxRetries(1000);

// After: the same call throws IllegalArgumentException
new Configuration().maxRetries(1000); // ❌ IllegalArgumentException

// Correct usage:
var config = new Configuration().maxRetries(15); // ✅ Maximum allowed
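
The two validation rules above could be enforced in fluent setters along these lines. This is a sketch only: `RetrySettingsSketch` is a hypothetical stand-in for the SDK's Configuration class, the error messages are invented, and accepting a zero delay is an assumption (the description only names null and negative values as rejected).

```java
import java.time.Duration;

// Hypothetical stand-in for the SDK's Configuration class; only the two checks are modeled.
final class RetrySettingsSketch {
    static final int MAX_ALLOWED_RETRIES = 15;

    private int maxRetries = 3;                                   // default from the PR description
    private Duration minimumRetryDelay = Duration.ofMillis(100);  // default from the PR description

    RetrySettingsSketch maxRetries(int value) {
        if (value > MAX_ALLOWED_RETRIES) {
            throw new IllegalArgumentException(
                    "maxRetries must not exceed " + MAX_ALLOWED_RETRIES + ", got " + value);
        }
        this.maxRetries = value;
        return this;
    }

    RetrySettingsSketch minimumRetryDelay(Duration value) {
        // Assumption: zero is allowed; only null and negative durations are rejected.
        if (value == null || value.isNegative()) {
            throw new IllegalArgumentException("minimumRetryDelay must be non-null and non-negative");
        }
        this.minimumRetryDelay = value;
        return this;
    }

    int getMaxRetries() {
        return maxRetries;
    }

    Duration getMinimumRetryDelay() {
        return minimumRetryDelay;
    }
}
```

Throwing from the setter means a bad value fails where the configuration is built rather than on the first retried request.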

3. Enhanced Error Information

New: FgaError now exposes Retry-After header value

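try {
    var response = client.check(request).get();
} catch (ExecutionException e) {
    // The SDK error arrives wrapped as the cause of the ExecutionException
    if (e.getCause() instanceof FgaError) {
        FgaError error = (FgaError) e.getCause();
        String retryAfter = error.getRetryAfterHeader(); // New method
        if (retryAfter != null) {
            System.out.println("Server requested retry after: " + retryAfter);
        }
    }
} catch (InterruptedException e) {
    Thread.currentThread().interrupt(); // get() can also be interrupted; restore the flag
}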

Example Usage

Basic Configuration

var config = new ClientConfiguration()
    .maxRetries(3)                           // Default: 3, Maximum: 15
    .minimumRetryDelay(Duration.ofMillis(100)); // Default: 100ms, must be non-null and positive

var client = new OpenFgaClient(config);

Retry Behavior Examples

Scenario 1: Rate Limiting (429)

// Server responds with: 429 Too Many Requests, Retry-After: 5
// SDK behavior: Waits 5 seconds, then retries (up to 3 times)
client.check(request).get();

Scenario 2: Server Error with Retry-After Header

// Server responds with: 503 Service Unavailable, Retry-After: 30
// SDK behavior: Waits 30 seconds, then retries
client.writeAuthorizationModel(model).get();

Scenario 3: Server Error without Retry-After Header

// Server responds with: 500 Internal Server Error (no Retry-After header)
// SDK behavior: Retries with exponential backoff (base delays of 100ms, 200ms, 400ms before jitter)
client.readAuthorizationModels().get();

Scenario 4: Network Error Recovery

// Network connectivity issues (connection timeout, DNS failure, etc.)
// SDK behavior: Uses exponential backoff with jitter, retries up to configured maximum
client.listStores().get();

Scenario 5: Exponential Backoff Sequence

// Without Retry-After header, delays follow: 100ms → 200ms → 400ms → 800ms → 1600ms → 3200ms...
// With jitter, actual delays vary between base and 2x base delay
// Maximum delay capped at 120 seconds
client.listObjects(request).get();
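
As a worked example of the sequence above, assuming the default 100ms minimum delay and a retry count n that starts at 0, the base delay and the point at which the 120-second cap takes over are:

```latex
\begin{align*}
d_n &= \min\left(2^{n} \cdot 100\,\text{ms},\ 120\,000\,\text{ms}\right), \qquad n = 0, 1, 2, \dots \\
d_{10} &= 2^{10} \cdot 100\,\text{ms} = 102\,400\,\text{ms} < 120\,000\,\text{ms} \\
2^{11} \cdot 100\,\text{ms} &= 204\,800\,\text{ms} > 120\,000\,\text{ms} \;\Rightarrow\; d_{11} = 120\,000\,\text{ms} \\
\text{jittered delay} &\in \left[\, d_n,\ \min\left(2\,d_n,\ 120\,000\,\text{ms}\right) \right]
\end{align*}
```

Under these assumptions the cap only matters from the twelfth consecutive retry onward, so it is never reached with the default of 3 retries.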

Implementation Details

New Utility Classes

  • RetryAfterHeaderParser: RFC 9110 compliant header parsing for both seconds and HTTP-date formats
  • ExponentialBackoff: Exponential backoff calculation with jitter and maximum delay cap
  • RetryStrategy: Centralized retry decision logic for different error types

Enhanced Classes

  • HttpRequestAttempt: Updated retry logic with network error handling and proper minimum delay enforcement
  • Configuration: Added strict validation for minimumRetryDelay and maxRetries parameters
  • FgaError: Exposes Retry-After header value for improved observability

Comprehensive Test Coverage

  • 18+ retry configuration tests covering global and per-request scenarios
  • Unit tests for all retry utilities (RetryAfterHeaderParser, ExponentialBackoff, RetryStrategy)
  • Integration tests with mock Retry-After responses and network error simulation
  • Edge case validation and error handling tests
  • RFC 9110 compliance verification

Migration Guide

  1. Update configuration validation: Ensure minimumRetryDelay values are non-null and positive

    // Replace this:
    config.minimumRetryDelay(null);
    
    // With this:
    config.minimumRetryDelay(Duration.ofMillis(100)); // or your preferred positive value

  2. Validate retry configuration: Ensure maxRetries values are ≤ 15

    // Replace this:
    config.maxRetries(50);
    
    // With this:
    config.maxRetries(15); // maximum allowed value

  3. Update error handling (optional): Use new getRetryAfterHeader() method for custom retry logic

    if (error instanceof FgaError) {
        String retryAfter = ((FgaError) error).getRetryAfterHeader();
        // Handle server-specified retry delays
    }

Backward Compatibility

  • Default retry count remains 3 (no change for existing code)
  • All HTTP operations now benefit from unified retry behavior
  • All existing configuration options preserved (with enhanced validation)
  • No changes to public API surface except new getRetryAfterHeader() method
  • Existing code using valid configuration values continues to work unchanged

Performance and Reliability Improvements

  • Network resilience: Automatic recovery from temporary network issues
  • Server load reduction: Honors server-specified retry delays via Retry-After header
  • Thundering herd prevention: Jitter in exponential backoff prevents synchronized retries
  • Fail-fast validation: Configuration errors caught at setup time, not runtime
  • Comprehensive error handling: Better visibility into retry behavior and server responses

Testing

All existing tests (348+) pass, and the PR adds new tests in the areas below:

New Test Coverage

  • Configuration validation tests: Verify strict validation for minimumRetryDelay and maxRetries
  • Retry strategy tests: Comprehensive coverage of retry decision logic
  • Network error tests: Validation of retry behavior during network connectivity issues
  • Retry-After header tests: RFC 9110 compliance for both seconds and HTTP-date formats
  • Integration tests: End-to-end retry behavior with real HTTP mocking

Test Categories

  • Unit tests for retry utilities (100% coverage)
  • Integration tests for retry configuration scenarios
  • Edge case validation and error condition testing
  • Performance tests for exponential backoff timing
  • RFC 9110 compliance verification


coderabbitai bot commented Jul 24, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This update overhauls the retry strategy in the Java SDK, introducing RFC 9110-compliant parsing of the Retry-After header, exponential backoff with jitter, and differentiated retry logic for state-affecting and non-state-affecting operations. The maximum retries are capped at 15, and error handling now exposes the Retry-After value. Comprehensive tests and documentation have been added and updated.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Changelog & Documentation: CHANGELOG.md, README.md | Expanded and clarified documentation and changelog to reflect new retry logic, configuration, error handling, and breaking changes. Updated code examples and migration guidance. |
| Core Retry Logic & Utilities: src/main/java/dev/openfga/sdk/api/client/HttpRequestAttempt.java, src/main/java/dev/openfga/sdk/util/ExponentialBackoff.java, src/main/java/dev/openfga/sdk/util/RetryStrategy.java, src/main/java/dev/openfga/sdk/util/RetryAfterHeaderParser.java | Refactored HTTP request retry logic to use new utilities for exponential backoff with jitter and Retry-After parsing. Introduced utility classes for backoff calculation, Retry-After header parsing, and retry decision logic based on HTTP method and status code. |
| Configuration: src/main/java/dev/openfga/sdk/api/configuration/Configuration.java | Added and validated maxRetries configuration with enforced limits. Updated default and max values. |
| Error Handling: src/main/java/dev/openfga/sdk/errors/FgaError.java | Added storage and access for the Retry-After header value in errors. Modified error construction to extract and store this header. |
| Test Updates for Retry Behavior: src/test/java/dev/openfga/sdk/api/OpenFgaApiTest.java, src/test/java/dev/openfga/sdk/api/client/OpenFgaClientTest.java, src/test/java/dev/openfga/sdk/api/auth/OAuth2ClientTest.java | Updated tests to reflect new retry behavior: state-affecting operations (POST, PUT, PATCH, DELETE) no longer retry on 5xx errors unless a Retry-After header is present. Adjusted expectations and comments accordingly. |
| New/Expanded Tests for Retry Logic: src/test/java/dev/openfga/sdk/api/client/HttpRequestAttemptRetryTest.java, src/test/java/dev/openfga/sdk/util/ExponentialBackoffTest.java, src/test/java/dev/openfga/sdk/util/RetryAfterHeaderParserTest.java, src/test/java/dev/openfga/sdk/util/RetryStrategyTest.java | Added comprehensive new tests for retry logic, backoff calculation, Retry-After header parsing, and retry strategy. Covered edge cases and ensured correct behavior for all supported scenarios. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant HttpRequestAttempt
    participant RetryStrategy
    participant ExponentialBackoff
    participant RetryAfterHeaderParser
    participant Server

    Client->>HttpRequestAttempt: Send HTTP request
    loop up to maxRetries
        HttpRequestAttempt->>Server: Perform HTTP request
        Server-->>HttpRequestAttempt: HTTP response (status, headers)
        alt Retry-After header present
            HttpRequestAttempt->>RetryAfterHeaderParser: Parse Retry-After
            RetryAfterHeaderParser-->>HttpRequestAttempt: Optional<Duration> delay
        end
        HttpRequestAttempt->>RetryStrategy: shouldRetry(request, status, hasRetryAfter)
        RetryStrategy-->>HttpRequestAttempt: true/false
        alt Should retry
            HttpRequestAttempt->>RetryStrategy: calculateRetryDelay(retryAfter, retryCount)
            RetryStrategy->>ExponentialBackoff: calculateDelay(retryCount)
            ExponentialBackoff-->>RetryStrategy: Duration (with jitter)
            RetryStrategy-->>HttpRequestAttempt: Duration (delay)
            HttpRequestAttempt->>HttpRequestAttempt: Wait for delay, increment retryCount
        else Exit loop
            HttpRequestAttempt-->>Client: Return response or throw error
        end
    end
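
The loop in the diagram can be sketched in plain Java as below. This is an illustration under simplifying assumptions, not the PR's HttpRequestAttempt: `RetryLoopSketch`, its nested `Response` type, and the injected `sleeper` are hypothetical, and jitter and the 120-second cap are left out to keep the control flow visible.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical illustration of the loop in the diagram; not the PR's HttpRequestAttempt.
final class RetryLoopSketch {
    /** Minimal stand-in for an HTTP response: status code plus an already-parsed Retry-After. */
    static final class Response {
        final int status;
        final Optional<Duration> retryAfter;

        Response(int status, Optional<Duration> retryAfter) {
            this.status = status;
            this.retryAfter = retryAfter;
        }
    }

    static Response execute(
            Supplier<Response> attempt, int maxRetries, Duration minimumRetryDelay, Consumer<Duration> sleeper) {
        int retryCount = 0;
        while (true) {
            Response response = attempt.get();
            boolean retryable = response.status == 429
                    || (response.status >= 500 && response.status <= 599 && response.status != 501);
            if (!retryable || retryCount >= maxRetries) {
                return response; // success, non-retryable status, or retries exhausted
            }
            // Backoff without jitter or cap, for brevity: 2^retryCount * minimumRetryDelay
            Duration backoff = minimumRetryDelay.multipliedBy(1L << Math.min(retryCount, 20));
            sleeper.accept(response.retryAfter.orElse(backoff)); // the server's delay wins when present
            retryCount++;
        }
    }
}
```

Passing the wait as a `Consumer<Duration>` lets a test record the delays instead of sleeping; production code would block or schedule the next attempt there.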

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Assessment against linked issues

| Objective | Addressed | Explanation |
| --- | --- | --- |
| Honor Retry-After header on 429s and 5xxs for state-affecting ops; fallback to exponential backoff (155) | | |
| Expose Retry-After header value in error when received (155) | | |
| Fallback to exponential backoff with jitter when Retry-After not available (155) | | |
| Max allowable retries is 15, default is 3 (155) | | |
| Retry logic: state-affecting ops only retry 5xx with Retry-After; others always retry 5xx/429 (155) | | |

Suggested labels

codex

codecov-commenter commented Jul 24, 2025

Codecov Report

❌ Patch coverage is 86.23853% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 34.55%. Comparing base (ff7aa69) to head (da8807e).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...dev/openfga/sdk/api/client/HttpRequestAttempt.java | 80.95% | 6 Missing and 2 partials ⚠️ |
| ...v/openfga/sdk/api/configuration/Configuration.java | 63.63% | 2 Missing and 2 partials ⚠️ |
| .../main/java/dev/openfga/sdk/util/RetryStrategy.java | 85.71% | 0 Missing and 2 partials ⚠️ |
| src/main/java/dev/openfga/sdk/errors/FgaError.java | 85.71% | 1 Missing ⚠️ |

❌ Your project status has failed because the head coverage (34.55%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #186      +/-   ##
============================================
+ Coverage     33.73%   34.55%   +0.81%     
- Complexity     1005     1041      +36     
============================================
  Files           182      185       +3     
  Lines          6900     6983      +83     
  Branches        778      789      +11     
============================================
+ Hits           2328     2413      +85     
+ Misses         4467     4464       -3     
- Partials        105      106       +1     


@jimmyjames
Contributor Author

@coderabbitai review

coderabbitai bot commented Jul 28, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
src/main/java/dev/openfga/sdk/util/ExponentialBackoff.java (1)

8-11: Remove misleading auto-generated comment.

Same issue as the previous file - this is clearly hand-written code, not auto-generated.

src/test/java/dev/openfga/sdk/util/RetryAfterHeaderParserTest.java (1)

8-11: Remove misleading auto-generated comment.

Same issue as other files - this is hand-written test code, not auto-generated.

🧹 Nitpick comments (3)
src/main/java/dev/openfga/sdk/util/RetryAfterHeaderParser.java (1)

8-11: Remove misleading auto-generated comment.

The header comment states this class is auto-generated and should not be edited manually, but this is clearly a manually written utility class that will require maintenance and updates.

- * NOTE: This class is auto generated by OpenAPI Generator (https://openapi-generator.tech).
- * https://openapi-generator.tech
- * Do not edit the class manually.
src/main/java/dev/openfga/sdk/util/ExponentialBackoff.java (1)

75-98: Consider refactoring to eliminate code duplication.

This method duplicates the entire logic from the first calculateDelay method. Consider extracting the common logic to reduce maintenance burden.

 public static Duration calculateDelay(int retryCount) {
-    if (retryCount < 0) {
-        return Duration.ZERO;
-    }
-
-    // Calculate base delay: 2^retryCount * 100ms
-    long baseDelayMs = (long) Math.pow(2, retryCount) * BASE_DELAY_MS;
-
-    // Cap at maximum delay
-    long maxDelayMs = MAX_DELAY_SECONDS * 1000L;
-    if (baseDelayMs > maxDelayMs) {
-        baseDelayMs = maxDelayMs;
-    }
-
-    // Add jitter: random value between baseDelay and 2 * baseDelay
-    long minDelayMs = baseDelayMs;
-    long maxDelayMsWithJitter = Math.min(baseDelayMs * 2, maxDelayMs);
-
-    // Generate random delay within the jitter range
-    long jitterRange = maxDelayMsWithJitter - minDelayMs;
-    long actualDelayMs = minDelayMs + (jitterRange > 0 ? (long) (RANDOM.nextDouble() * (jitterRange + 1)) : 0);
-
-    return Duration.ofMillis(actualDelayMs);
+    return calculateDelay(retryCount, RANDOM);
 }
src/test/java/dev/openfga/sdk/api/client/HttpRequestAttemptRetryTest.java (1)

30-319: Consider adding tests for additional retry scenarios.

The current test suite provides excellent coverage of the core retry logic. Consider adding tests for these scenarios to achieve complete coverage:

  1. HTTP-date format Retry-After headers - The PR mentions RFC 9110 compliance which includes HTTP-date format support
  2. Other state-affecting methods - Test PUT, PATCH, DELETE to ensure they follow the same retry rules as POST
  3. Retry-After boundary values - Test edge cases like 1 second, 1800 seconds (30 minutes max), and out-of-range values
  4. FgaError Retry-After exposure - Verify that the Retry-After header value is properly exposed in FgaError objects as mentioned in the PR objectives

Example test for HTTP-date format:

@Test
void shouldHandleHttpDateRetryAfterHeader() throws Exception {
    // Test with HTTP-date format, e.g. "Fri, 31 Dec 1999 23:59:59 GMT".
    // Format in UTC so the value ends in "GMT" rather than a local offset.
    String httpDate = DateTimeFormatter.RFC_1123_DATE_TIME
        .format(ZonedDateTime.now(ZoneOffset.UTC).plusSeconds(2));
    
    wireMockServer.stubFor(get(urlEqualTo("/test"))
        .willReturn(aResponse()
            .withStatus(429)
            .withHeader("Retry-After", httpDate)
            .withBody("{\"error\":\"rate limited\"}")));
    // ... rest of test
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 59ee8dc and 9a07cb3.

📒 Files selected for processing (15)
  • CHANGELOG.md (1 hunks)
  • README.md (4 hunks)
  • src/main/java/dev/openfga/sdk/api/client/HttpRequestAttempt.java (3 hunks)
  • src/main/java/dev/openfga/sdk/api/configuration/Configuration.java (3 hunks)
  • src/main/java/dev/openfga/sdk/errors/FgaError.java (3 hunks)
  • src/main/java/dev/openfga/sdk/util/ExponentialBackoff.java (1 hunks)
  • src/main/java/dev/openfga/sdk/util/RetryAfterHeaderParser.java (1 hunks)
  • src/main/java/dev/openfga/sdk/util/RetryStrategy.java (1 hunks)
  • src/test/java/dev/openfga/sdk/api/OpenFgaApiTest.java (9 hunks)
  • src/test/java/dev/openfga/sdk/api/auth/OAuth2ClientTest.java (1 hunks)
  • src/test/java/dev/openfga/sdk/api/client/HttpRequestAttemptRetryTest.java (1 hunks)
  • src/test/java/dev/openfga/sdk/api/client/OpenFgaClientTest.java (12 hunks)
  • src/test/java/dev/openfga/sdk/util/ExponentialBackoffTest.java (1 hunks)
  • src/test/java/dev/openfga/sdk/util/RetryAfterHeaderParserTest.java (1 hunks)
  • src/test/java/dev/openfga/sdk/util/RetryStrategyTest.java (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
src/test/java/dev/openfga/sdk/util/RetryAfterHeaderParserTest.java (1)
src/main/java/dev/openfga/sdk/util/RetryAfterHeaderParser.java (1)
  • RetryAfterHeaderParser (30-104)
src/main/java/dev/openfga/sdk/util/RetryStrategy.java (2)
src/main/java/dev/openfga/sdk/errors/HttpStatusCode.java (1)
  • HttpStatusCode (3-38)
src/main/java/dev/openfga/sdk/util/ExponentialBackoff.java (1)
  • ExponentialBackoff (26-99)
src/test/java/dev/openfga/sdk/util/ExponentialBackoffTest.java (1)
src/main/java/dev/openfga/sdk/util/ExponentialBackoff.java (1)
  • ExponentialBackoff (26-99)
🔇 Additional comments (57)
src/test/java/dev/openfga/sdk/api/auth/OAuth2ClientTest.java (1)

143-148: LGTM - Test correctly reflects new retry strategy.

The addition of the Retry-After header to the 500 response correctly aligns with the new retry behavior where POST requests only retry on 5xx errors when this header is present. The comment clearly documents this breaking change.

src/main/java/dev/openfga/sdk/errors/FgaError.java (3)

20-20: LGTM - Clean field addition.

The new retryAfterHeader field follows the established pattern of other fields in the class.


64-69: LGTM - Proper header extraction logic.

The implementation correctly uses Optional to safely extract the Retry-After header and only sets it when present. This avoids null pointer issues and follows good Java practices.
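
For illustration, Optional-based header extraction of the kind described here might look like the following. It is a sketch that assumes the response headers are available as a `java.net.http.HttpHeaders` instance; `RetryAfterExtractionSketch` and `retryAfterValue` are hypothetical names, not code from FgaError.

```java
import java.net.http.HttpHeaders;
import java.util.Optional;

// Hypothetical illustration; not the code in FgaError.
final class RetryAfterExtractionSketch {
    /** Returns the raw Retry-After value if the server sent one; header lookup ignores case. */
    static Optional<String> retryAfterValue(HttpHeaders headers) {
        return headers.firstValue("Retry-After");
    }
}
```

A caller can then store the value only when it is present, for example with `retryAfterValue(headers).ifPresent(...)`, without any explicit null check.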


137-143: LGTM - Standard getter/setter implementation.

The getter and setter methods follow standard Java conventions and enable client code to access the server-suggested retry delay as mentioned in the PR objectives.

src/test/java/dev/openfga/sdk/api/OpenFgaApiTest.java (9)

270-271: LGTM - Correctly implements new retry behavior for POST requests.

The test correctly verifies that POST requests no longer retry on 5xx errors without a Retry-After header, as documented in the breaking change comment.


452-453: LGTM - Correctly implements new retry behavior for DELETE requests.

The test correctly verifies that DELETE requests no longer retry on 5xx errors without a Retry-After header, consistent with the new retry strategy for state-affecting operations.


675-676: LGTM - Correctly implements new retry behavior for POST requests.

The test correctly verifies that POST requests no longer retry on 5xx errors without a Retry-After header, maintaining consistency with other state-affecting operations.


1085-1086: LGTM - Correctly implements new retry behavior for POST requests.

The test correctly verifies that POST requests no longer retry on 5xx errors without a Retry-After header, consistent with the new retry strategy.


1315-1316: LGTM - Correctly implements new retry behavior for POST requests.

The test correctly verifies that POST requests no longer retry on 5xx errors without a Retry-After header, maintaining consistency across all write operations.


1430-1431: LGTM - Correctly implements new retry behavior for POST requests.

The test correctly verifies that POST requests no longer retry on 5xx errors without a Retry-After header, consistent with the new retry strategy for state-affecting operations.


1557-1558: LGTM - Correctly implements new retry behavior for POST requests.

The test correctly verifies that POST requests no longer retry on 5xx errors without a Retry-After header, maintaining consistency with other write operations.


1672-1673: LGTM - Correctly implements new retry behavior for POST requests.

The test correctly verifies that POST requests no longer retry on 5xx errors without a Retry-After header, consistent with the new retry strategy.


1924-1925: LGTM - Correctly implements new retry behavior for PUT requests.

The test correctly verifies that PUT requests no longer retry on 5xx errors without a Retry-After header, completing the consistent implementation across all state-affecting HTTP methods (POST, DELETE, PUT).

src/test/java/dev/openfga/sdk/api/client/OpenFgaClientTest.java (12)

389-390: LGTM! Test correctly reflects the new retry behavior.

The test has been properly updated to reflect that POST requests no longer retry on 5xx errors without a Retry-After header. The explanatory comment clearly documents this breaking change.


576-577: LGTM! Test correctly reflects the new retry behavior.

The test has been properly updated to reflect that DELETE requests no longer retry on 5xx errors without a Retry-After header. The explanatory comment clearly documents this breaking change.


860-861: LGTM! Test correctly reflects the new retry behavior.

The test has been properly updated to reflect that POST requests no longer retry on 5xx errors without a Retry-After header. The explanatory comment clearly documents this breaking change.


1558-1559: LGTM! Test correctly reflects the new retry behavior.

The test has been properly updated to reflect that POST requests for write operations no longer retry on 5xx errors without a Retry-After header. The explanatory comment clearly documents this breaking change.


1873-1874: Verify if batch check operations should retry on 5xx errors.

The clientBatchCheck operation is also semantically a non-state-affecting operation (batch authorization checking doesn't modify state) despite using the POST HTTP method. This reinforces the pattern that the implementation may be classifying operations by HTTP method rather than semantic meaning.

This is consistent with the previous concerns about read and check operations. The verification script from the previous comment should cover this case as well.


2100-2101: LGTM! Test correctly documents 429 retry behavior.

The comment appropriately clarifies that HTTP 429 (rate limiting) errors continue to retry regardless of the HTTP method, which is not affected by the breaking changes. This serves as a good contrast to the 5xx error behavior changes.


2218-2219: Verify if expand operations should retry on 5xx errors.

The expand operation is semantically a non-state-affecting operation (it expands relationships for analysis/debugging purposes) despite using the POST HTTP method. This follows the same pattern as read/check operations where the implementation may be classifying by HTTP method rather than semantic meaning.

This should be covered by the verification script from the earlier comments about read/check operations.


2325-2326: Verify if listObjects operations should retry on 5xx errors.

The listObjects operation is semantically a non-state-affecting operation (it lists objects based on authorization queries) despite using the POST HTTP method. This continues the pattern of potential misclassification by HTTP method rather than semantic operation type.

This should be covered by the verification script from the earlier comments about read/check operations.


2572-2573: Verify if listRelations operations should retry on 5xx errors.

The listRelations operation is semantically a non-state-affecting operation (it checks which relations a user has with an object) despite using the POST HTTP method. This continues the pattern of potential misclassification by HTTP method rather than semantic operation type.

This should be covered by the verification script from the earlier comments about read/check operations.


2911-2912: LGTM! Test correctly reflects the new retry behavior.

The test has been properly updated to reflect that PUT requests for write assertions no longer retry on 5xx errors without a Retry-After header. This is correct since writeAssertions is genuinely a state-affecting operation that modifies the authorization model's assertions.


1130-1131: I wasn’t able to locate where the retry strategy classifies “read” vs. “write” operations in the codebase—particularly how it treats the POST-based read endpoint. Can you please confirm whether the implementation is:

  • Classifying retries purely by HTTP method (POST → write)
  • Or using semantic operation type (read endpoint → retry on 5xx despite POST)

This will determine if the test change (no longer retrying 5xx on read) aligns with the intended behavior.


1676-1677: Confirm retry behavior for check (POST) operations

The current retry logic in RetryStrategy.shouldRetry() uses isStateAffectingMethod(method) to classify all POST requests (including check calls) as state-affecting, so they only retry on 5xx when a Retry-After header is present. If the check() API is intended to be treated as non-state-affecting (and thus always retry on 5xx), you’ll need to adjust:

  • RetryStrategy.isStateAffectingMethod() in src/main/java/dev/openfga/sdk/util/RetryStrategy.java to exempt check operations
  • Add or update unit tests in RetryStrategyTest.java to cover check (POST) semantics

Please verify whether check() should behave like other non-mutating operations and update the classification logic accordingly.

src/main/java/dev/openfga/sdk/util/RetryAfterHeaderParser.java (3)

45-60: LGTM! Well-structured parsing logic with proper fallback.

The method correctly tries integer parsing first, then falls back to HTTP-date parsing. The null/empty validation and trimming are handled appropriately.


65-78: LGTM! Robust integer parsing with proper validation.

The range validation (1-1800 seconds) correctly implements the 30-minute maximum as specified in the requirements. Exception handling gracefully returns empty Optional for invalid input.


84-103: LGTM! RFC 9110 compliant HTTP-date parsing.

The implementation correctly uses RFC_1123_DATE_TIME formatter and validates the calculated duration within the same 1-1800 second range. The use of Instant.now() for current time calculation is appropriate for retry scenarios.
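
For readers following along, the parsing order described in these three comments (integer seconds first, then HTTP-date, both limited to 1-1800 seconds) could be sketched roughly as below. This is not the SDK's `RetryAfterHeaderParser`: the class name `RetryAfterSketch`, its method names, and the explicit `now` parameter are assumptions made for illustration. The review notes the real code uses `Instant.now()`; passing the clock value in simply keeps the example deterministic.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical sketch, not the SDK class under review.
final class RetryAfterSketch {
    static final long MIN_SECONDS = 1;
    static final long MAX_SECONDS = 1800; // 30 minutes, per the PR description

    // Tries integer seconds first, then falls back to an HTTP-date.
    static Optional<Duration> parse(String headerValue, Instant now) {
        if (headerValue == null || headerValue.trim().isEmpty()) {
            return Optional.empty();
        }
        String value = headerValue.trim();
        Optional<Duration> seconds = parseSeconds(value);
        return seconds.isPresent() ? seconds : parseHttpDate(value, now);
    }

    static Optional<Duration> parseSeconds(String value) {
        try {
            return inRange(Long.parseLong(value));
        } catch (NumberFormatException e) {
            return Optional.empty(); // not an integer; caller may try HTTP-date
        }
    }

    static Optional<Duration> parseHttpDate(String value, Instant now) {
        try {
            Instant retryAt = ZonedDateTime
                    .parse(value, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant();
            return inRange(Duration.between(now, retryAt).getSeconds());
        } catch (DateTimeParseException e) {
            return Optional.empty(); // invalid header; caller falls back to backoff
        }
    }

    // Accepts only delays inside the 1-1800 second window.
    private static Optional<Duration> inRange(long seconds) {
        return (seconds >= MIN_SECONDS && seconds <= MAX_SECONDS)
                ? Optional.of(Duration.ofSeconds(seconds))
                : Optional.empty();
    }
}
```

An empty `Optional` here stands for "no usable Retry-After value", which is the case where the PR description says the client falls back to exponential backoff.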

src/main/java/dev/openfga/sdk/util/ExponentialBackoff.java (1)

42-65: LGTM! Correct exponential backoff implementation with proper jitter.

The mathematical formula correctly implements 2^retryCount * 100ms, with appropriate capping at 120 seconds. The jitter calculation between base and 2*base delay helps prevent thundering herd problems.
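
As a rough illustration of the calculation this comment describes (2^retryCount times the minimum delay, jitter between base and 2*base, capped at 120 seconds), a minimal sketch might look like the following. The class name `BackoffSketch`, the injected `Random`, and the choice to apply the cap both before and after jitter are assumptions, not the SDK's `ExponentialBackoff` code.

```java
import java.time.Duration;
import java.util.Random;

// Hypothetical sketch, not the SDK class under review.
final class BackoffSketch {
    static final Duration MAX_DELAY = Duration.ofSeconds(120);

    static Duration calculateDelay(int retryCount, Duration minimumRetryDelay, Random random) {
        long maxMillis = MAX_DELAY.toMillis();
        // Clamp the minimum delay into [0, max] so the multiplication below cannot overflow.
        long minMillis = Math.max(0L, Math.min(minimumRetryDelay.toMillis(), maxMillis));
        // Clamp the exponent as well: 2^30 * 120_000 still fits comfortably in a long.
        int exponent = Math.max(0, Math.min(retryCount, 30));
        long baseMillis = Math.min(minMillis * (1L << exponent), maxMillis);
        // Jitter: pick a value in [base, 2 * base).
        long jitteredMillis = baseMillis + (long) (random.nextDouble() * baseMillis);
        // Assumption: the 120-second cap also applies to the jittered value.
        return Duration.ofMillis(Math.min(jitteredMillis, maxMillis));
    }
}
```

With a 100 ms minimum delay, retry count 0 lands in the 100-200 ms range and retry count 3 in the 800-1600 ms range, while large retry counts settle at the 120-second cap.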

CHANGELOG.md (1)

5-26: Comprehensive and well-structured changelog.

The changelog effectively documents all breaking changes, new features, and provides clear migration guidance. The technical details section helps users understand the implementation specifics.

src/test/java/dev/openfga/sdk/util/RetryAfterHeaderParserTest.java (1)

25-214: Excellent comprehensive test coverage.

The test suite thoroughly covers all scenarios:

  • Valid and invalid integer parsing (including boundary values)
  • HTTP-date parsing with proper timing tolerance
  • Edge cases (null, empty, whitespace, non-numeric values)
  • Input sanitization (whitespace trimming)

The use of tolerance in HTTP-date tests (line 113) properly accounts for execution time variations.

src/main/java/dev/openfga/sdk/api/configuration/Configuration.java (3)

39-40: LGTM! Well-defined retry configuration constants.

The constants clearly establish the default (3) and maximum allowable (15) retry counts, which aligns with the PR objectives and provides clear boundaries for configuration.


57-57: LGTM! Proper initialization in constructor.

The constructor correctly initializes maxRetries to the default value, maintaining consistency with other configuration properties.


271-277: LGTM! Robust validation with clear error messages.

The validation correctly prevents negative values and enforces the maximum allowable retries limit. The error messages are clear and informative, helping developers understand the constraints.
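
The bounds check described here could be sketched along these lines. The default of 3 and the maximum of 15 come from the comments above; the class name, method name, message wording, and the use of `IllegalArgumentException` are assumptions, since the exception type the SDK actually throws is not shown in this thread.

```java
// Hypothetical sketch, not the SDK's Configuration class.
final class MaxRetriesValidationSketch {
    static final int DEFAULT_MAX_RETRIES = 3;
    static final int MAX_ALLOWABLE_RETRIES = 15;

    // Returns the value unchanged when it is within [0, 15]; otherwise throws.
    static int validateMaxRetries(int maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException(
                    "maxRetries must not be negative, got " + maxRetries);
        }
        if (maxRetries > MAX_ALLOWABLE_RETRIES) {
            throw new IllegalArgumentException(
                    "maxRetries must not exceed " + MAX_ALLOWABLE_RETRIES + ", got " + maxRetries);
        }
        return maxRetries;
    }
}
```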

src/main/java/dev/openfga/sdk/util/RetryStrategy.java (3)

51-71: LGTM! Excellent implementation of RFC 9110-compliant retry logic.

The method correctly implements the differentiated retry behavior for state-affecting vs non-state-affecting operations, properly handles the 429 always-retry case, and correctly excludes 501 errors from retry attempts.


80-88: LGTM! Clean implementation of delay calculation priority.

The method correctly prioritizes server-specified Retry-After delays over exponential backoff, which aligns with RFC 9110 recommendations and best practices.


97-99: LGTM! Correct identification of state-affecting methods.

The method properly identifies state-affecting HTTP methods and handles case-insensitive comparison appropriately.

src/test/java/dev/openfga/sdk/util/ExponentialBackoffTest.java (3)

24-36: LGTM! Excellent test for base delay calculation.

The test correctly verifies that retry count 0 produces a delay between 100ms and 200ms (base delay with jitter), using a fixed Random seed for deterministic testing.


69-80: LGTM! Proper validation of maximum delay capping.

The test correctly verifies that high retry counts are capped at the 120-second maximum, preventing unbounded delay growth.


140-157: LGTM! Comprehensive progression validation.

The test systematically validates the exponential progression pattern across multiple retry counts, ensuring both the base calculation and jitter ranges are correct. The use of setSeed(42) for each iteration ensures consistent test results.

src/main/java/dev/openfga/sdk/api/client/HttpRequestAttempt.java (2)

105-121: LGTM! Excellent integration of the new retry strategy.

The updated retry logic properly integrates all the new utilities:

  • Parses Retry-After header using the dedicated parser
  • Uses RetryStrategy.shouldRetry for RFC 9110-compliant retry decisions
  • Calculates delays using RetryStrategy.calculateRetryDelay
  • Maintains proper retry count checking

The integration maintains the existing flow while incorporating the enhanced retry behavior.


168-176: LGTM! Robust delay validation and fallback logic.

The method properly handles edge cases by validating the retry delay and providing sensible fallbacks. This ensures the retry mechanism remains robust even with invalid delay values.

README.md (2)

968-991: LGTM! Comprehensive and clear retry behavior documentation.

The documentation excellently explains the new RFC 9110-compliant retry strategy, clearly differentiates between read and write operation retry behavior, and prominently highlights breaking changes. This will greatly assist users in understanding and migrating to the new retry logic.


1005-1006: LGTM! Practical configuration and error handling examples.

The examples clearly demonstrate how to configure the new retry behavior and access the Retry-After header information in error handling, providing users with actionable guidance.

Also applies to: 1019-1034

src/test/java/dev/openfga/sdk/util/RetryStrategyTest.java (4)

62-94: LGTM! Critical tests validating breaking changes for state-affecting operations.

These tests properly validate the new retry behavior where state-affecting methods (POST, PUT, PATCH, DELETE) only retry on 5xx errors when a Retry-After header is present. The explicit "Breaking change" comments help document the behavior change.


153-171: LGTM! Comprehensive validation of non-state-affecting method behavior.

The test properly validates that read operations (GET, HEAD, OPTIONS) maintain backward-compatible retry behavior on 5xx errors, regardless of Retry-After header presence.


246-261: LGTM! Important case-insensitive method handling test.

This test ensures the retry strategy works correctly regardless of HTTP method case, providing robustness against variations in method naming.


205-243: LGTM! Thorough validation of delay calculation priority.

The tests correctly validate that Retry-After header values are prioritized over exponential backoff, and that the fallback exponential backoff produces delays within the expected jitter ranges.

src/test/java/dev/openfga/sdk/api/client/HttpRequestAttemptRetryTest.java (9)

30-55: LGTM! Well-structured test setup.

The test setup follows best practices with proper WireMock configuration, reasonable test delays, and correct resource cleanup in the teardown method.


57-90: LGTM! Comprehensive test for 429 retry behavior.

The test correctly validates the core retry functionality with proper scenario sequencing and verification of both success response and retry count.


92-125: LGTM! Correctly tests server error retry for GET requests.

The test properly validates the differentiated retry behavior where GET requests retry on 5xx errors when a Retry-After header is present.


127-151: LGTM! Critical test for breaking change behavior.

This test correctly validates the breaking change where POST requests (state-affecting operations) do not retry on 5xx errors without a Retry-After header. The comment clearly documents this behavioral change.


153-186: LGTM! Validates positive case for POST retry behavior.

This test correctly complements the previous test by showing that POST requests do retry on 5xx errors when a Retry-After header is present, completing the validation of state-affecting operation retry logic.


188-215: LGTM! Correctly tests 501 exception handling.

The test properly validates that 501 (Not Implemented) responses are never retried, even with a Retry-After header present, which aligns with the RFC requirements and PR objectives.


217-244: LGTM! Properly validates retry limit enforcement.

The test correctly verifies that retry attempts respect the configured maximum retries, with the proper calculation of total requests (initial + maxRetries).


285-318: LGTM! Validates graceful handling of invalid Retry-After values.

The test correctly verifies that the system gracefully handles invalid Retry-After header values by falling back to exponential backoff, which is important for robust error handling.


246-283: Requesting HttpRequestAttempt retry logic for precise verification

To confirm which retryCount value is passed into ExponentialBackoff.calculateDelay (and thus whether the first retry uses a 100 ms or 200 ms base), please share the snippet from HttpRequestAttempt.java (around where retries are scheduled) and the RetryStrategy.java block that invokes calculateDelay. Specifically, we need to see:

  • How retryCount is initialized and incremented in HttpRequestAttempt.
  • The exact call into ExponentialBackoff.calculateDelay(...) (via RetryStrategy).

This will show whether the first retry uses retryCount=0 (100 ms–200 ms) or retryCount=1 (200 ms–400 ms).

@jimmyjames jimmyjames marked this pull request as ready for review July 28, 2025 21:34
@jimmyjames jimmyjames requested review from a team as code owners July 28, 2025 21:34
evansims previously approved these changes Jul 29, 2025
Member

@rhamzeh rhamzeh left a comment

Lovely PR + tests + documentation! ❤️

Just a small change (updated the issue).

We later decided to retry on 5xx on all requests regardless of whether they change state or not.

Can we:

  • For all (429s + 5xxs) except 501:
    • honor Retry-After when sent by the server, falling back to exponential backoff when it's not.
  • For network errors (e.g. no internet connectivity/LB closes the connection/server disconnects) (these don't necessarily have an HTTP response and status, as the server didn't reply)
    • Retry using exponential backoff
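
The policy requested in the list above can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, and treating any `IOException` (after unwrapping `CompletionException`/`ExecutionException`) as a "network error" is a simplification chosen for the example, not something confirmed by this thread.

```java
import java.io.IOException;
import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;

// Hypothetical sketch of the unified retry policy, not the SDK's RetryStrategy.
final class UnifiedRetrySketch {
    // 429 and every 5xx except 501 are retried, regardless of HTTP method.
    static boolean isRetryableStatus(int statusCode) {
        if (statusCode == 501) {
            return false;
        }
        return statusCode == 429 || (statusCode >= 500 && statusCode <= 599);
    }

    // Assumption: failures with no HTTP response surface as IOException,
    // possibly wrapped by the async machinery.
    static boolean isRetryableNetworkError(Throwable error) {
        Throwable cause = error;
        while ((cause instanceof CompletionException || cause instanceof ExecutionException)
                && cause.getCause() != null) {
            cause = cause.getCause();
        }
        return cause instanceof IOException;
    }

    // Honors a server-provided Retry-After delay, else uses the backoff delay.
    static Duration chooseDelay(Optional<Duration> retryAfter, Duration backoffDelay) {
        return retryAfter.orElse(backoffDelay);
    }
}
```

For network errors there is no response and therefore no Retry-After header, so `chooseDelay` would be called with an empty `Optional` and always return the backoff delay.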

dyeam0 previously approved these changes Jul 31, 2025
Member

@dyeam0 dyeam0 left a comment

Docs are excellent. Looks great!

- **BREAKING**: FgaError now exposes Retry-After header value via getRetryAfterHeader() method

### Technical Details
- Implements RFC 9110 compliant Retry-After header parsing (supports both integer seconds and HTTP-date formats)
Member

Oh, we can support HTTP-date format? That's cool. I thought during Resiliency Week discussions with @rhamzeh that there was difficulty in doing that.

Member

🤔 Was there? This is how we built it for the Go SDKs.

The only problem was a personal one, date based retries are bad b/c you can never be sure the client and server are on the same clock and don't have any drift, but we support that in Go https://github.com/openfga/go-sdk/blob/main/internal/utils/retryutils/retryutils.go#L60-L63

Member

@dyeam0 dyeam0 Aug 3, 2025

Yes, personally I feel the same. Without tight control and a common NTP server, there are always clock drift issues. But I have no issues with supporting it, though. I must have been misremembering our conversation about HTTP-date.

@jimmyjames
Contributor Author

Lovely PR + tests + documentation! ❤️

Just a small change (updated the issue).

We later decided to retry on 5xx on all requests regardless of whether they change state or not.

Can we:

  • For all (429s + 5xxs) except 501:

    • honor Retry-After when sent by the server, falling back to exponential backoff when it's not.
  • For network errors (e.g. no internet connectivity/LB closes the connection/server disconnects) (these don't necessarily have an HTTP response and status, as the server didn't reply)

    • Retry using exponential backoff

Updated to remove state-specific handling and retry on network issues.

Also tested basic and retry functionality using a test app (not included) created to exercise that code; it worked as expected.

Member

@rhamzeh rhamzeh left a comment

Other than the minor notes: we support overriding retry + min delay per request (see the ConfigurationOverride methods below). If we don't have a test for that already, can we add one (or create a ticket to add one)?

    public ConfigurationOverride maxRetries(int maxRetries) {
        this.maxRetries = maxRetries;
        return this;
    }

    @Override
    public Integer getMaxRetries() {
        return maxRetries;
    }

    public ConfigurationOverride minimumRetryDelay(Duration minimumRetryDelay) {
        this.minimumRetryDelay = minimumRetryDelay;
        return this;
    }

    @Override
    public Duration getMinimumRetryDelay() {
        return minimumRetryDelay;
    }

* @param statusCode The HTTP response status code
- * @param hasRetryAfterHeader Whether the response contains a valid Retry-After header
+ * @param hasRetryAfterHeader Whether the response contains a valid Retry-After header (kept for API compatibility)
* @return true if the request should be retried, false otherwise
*/
public static boolean shouldRetry(HttpRequest request, int statusCode, boolean hasRetryAfterHeader) {
Member

Do we need to maintain compatibility? We haven't merged the PR yet - so it's fine to remove that

Contributor Author

You're absolutely right! 💥

🤣

Removed unneeded params. Interesting how changes made in a long-running manner on a branch can get AI kinda confused. Will need to keep that in mind for the future.

Contributor Author

Excellent catch - this appears to be a scenario missed in this PR (and not something we support yet in go or python?). I'll need to revisit the changes here to support the per-request configuration along with the global configuration and default behavior.

Contributor Author

I've updated the PR to honor the min retry scenario and add tests here: 9cfce52

-void shouldNotRetryWith500WithoutRetryAfterHeaderForPostRequest() throws Exception {
-// Given - Breaking change: POST requests should NOT retry on 5xx without Retry-After
+void shouldRetryWith500WithoutRetryAfterHeaderForPostRequest() throws Exception {
+// Given - Simplified logic: POST requests should retry on 5xx errors
Member

Can we remove the simplified parts in this and other comments?

README.md Outdated
@@ -125,7 +125,7 @@ libraryDependencies += "dev.openfga" % "openfga-sdk" % "0.8.3"

We strongly recommend you initialize the `OpenFgaClient` only once and then re-use it throughout your app, otherwise you will incur the cost of having to re-initialize multiple times or at every request, the cost of reduced connection pooling and re-use, and would be particularly costly in the client credentials flow, as that flow will be preformed on every request.

-> The `Client` will by default retry API requests up to 3 times. Rate limiting (429) errors are always retried. Server errors (5xx) are retried for read operations, but write operations only retry when the server provides a `Retry-After` header.
+> The `Client` will by default retry API requests up to 3 times. Rate limiting (429) errors are always retried. Server errors (5xx) are retried for all operations, with intelligent delay calculation using `Retry-After` headers when provided or exponential backoff as fallback.
Member

This might seem nitpicky - but can we remove intelligent here and in other comments? 😅

Contributor Author

updated to fix 👍

@@ -1913,7 +1921,8 @@ DEFAULT_STORE_ID, DEFAULT_AUTH_MODEL_ID, new WriteAssertionsRequest())
.get());

// Then
-mockHttpClient.verify().put(putUrl).called(1 + DEFAULT_MAX_RETRIES);
+// Simplified logic: PUT requests now retry on 5xx errors (1 initial + 3 retries = 4 total)
+mockHttpClient.verify().put(putUrl).called(4);
Member

Would it be possible to add a test w/ a network error here?

Member

And if I can ask, can we have a client request where the token exchange request fails once, to make sure all is recovering properly?

Contributor Author

Would it be possible to add a test w/ a network error here?

The HTTP mocking lib makes this hard to do here, since it wraps exceptions in an IllegalStateException. I think the WireMock-based tests we have in HttpRequestAttemptRetryTest cover that scenario adequately and better simulate network issues.

And if I can ask, can we have a client request where the token exchange request fails once, to make sure all is recovering properly?

We have exchangeOAuth2TokenWithRetriesSuccess and exchangeOAuth2TokenWithRetriesFailure in OAuth2ClientTest - are those missing a scenario that should be tested for?

@@ -1662,7 +1669,8 @@ public void listObjects_500() throws Exception {
.get());

// Then
-mockHttpClient.verify().post(postUrl).called(1 + DEFAULT_MAX_RETRIES);
+// Simplified logic: POST requests now retry on 5xx errors (1 initial + 3 retries = 4 total)
Member

Prob keeping these as they were (1 + DEFAULT_MAX_RETRIES) is more maintainable as we change the value of DEFAULT_MAX_RETRIES

Contributor Author

updated to fix 👍

@@ -140,11 +140,11 @@ public void exchangeOAuth2TokenWithRetriesSuccess(WireMockRuntimeInfo wm) throws
.willReturn(jsonResponse("rate_limited", 429))
.willSetStateTo("rate limited once"));

-// Then return 500
+// Then return 500 with Retry-After header (breaking change: POST requests need Retry-After for 5xx retries)
Member

This is no longer the case

Contributor Author

updated to fix 👍

dyeam0 previously approved these changes Aug 3, 2025
@jimmyjames jimmyjames requested a review from rhamzeh August 5, 2025 15:31
Development

Successfully merging this pull request may close these issues.

Improve the retry strategy
5 participants