-
Notifications
You must be signed in to change notification settings - Fork 277
feat: add retryable property for Error #1383
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
ZENOTME
wants to merge
2
commits into
apache:main
Choose a base branch
from
ZENOTME:retryable_error
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite confusing. Whether it's retryable should be determined by consumer by
ErrorKind
.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My current idea is to introduce an
ErrorStatus
:temporary
: This error is temporary, and users can retry the operation if they want.permanent
: This error is permanent, so users can't retry it. Even if they do, it's highly likely that it won't work.In this way, we convey the status, which is only known to us and is orthogonal of
ErrorKind
instead of the final decision,retryable
.For example,
Unexpected
error can bepermanent
andPreconditionFailed
error can betemporary
.This is also very useful in cases like our REST catalog client. We can offer built-in retry support and convert a
temporary
error into apermanent
one after several retry attempts.I also think it would be a good idea to use a
temporary: bool
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
temporary: bool
+1. I think the basic idea here is to let retryable or temporary property orthogonal of ErrorKind.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand why
PreconditionFailed
can be temporrary. For example,PreconditionFailed
could be invalid input from user, and it should bepermanent
.I'm still confusing about this part. Let's think about the use case, usually the caller of a function determines whether they need to retry, depending on the return error. For example, the user could decide to retry strategy when they face network timeout failure.
If we add the
status
ortemporary
field, it's the function telling its caller whether they should retry. My concern is that adding such a field a general purposeError
could be misleading or confusing. If we add this field into a concreteErrorKind
variant such asCommitConflict(bool)
, it makes more sense to me.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example makes sense to me.🤔 There are two kinds of retry process: 1. commit conflict, 2. other(maybe network timeout as @liurenjie1024 says). It seems that in the SDK, at least for now, we only try to retry the process when we meet CommitConflict. (In Java, exceptions only retry when they meet CommitFailedException). For other errors, maybe it should be determined by the user. And we can determine whether need to let retry orthogonal of ErrorKind until we have multiple kinds of errors that need to be retried?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In cases like concurrent commit. Users can retry until this operation succeed or give up while retry times reached.
Take S3 as an example:
https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html
I think that's exactly why I want to use "temporary" here. We're telling users that this error is temporary, allowing them to decide how to proceed based on their own context.
A temporary error doesn't necessarily mean it should be retried.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this return by the catalog client as a CommitConflict?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I fully understand the benefits of having an
ErrorStatus
withinError
to define if an error is temporary. This basically means we can mark anyErrorKind
as "retryable". Maybe there is any concrete example?On the other hand, I think adding a dedicated
ErrorKind
likeCommitFailed
for commit conflicts doesn't conflict with the idea of having anErrorStatus
---we can have both if necessary. Opened #1452 based on this thought, would be happy to hear your thoughts as well!There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should have an error type called
CommitFailed
, since users can't really do anything about it.I still think it would be better to add a
status
ortemporary
field toError
to make it more orthogonal thanErrorKind
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm supportive for a
CommitConflict
error kind, as this align better with rest spec: https://github.com/apache/iceberg/blob/8ed1c216503b7193924ca57bd2694025660ac02c/open-api/rest-catalog-open-api.yaml#L1375.As we discussed in last sync, this basically means that for some error kind, sometimes it's retryable, and sometimes it's not. I can't come up with such a case, is there any concrete example?