-
Notifications
You must be signed in to change notification settings - Fork 276
feat: add retryable property for Error #1383
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@@ -134,6 +134,8 @@ pub struct Error { | |||
|
|||
source: Option<anyhow::Error>, | |||
backtrace: Backtrace, | |||
|
|||
retryable: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite confusing. Whether it's retryable should be determined by consumer by ErrorKind
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whether it's retryable should be determined by consumer by
ErrorKind
.
My current idea is to introduce an ErrorStatus
:
temporary
: This error is temporary, and users can retry the operation if they want.permanent
: This error is permanent, so users can't retry it. Even if they do, it's highly likely that it won't work.
In this way, we convey the status, which is only known to us and is orthogonal of ErrorKind
instead of the final decision, retryable
.
For example, Unexpected
error can be permanent
and PreconditionFailed
error can be temporary
.
This is also very useful in cases like our REST catalog client. We can offer built-in retry support and convert a temporary
error into a permanent
one after several retry attempts.
I also think it would be a good idea to use a temporary: bool
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite confusing. Whether it's retryable should be determined by consumer by ErrorKind.
temporary: bool
+1. I think the basic idea here is to let retryable or temporary property orthogonal of ErrorKind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, Unexpected error can be permanent and PreconditionFailed error can be temporary.
I don't quite understand why PreconditionFailed
can be temporrary. For example, PreconditionFailed
could be invalid input from user, and it should be permanent
.
I think the basic idea here is to let retryable or temporary property orthogonal of ErrorKind.
I'm still confusing about this part. Let's think about the use case, usually the caller of a function determines whether they need to retry, depending on the return error. For example, the user could decide to retry strategy when they face network timeout failure.
If we add the status
or temporary
field, it's the function telling its caller whether they should retry. My concern is that adding such a field a general purpose Error
could be misleading or confusing. If we add this field into a concrete ErrorKind
variant such as CommitConflict(bool)
, it makes more sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we add the status or temporary field, it's the function telling its caller whether they should retry. My concern is that adding such a field a general purpose Error could be misleading or confusing.
This example makes sense to me.🤔 There are two kinds of retry process: 1. commit conflict, 2. other(maybe network timeout as @liurenjie1024 says). It seems that in the SDK, at least for now, we only try to retry the process when we meet CommitConflict. (In Java, exceptions only retry when they meet CommitFailedException). For other errors, maybe it should be determined by the user. And we can determine whether need to let retry orthogonal of ErrorKind until we have multiple kinds of errors that need to be retried?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand why
PreconditionFailed
can be temporrary.
In cases like concurrent commit. Users can retry until this operation succeed or give up while retry times reached.
Take S3 as an example:
https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html
ConditionalRequestConflict
A conflicting operation occurred. If using PutObject you can retry the request. If using multipart upload you should initiate another CreateMultipartUpload request and re-upload each part.
If we add the
status
ortemporary
field, it's the function telling its caller whether they should retry.
I think that's exactly why I want to use "temporary" here. We're telling users that this error is temporary, allowing them to decide how to proceed based on their own context.
A temporary error doesn't necessarily mean it should be retried.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In cases like concurrent commit. Users can retry until this operation succeed or give up while retry times reached.
Shouldn't this return by the catalog client as a CommitConflict?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I fully understand the benefits of having an ErrorStatus
within Error
to define if an error is temporary. This basically means we can mark any ErrorKind
as "retryable". Maybe there is any concrete example?
On the other hand, I think adding a dedicated ErrorKind
like CommitFailed
for commit conflicts doesn't conflict with the idea of having an ErrorStatus
---we can have both if necessary. Opened #1452 based on this thought, would be happy to hear your thoughts as well!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, I think adding a dedicated
ErrorKind
likeCommitFailed
for commit conflicts doesn't conflict with the idea of having anErrorStatus
I don't think we should have an error type called CommitFailed
, since users can't really do anything about it.
I still think it would be better to add a status
or temporary
field to Error
to make it more orthogonal than ErrorKind
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm supportive for a CommitConflict
error kind, as this align better with rest spec: https://github.com/apache/iceberg/blob/8ed1c216503b7193924ca57bd2694025660ac02c/open-api/rest-catalog-open-api.yaml#L1375.
I still think it would be better to add a status or temporary field to Error to make it more orthogonal than ErrorKind.
As we discussed in last sync, this basically means that for some error kind, sometimes it's retryable, and sometimes it's not. I can't come up with such a case, is there any concrete example?
Which issue does this PR close?
This PR is the first part of #964.
What changes are included in this PR?
Before we support retryable process, this PR add retryable property for Error.
Are these changes tested?