Skip to content

Conversation

@ethanoroshiba
Copy link
Contributor

@ethanoroshiba ethanoroshiba commented Jan 31, 2025

Summary

Changes polling method for confirming Celestia commitment from GetTx gRPC to TxStatus.

Background

Previously, GetTx was polled indefinitely. On the client end of this RPC, we'd have no indication of what's happening on Celestia, so we had no way to know if a transaction was, for instance, evicted from the mempool, which would cause a halt of data submission. The TxStatus endpoint solves this problem by providing a more detailed status of the transaction.

Changes

  • Poll TxStatus instead of GetTx. The client will behave in the following ways:
    • Status is PENDING: continue polling
    • Status is UNKNOWN: continue polling until MAX_WAIT_TIME_UNKNOWN is reached (currently 7s)
    • Status is COMMITTED: return block height
    • Status is EVICTED: return an error

Testing

  • Adjusted mounts as necessary.
  • Added blackbox tests for each possible status response.

Changelogs

Changelog updated.

Metrics

  • Added unknown status transaction count.
  • Added evicted transaction count.

Related Issues

closes #1936

@github-actions github-actions bot added sequencer-relayer pertaining to the astria-sequencer-relayer crate cd labels Jan 31, 2025
@github-actions github-actions bot added the proto pertaining to the Astria Protobuf spec label Feb 3, 2025
@ethanoroshiba ethanoroshiba changed the title fix(sequencer-relayer)!: use 'tx_status' instead of 'GetTx' fix(sequencer-relayer): use 'tx_status' instead of 'GetTx' Feb 3, 2025
@ethanoroshiba ethanoroshiba removed the cd label Feb 3, 2025
@ethanoroshiba ethanoroshiba marked this pull request as ready for review February 3, 2025 19:45
@ethanoroshiba ethanoroshiba changed the title fix(sequencer-relayer): use 'tx_status' instead of 'GetTx' fix(sequencer-relayer): use 'TxStatus' instead of 'GetTx' Feb 4, 2025
@ethanoroshiba ethanoroshiba changed the title fix(sequencer-relayer): use 'TxStatus' instead of 'GetTx' fix(sequencer-relayer): poll Celestia App for transaction status Feb 5, 2025

sequencer_relayer
.timeout_ms(
11_000,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an abhorrently long time for a test to take, but we need to test that the relayer correctly exits if the allotted time passes without the status changing from "UNKNOWN". We cannot advance time manually due to the multithreaded context, so I'm not sure of a way around this :/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could perhaps change the actual timeout based on the test feature:

#[cfg(test)]
const MAX_WAIT_FOR_UNKNOWN: Duration = Duration::from_secs(2);
#[cfg(not(test))]
const MAX_WAIT_FOR_UNKNOWN: Duration = Duration::from_secs(10);

Alternatively, 10 seconds might be too long anyway given that Celestia block time is now 6 seconds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 4f304bb I lowered the time to 6s and the minimum logging time to 3. I'm not sure how I feel about building the testing functionality into the source code, but I'll defer to you here on what you think is best, since I think I lack the context of what is best practice. I feel like generally, though, it's best to keep testing logic (especially for blackbox tests) separate

Copy link
Contributor

@Fraser999 Fraser999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looking good. Just a few comments.

Comment on lines 395 to 402
// The min seconds to sleep after receiving a TxStatus response and sending the next
// request.
const POLL_INTERVAL_SECS: u64 = 1;
// The minimum duration between logs.
const LOG_INTERVAL: Duration = Duration::from_secs(5);
// The maximum amount of time to wait for a transaction to be committed if its status is
// `UNKNOWN`.
const MAX_WAIT_FOR_UNKNOWN: Duration = Duration::from_secs(10);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering about the rationale for changing the retry backoff and removing the START_LOGGING_DELAY? Seems like that behaviour is still desirable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose I thought because this functionality was moved to the debug level, it could be helpful to have the full picture of each response the relayer has received. I don't have a strong opinion on this, though, so I'd be happy to add that back if you think it's better


sequencer_relayer
.timeout_ms(
11_000,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could perhaps change the actual timeout based on the test feature:

#[cfg(test)]
const MAX_WAIT_FOR_UNKNOWN: Duration = Duration::from_secs(2);
#[cfg(not(test))]
const MAX_WAIT_FOR_UNKNOWN: Duration = Duration::from_secs(10);

Alternatively, 10 seconds might be too long anyway given that Celestia block time is now 6 seconds?

ethanoroshiba added a commit that referenced this pull request Mar 17, 2025
@joroshiba
Copy link
Member

This PR is stale because it has been open 45 days with no activity. Remove stale label or this PR will be
closed in 7 days.

@ethanoroshiba ethanoroshiba added ignore-stale Override for issues or PRs which should not be removed if stale. and removed stale labels Apr 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ignore-stale Override for issues or PRs which should not be removed if stale. override-freeze proto pertaining to the Astria Protobuf spec sequencer-relayer pertaining to the astria-sequencer-relayer crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Relayer gets stuck in infinite loop when transaction is evicted from Celestia mempool

3 participants