Skip to content

feat: graceful shutdown #150

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open

Conversation

frisitano
Copy link
Collaborator

@frisitano frisitano commented Jun 2, 2025

Overview

This PR implements a graceful shutdown for the rollup node to ensure that critical in-flight tasks are driven to completion before shutdown.

Test strategy

The test implemented for this feature is to start up a rollup node externally to the EN, such that we can manually poll it. This is required such that we can simulate a crash/shutdown midway through the processing of a batch. We simulate a crash midway through processing and then ensure that the node can start up and reprocess the latest batch that we crashed at. We assert that the EN safe block progresses to the expected chain tip upon restart.

Note: There may be an edge case in which the finalized head is greater than the safe head on restart, as we revert to the safe head. As such, when we issue the forkchoice rule, we ensure that we only issue an update if the safe block hash is greater than the finalized block hash.

Implementation

The following modifications have been made to support this feature:

  • At startup, always reindex the latest block
  • Modify database models to do nothing on conflict when inserting new data from the watcher
  • Add a method to the database to determine which safe block the node should start at
  • Add a utility function to determine the l1 block number the watcher should start at
  • Introduce a ConsolidationOutcome type to clean up the consolidation interface
  • Only update safe head if the safe head is greater than the finalized head
  • Add a run_until_graceful_shutdown function on the RNM and integrate with the reth TaskExecutor
  • Migrate the instantiation of the RNM from the ScrollRollupConfig

@frisitano frisitano requested a review from greged93 June 5, 2025 13:26
Copy link
Collaborator

@greged93 greged93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for refactoring some parts of the code base as well! Left a couple of small inline comments

l1_message.insert(self.get_connection()).await?;
models::l1_message::Entity::insert(l1_message)
.on_conflict(
OnConflict::column(models::l1_message::Column::QueueIndex).do_nothing().to_owned(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we have an edge case where we index an L1 message, shutdown, the L1 reorgs, and we don't update it because of the do_nothing directive?

@@ -511,3 +532,18 @@ fn delayed_interval(interval: u64) -> Interval {
interval.set_missed_tick_behavior(tokio::time::MissedTickBehavior::Delay);
interval
}

/// Computes the starting block for the watcher from the database.
pub async fn compute_watcher_start_block_from_database(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this live here?

.await?
.expect("Batch info must be present due to database query arguments");
let block = self.get_highest_block_for_batch(batch.hash).await?;
tracing::info!(target:"test", "{:?}", block);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover from debug

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants