
@lexnv commented Jun 9, 2025

The best round is complete when both of the following conditions are met:

  • The best round future has completed (i.e. `inner.best_round.poll(cx)` returns `Poll::Ready(())`)
  • The best round state has advanced from `Prevoting` to `Precommitted`

If either condition is not met, `process_best_round` returns `Poll::Pending` without storing the waker from the context. Nothing is then registered to wake the voter when the best round later makes progress.
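The effect can be reproduced with a small, self-contained sketch. This is not the `finality-grandpa` code: `HypotheticalRound`, `round_done`, `precommitted` and the `CountingWaker` test helper are hypothetical names chosen for illustration, and only `std::task` types are used. The future returns `Poll::Pending` without keeping a clone of `cx.waker()`, so when the state later advances, the task receives no wakeup and only makes progress if some other component happens to wake it.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Test waker that only counts how many times it has been woken.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical stand-in for the best round: it is complete only when the
/// round itself is done AND the state has reached Precommitted.
struct HypotheticalRound {
    round_done: bool,
    precommitted: bool,
}

impl HypotheticalRound {
    /// The pattern being illustrated: `Poll::Pending` is returned without
    /// keeping a clone of `cx.waker()`.
    fn poll_completed(&mut self, _cx: &mut Context<'_>) -> Poll<()> {
        if self.round_done && self.precommitted {
            Poll::Ready(())
        } else {
            Poll::Pending
        }
    }

    /// State transition (e.g. Prevoting -> Precommitted). Nothing is woken.
    fn advance_to_precommitted(&mut self) {
        self.precommitted = true;
    }
}

/// Polls once, advances the state, and reports how many wakeups the task got.
fn wakeups_without_registration() -> usize {
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);

    let mut round = HypotheticalRound { round_done: true, precommitted: false };
    assert!(round.poll_completed(&mut cx).is_pending());

    round.advance_to_precommitted();
    // The round could now complete, but the task was never told to re-poll.
    counter.0.load(Ordering::SeqCst)
}
```

Calling `wakeups_without_registration()` returns `0`: the round became completable, but no wakeup was delivered to the task that polled it.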

The grandpa code therefore relies on one of the following to wake the task first:

  • incoming events via the global input stream (i.e. `process_incoming`)
  • pruning background rounds (either a past round generated a commit, or we receive a finalization notification)
  • global output wakes the waker

These components are polled in order by the top-level `poll`:

```rust
fn poll(mut self: Pin<&mut Self>, cx: &mut Context) -> Poll<Result<(), E::Error>> {
    self.process_incoming(cx)?;
    self.prune_background_rounds(cx)?;
    let _ = self.global_out.poll(cx)?;
    self.process_best_round(cx)
}
```

This PR stores the waker and wakes it when the best round state advances, instead of relying on the components above, which may delay the start of the next round.
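One way to implement this is to keep the waker from the last `Poll::Pending` and wake it inside the state transition itself. The sketch below is not the diff in this PR; it reuses hypothetical names (`HypotheticalRound`, `Step`, `CountingWaker`) and only `std` types to show the pattern.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Test waker that only counts how many times it has been woken.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical round step, reduced to the two steps relevant here.
#[derive(Debug, PartialEq)]
enum Step {
    Prevoting,
    Precommitted,
}

/// Hypothetical stand-in for the best round, now remembering its waker.
struct HypotheticalRound {
    round_done: bool,
    step: Step,
    /// Waker of the task that last observed `Poll::Pending`.
    waker: Option<Waker>,
}

impl HypotheticalRound {
    fn new() -> Self {
        // Round future already finished; state still in Prevoting.
        HypotheticalRound { round_done: true, step: Step::Prevoting, waker: None }
    }

    fn poll_completed(&mut self, cx: &mut Context<'_>) -> Poll<()> {
        if self.round_done && self.step == Step::Precommitted {
            return Poll::Ready(());
        }
        // Register the current task before returning Pending.
        self.waker = Some(cx.waker().clone());
        Poll::Pending
    }

    /// State transition that wakes whichever task is waiting on it.
    fn advance_to_precommitted(&mut self) {
        self.step = Step::Precommitted;
        if let Some(waker) = self.waker.take() {
            waker.wake();
        }
    }
}

/// Returns (wakeups delivered by the transition, whether the re-poll is Ready).
fn wakeups_with_registration() -> (usize, bool) {
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);

    let mut round = HypotheticalRound::new();
    assert!(round.poll_completed(&mut cx).is_pending());

    round.advance_to_precommitted();
    let wakeups = counter.0.load(Ordering::SeqCst);
    let ready = round.poll_completed(&mut cx).is_ready();
    (wakeups, ready)
}
```

Here `wakeups_with_registration()` returns `(1, true)`: advancing the state delivers exactly one wakeup, and the re-poll observes the completed round. Using `take()` on the stored waker means each registration is woken at most once; the next `Poll::Pending` registers the waker again.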

Notes

We are also seeing a high number of grandpa debug logs:

```text
2025-06-06 20:10:51.909 DEBUG tokio-runtime-worker grandpa: Completed round 50, state = State { prevote_ghost: Some((0xa82f134d09119028e9b5c8f1846a0440d66f284292cd57ebfba0bfaf31e44a54, 4469)), finalized: Some((0xa82f134d09119028e9b5c8f1846a0440d66f284292cd57ebfba0bfaf31e44a54, 4469)), estimate: Some((0xa82f134d09119028e9b5c8f1846a0440d66f284292cd57ebfba0bfaf31e44a54, 4469)), completable: true }, step = Some(Prevoting)
2025-06-06 20:10:51.909 DEBUG tokio-runtime-worker grandpa: Completed round 50, state = State { prevote_ghost: Some((0xa82f134d09119028e9b5c8f1846a0440d66f284292cd57ebfba0bfaf31e44a54, 4469)), finalized: Some((0xa82f134d09119028e9b5c8f1846a0440d66f284292cd57ebfba0bfaf31e44a54, 4469)), estimate: Some((0xa82f134d09119028e9b5c8f1846a0440d66f284292cd57ebfba0bfaf31e44a54, 4469)), completable: true }, step = Some(Prevoting)
2025-06-06 20:10:51.914 DEBUG tokio-runtime-worker grandpa: Completed round 50, state = State { prevote_ghost: Some((0xa82f134d09119028e9b5c8f1846a0440d66f284292cd57ebfba0bfaf31e44a54, 4469)), finalized: Some((0xa82f134d09119028e9b5c8f1846a0440d66f284292cd57ebfba0bfaf31e44a54, 4469)), estimate: Some((0xa82f134d09119028e9b5c8f1846a0440d66f284292cd57ebfba0bfaf31e44a54, 4469)), completable: true }, step = Some(Prevoting)
```

During a finality stall, these logs repeat roughly 3.4k times per second, suggesting that some of the other components mentioned above were able to make progress, but the best voting round was not yet able to.

I'd leave this PR open for a while in case we decide to revisit it later and perform in-depth testing. It might also come in handy if we ever refactor the code to align with #124.

@lexnv self-assigned this Jun 9, 2025