Skip to content

Pre-epoch membership uses static leaders #3303

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from

Conversation

lukaszrzasik
Copy link
Contributor

@lukaszrzasik lukaszrzasik commented May 22, 2025

Closes #<ISSUE_NUMBER>

This PR:

This PR does not:

Key places to review:


@@ -1137,11 +1137,11 @@ impl Membership<SeqTypes> for EpochCommittees {
view_number: <SeqTypes as NodeType>::View,
epoch: Option<Epoch>,
) -> Result<PubKey, Self::Error> {
if let Some(epoch) = epoch {
let Some(randomized_committee) = self.randomized_committees.get(&epoch) else {
if self.first_epoch.is_some() && epoch.is_some_and(|e| e >= self.first_epoch.unwrap()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I agree with this change

the effect is now, if we e.g. have first_epoch = 1000, and we query for membership.leader(view_number: 10, epoch: Some(1)), we will actually get an answer. To be fair the answer will be correct, but the query is wrong -- we should never be passing epoch: Some(1)

I think the bigger problem is that -- I think there is only really ambiguity during the first epoch (where we use both the randomized & non-randomized stake tables), but this doesn't change the behavior for the first epoch at all!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I can agree with that but then it's inconsistent with what this method returns. Basically, the whole issue has become apparent because of this piece of code. We call membership_for_epoch which internally calls has_randomized_stake_table which returns true but then leader immediately returns the error.

The problem is, currently, when a node starts clean in epoch version already, it never uses epoch == None.

So I think we have two options:

  1. The caller is responsible to check every time if epoch is less than first_epoch and use None if true, or
  2. Handle that on the membership side which happens to have all the needed information. I chose the second option after briefly discussing with Abdul.

I'm open for suggestions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused though -- is membership_for_epoch(epoch) being called with epoch = Some(epoch_number) where epoch_number < first_epoch? in that case it's being called with the wrong thing and we need to figure out where that number is set and fix it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it come from the ViewChange event? I don't see why we couldn't just make the event broadcast Some(first_epoch) instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it comes from the initial ViewChange event and yes, we can make it so that the initial ViewChange event includes first_epoch if it's set. I've been planning to do that along with additional changes like populating all the epochs from the genesis to the first_epoch + 1 with the genesis stake table.

But I think this is a separate issue. In this PR I would like to fix the inconsistency in the EpochCommittees implementation. I agree that calling lookup_leader with Some(epoch) where epoch < first_epoch is incorrect but then shouldn't the same be true for has_randomized_stake_table? In short, we can't expect a pre-epoch membership to have a randomized leader.

So I propose both of these methods return an error when called in such a way.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've created a ticket for the ViewChange issue

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think I agree with making has_randomized_stake_table consistent.

So instead of the change in this PR, we would just make that take an Option<Epoch> instead (and return an error when that fails)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly, when the received argument is Some(epoch) and epoch < first_epoch we return an error.

@lukaszrzasik
Copy link
Contributor Author

The discussed changes has been implemented here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants