-
Notifications
You must be signed in to change notification settings - Fork 125
Pre-epoch membership uses static leaders #3303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
types/src/v0/impls/stake_table.rs
Outdated
@@ -1137,11 +1137,11 @@ impl Membership<SeqTypes> for EpochCommittees { | |||
view_number: <SeqTypes as NodeType>::View, | |||
epoch: Option<Epoch>, | |||
) -> Result<PubKey, Self::Error> { | |||
if let Some(epoch) = epoch { | |||
let Some(randomized_committee) = self.randomized_committees.get(&epoch) else { | |||
if self.first_epoch.is_some() && epoch.is_some_and(|e| e >= self.first_epoch.unwrap()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I agree with this change
the effect is now, if we e.g. have first_epoch = 1000, and we query for membership.leader(view_number: 10, epoch: Some(1))
, we will actually get an answer. To be fair the answer will be correct, but the query is wrong -- we should never be passing epoch: Some(1)
I think the bigger problem is that -- I think there is only really ambiguity during the first epoch (where we use both the randomized & non-randomized stake tables), but this doesn't change the behavior for the first epoch at all!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I can agree with that but then it's inconsistent with what this method returns. Basically, the whole issue has become apparent because of this piece of code. We call membership_for_epoch
which internally calls has_randomized_stake_table
which returns true but then leader
immediately returns the error.
The problem is, currently, when a node starts clean in epoch version already, it never uses epoch == None.
So I think we have two options:
- The caller is responsible to check every time if epoch is less than
first_epoch
and use None if true, or - Handle that on the membership side which happens to have all the needed information. I chose the second option after briefly discussing with Abdul.
I'm open for suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused though -- is membership_for_epoch(epoch)
being called with epoch = Some(epoch_number)
where epoch_number < first_epoch
? in that case it's being called with the wrong thing and we need to figure out where that number is set and fix it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it come from the ViewChange
event? I don't see why we couldn't just make the event broadcast Some(first_epoch)
instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it comes from the initial ViewChange
event and yes, we can make it so that the initial ViewChange
event includes first_epoch
if it's set. I've been planning to do that along with additional changes like populating all the epochs from the genesis to the first_epoch + 1
with the genesis stake table.
But I think this is a separate issue. In this PR I would like to fix the inconsistency in the EpochCommittees
implementation. I agree that calling lookup_leader
with Some(epoch)
where epoch < first_epoch
is incorrect but then shouldn't the same be true for has_randomized_stake_table
? In short, we can't expect a pre-epoch membership to have a randomized leader.
So I propose both of these methods return an error when called in such a way.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've created a ticket for the ViewChange
issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think I agree with making has_randomized_stake_table
consistent.
So instead of the change in this PR, we would just make that take an Option<Epoch>
instead (and return an error when that fails)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly, when the received argument is Some(epoch)
and epoch < first_epoch
we return an error.
The discussed changes has been implemented here |
Closes #<ISSUE_NUMBER>
This PR:
This PR does not:
Key places to review: