Refactor consensus-execution sync to push sync data #3538
Conversation
This change eliminates a circular dependency between the consensus and execution layers by transforming the sync status flow from a pull-based to a push-based model. Previously, the execution layer would query the consensus layer for sync status through the ConsensusInfo interface, creating a tight coupling between the layers.

The new architecture introduces a ConsensusSyncData structure that contains sync status, target message count, and progress information. The ConsensusExecutionSyncer now periodically pushes this data from consensus to execution, where it's stored using an atomic pointer for lock-free reads. This approach maintains consistency with the existing finality data push mechanism and provides better performance through reduced lock contention.

As part of this refactoring, the ConsensusInfo interface has been simplified to only include the BlockMetadataAtMessageIndex method, removing the now-redundant Synced, FullSyncProgressMap, and SyncTargetMessageCount methods. This cleaner separation of concerns better supports alternative client implementations by clearly defining the data flow boundaries between consensus and execution layers.
Also lower the sync interval for tests.
syncData := &execution.ConsensusSyncData{
	Synced:                 c.syncMonitor.Synced(),
	SyncTargetMessageCount: c.syncMonitor.SyncTargetMessageCount(),
	SyncProgressMap:        c.syncMonitor.FullSyncProgressMap(),
SyncProgressMap should be nil or empty if Synced (prevents wasteful locking, etc.)
Addressed in latest commit.
synced, err := s.consensus.Synced().Await(ctx)
if err != nil {
	log.Error("Error checking if consensus is synced", "err", err)
data := s.consensusSyncData.Load()
This should be more complex.
If you're pushing from consensus, there is already a delay from the time the message was pushed until the time it is read.
I think:
- ConsensusSyncData should not include SyncTarget, but "maxMessageCount" from consensus (syncTarget is a delayed maxMessageCount).
- execution side should have a config of MsgLag
- execution side should have its own TargetMessage. Ideally, this would be the last MaxMessage it got from consensus more than MsgLag ago - but no more than 2*MsgLag ago (if no data arrived in the last 2*MsgLag, the target doesn't matter - it's not in sync; if it only has data from within the last MsgLag, it should use the least recent one there)
- execution side should say it's in sync only if:
** last syncStatus from consensus is no more than MsgLag old
** last syncStatus from consensus says consensus is synced
** execution met the internal TargetMessage
Addressed in latest commit.
Only populate SyncProgressMap when not synced; MaxMessageCount is now a dedicated field that is always sent.

Fix stale sync targets caused by push delay: instead of consensus sending pre-calculated targets, it now sends the raw MaxMessageCount. Execution maintains a sliding-window history and calculates its own target using values from 1-2 MsgLag ago, properly accounting for the push delay. The default push interval and execution message lag are both 1 second, so they work together well. Includes unit tests for the sliding-window implementation.
Codecov Report

❌ Patch coverage is … Additional details and impacted files:

@@            Coverage Diff             @@
##           master    #3538      +/-   ##
==========================================
+ Coverage   22.70%   22.80%   +0.09%
==========================================
  Files         388      388
  Lines       58900    59016     +116
==========================================
+ Hits        13375    13456      +81
- Misses      43486    43517      +31
- Partials     2039     2043       +4
execution/gethexec/sync_monitor.go
Outdated
windowEnd := now.Add(-h.msgLag)

for _, entry := range h.entries {
	if !entry.timestamp.Before(windowStart) && !entry.timestamp.After(windowEnd) {
When I look at the current description of msgLag, I think what we actually want is the oldest message that's less than MsgLag old (2*MsgLag is not relevant).
We can discard anything that's more than MsgLag old.
We could do other things with different documentation for MsgLag - but this method (which is different from what I said before) seems to fit the current documentation and be simple enough.
Updated it to get the oldest message that is newer than msgLag old.
execution/gethexec/sync_monitor.go
Outdated
// Add the max message count to history for sync target calculation
if syncData != nil && syncData.MaxMessageCount > 0 {
	s.syncHistory.add(syncData.MaxMessageCount, syncData.UpdatedAt)
I think we should use the minimum of (syncData.UpdatedAt, time.Now()) for the timestamp, so if clocks between components don't match we at least know the timestamp is not in the future.
Good idea, I capped the timestamp at time.Now().
execution/gethexec/sync_monitor.go
Outdated
}

// Always add the max message count
res["maxMessageCount"] = data.MaxMessageCount
this comes from consensus, so let's call it "consensusMaxMessageCount" or something
Changed it to "consensusMaxMessageCount"
defer h.mutex.RUnlock()

if len(h.entries) == 0 {
	return 0
Not entirely certain, but I think in this case it's better to return an error and not 0, to make sure nothing mistakenly concludes we're in sync.
I think it's okay because we return not synced if it's zero:
func (s *SyncMonitor) Synced(ctx context.Context) bool {
...
// Calculate the sync target based on historical data
syncTarget := s.syncHistory.getSyncTarget(now)
if syncTarget == 0 {
// No valid sync target available yet
return false
}
...
- Simplify sync target to use oldest entry < msgLag ago (not the 2*msgLag window)
- Use min(now, syncData.UpdatedAt) to prevent future timestamps
- Rename maxMessageCount to consensusMaxMessageCount for clarity
- Update tests to match new msgLag-based trimming behavior
…nc-info Fix minor conflict around using pflag instead of flag.
Tests are passing now after merging in latest master, assigning back to Tsahi for review.
Fixes NIT-3649