Skip to content

Conversation

Tristan-Wilson
Copy link
Member

This change eliminates a circular dependency between the consensus and
execution layers by transforming the sync status flow from a pull-based
to a push-based model. Previously, the execution layer would query the
consensus layer for sync status through the ConsensusInfo interface,
creating a tight coupling between the layers.

The new architecture introduces a ConsensusSyncData structure that
contains sync status, target message count, and progress information.
The ConsensusExecutionSyncer now periodically pushes this data from
consensus to execution, where it's stored using an atomic pointer for
lock-free reads. This approach maintains consistency with the existing
finality data push mechanism and provides better performance through
reduced lock contention.

As part of this refactoring, the ConsensusInfo interface has been
simplified to only include the BlockMetadataAtMessageIndex method,
removing the now-redundant Synced, FullSyncProgressMap, and
SyncTargetMessageCount methods. This cleaner separation of concerns
better supports alternative client implementations by clearly defining
the data flow boundaries between consensus and execution layers.

Fixes NIT-3649

This change eliminates a circular dependency between the consensus and
execution layers by transforming the sync status flow from a pull-based
to a push-based model. Previously, the execution layer would query the
consensus layer for sync status through the ConsensusInfo interface,
creating a tight coupling between the layers.

The new architecture introduces a ConsensusSyncData structure that
contains sync status, target message count, and progress information.
The ConsensusExecutionSyncer now periodically pushes this data from
consensus to execution, where it's stored using an atomic pointer for
lock-free reads. This approach maintains consistency with the existing
finality data push mechanism and provides better performance through
reduced lock contention.

As part of this refactoring, the ConsensusInfo interface has been
simplified to only include the BlockMetadataAtMessageIndex method,
removing the now-redundant Synced, FullSyncProgressMap, and
SyncTargetMessageCount methods. This cleaner separation of concerns
better supports alternative client implementations by clearly defining
the data flow boundaries between consensus and execution layers.
Also lower the sync interval for tests.
syncData := &execution.ConsensusSyncData{
Synced: c.syncMonitor.Synced(),
SyncTargetMessageCount: c.syncMonitor.SyncTargetMessageCount(),
SyncProgressMap: c.syncMonitor.FullSyncProgressMap(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SyncProgressMap should be nil or empty if Synced (prevents wasteful locking/etc)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in latest commit.

synced, err := s.consensus.Synced().Await(ctx)
if err != nil {
log.Error("Error checking if consensus is synced", "err", err)
data := s.consensusSyncData.Load()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be more complex..
If you're pushing from consensus there is already a delay from the time the message you pushed until the time it is read.
I think:

  • ConsensusSyncData should not include SyncTarget, but "maxMessageCount" from consensus (synTarget is a d delayed maxMessageCount).
  • execution side should have a config of MsgLag
  • execution side should have it's own TargetMessage. Ideally, this would be the last MaxMessage it got from consensus more then MsgLag ago - but no more then 2MsgLAg ago (if no data arrived in the last 2MsgLAg the target doesn't matter - it's not in sync. If it has data only from within the last MsgLag - it should use the least recent one there)
  • execution side should say it's in sync only if:
    ** last syncStatus from consensus is at no more then MsgLag old
    ** last syncStatus from consensus says consensus is synced
    ** execution met the internal TargetMessage

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in latest commit.

Tristan-Wilson and others added 3 commits August 29, 2025 14:25
Only populate SyncProgressMap when not synced.

MaxMessageCount is now a dedicated field that's always sent.

Fix stale sync targets caused by push delay. Instead of consensus
sending pre-calculated targets, it now sends raw MaxMessageCount.
Execution maintains a sliding window history and calculates its own
target using values from 1-2 MsgLag ago, properly accounting for the
push delay. The default push interval and execution message lag are both
1 second so they work together well.

Includes unit tests for the sliding window implementation.
Copy link

codecov bot commented Sep 5, 2025

Codecov Report

❌ Patch coverage is 91.01124% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 22.80%. Comparing base (cb86fca) to head (4756bfd).
⚠️ Report is 14 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3538      +/-   ##
==========================================
+ Coverage   22.70%   22.80%   +0.09%     
==========================================
  Files         388      388              
  Lines       58900    59016     +116     
==========================================
+ Hits        13375    13456      +81     
- Misses      43486    43517      +31     
- Partials     2039     2043       +4     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tsahee tsahee added the after-next-version This PR shouldn't be merged until after the next version is released label Sep 8, 2025
windowEnd := now.Add(-h.msgLag)

for _, entry := range h.entries {
if !entry.timestamp.Before(windowStart) && !entry.timestamp.After(windowEnd) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I look at the current description of msgLag - I think what we actually want is the oldest message that's less then MsgLag old (2*MsgLag is not relevant).
We can discard anything that's more then MsgLag old.
We can do other things with different documentation for MsgLag - but this method (which is different from what I said before) seems to fit current documentation and be simple enough.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated it to get the oldest message that is newer than msgLag old.


// Add the max message count to history for sync target calculation
if syncData != nil && syncData.MaxMessageCount > 0 {
s.syncHistory.add(syncData.MaxMessageCount, syncData.UpdatedAt)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use the minimum of (syncData.UpdatedAt, time.Now()) for time, so if times between components don't match we at least know timestamp is not in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I set time.Now to be the floor

}

// Always add the max message count
res["maxMessageCount"] = data.MaxMessageCount
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comes from consensus so let's call it "consensusMaxMessageCount" or something"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to "consensusMaxMessageCount"

defer h.mutex.RUnlock()

if len(h.entries) == 0 {
return 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not entirely certain - but I think in this case it's better to return an error and not 0, to make sure nothing makes the mistake to think we're in sync

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's okay because we return not synced if it's zero:

func (s *SyncMonitor) Synced(ctx context.Context) bool {
...
	// Calculate the sync target based on historical data
	syncTarget := s.syncHistory.getSyncTarget(now)
	if syncTarget == 0 {
		// No valid sync target available yet
		return false
	}
...

@tsahee tsahee removed the after-next-version This PR shouldn't be merged until after the next version is released label Sep 22, 2025
@tsahee tsahee assigned Tristan-Wilson and unassigned tsahee Sep 22, 2025
Tristan-Wilson and others added 4 commits September 23, 2025 12:51
- Simplify sync target to use oldest entry < msgLag ago (not 2*msgLag window)
- Use min(now, syncData.UpdatedAt) to prevent future timestamps
- Rename maxMessageCount to consensusMaxMessageCount for clarity
- Update tests to match new msgLag-based trimming behavior
…nc-info

Fix minor conflict around using pflag instead of flag.
@Tristan-Wilson
Copy link
Member Author

Tests are passing now after merging in latest master, assigning back to Tsahi for review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants