Skip to content

Conversation

tac0turtle
Copy link
Contributor

@tac0turtle tac0turtle commented Oct 8, 2025

Overview

Closes ##2274

Copy link
Contributor

github-actions bot commented Oct 8, 2025

The latest Buf updates on your PR. Results from workflow CI and Release / buf-check (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed⏩ skipped✅ passed✅ passedOct 8, 2025, 8:05 PM

Copy link
Contributor

claude bot commented Oct 8, 2025

Claude finished @tac0turtle's task —— View job


Code Review Complete ✅

I've performed a comprehensive code review of the atomic writing implementation. This is an excellent refactoring that addresses a critical issue with non-atomic writes.


Summary

This PR successfully implements atomic batch writes for block data, state, and height updates. The new Batch interface and DefaultBatch implementation provide clean abstraction for atomic operations, preventing partial writes that could corrupt the store. The implementation is well-tested with strong coverage of edge cases.

Key Achievements:

  • ✅ Atomic operations for block + state + height updates
  • ✅ Comprehensive test coverage (306 new lines of tests)
  • ✅ Proper error handling and rollback logic
  • ✅ Clean abstraction with the Batch interface
  • ✅ Crash recovery support in executor

Critical Issues

1. ⚠️ Missing State Persistence in Executor Initialization (executor.go:207-216)

Location: block/internal/executing/executor.go:207-216

Issue: The removed batch.UpdateState(state) call means the initial genesis state is never persisted to disk during chain initialization. This could cause issues on restart.

// Current code (PROBLEMATIC):
batch, err := e.store.NewBatch(e.ctx)
if err != nil {
    return fmt.Errorf("failed to create batch for initialization: %w", err)
}
if err := batch.SetHeight(state.LastBlockHeight); err != nil {
    return fmt.Errorf("failed to set store height: %w", err)
}
// Missing: batch.UpdateState(state) 
if err := batch.Commit(); err != nil {
    return fmt.Errorf("failed to commit initial state: %w", err)
}

Impact: On a fresh chain initialization, the genesis state won't be persisted. If the node restarts before producing the first block, GetState() will fail.

Recommendation: Restore the batch.UpdateState(state) call. This aligns with julienrbrt's comment about keeping this deletion.


2. ⚠️ Execution Client Rollback Warning (executor.go:373-376)

Location: block/internal/executing/executor.go:373-376

Context: julienrbrt correctly notes that if execution client calls succeed but the batch commit fails, there's no automatic rollback of the execution client state.

Current Situation:

newState, err := e.applyBlock(e.ctx, header.Header, data)  // Calls exec.ExecuteTxs
if err != nil {
    return fmt.Errorf("failed to apply block: %w", err)
}
// ... later ...
if err := batch.Commit(); err != nil {  // If this fails, exec state is already mutated
    return fmt.Errorf("failed to commit block data: %w", err)
}

Recommendation: Add a follow-up task (as julienrbrt suggested) to:

  1. Detect when we're in this inconsistent state (exec applied but store commit failed)
  2. Instruct operators to manually rollback the execution client
  3. Consider adding rollback detection on restart

This is beyond the scope of this PR but should be tracked as a known limitation.


Major Issues

3. 🔴 Inconsistent Parameter Order in updateState Methods

Locations:

  • executor.go:630 - updateState(batch store.Batch, newState types.State)
  • syncer.go:579 - updateState(newState types.State, batch store.Batch)

Issue: The parameter order is reversed between executor and syncer, which is error-prone and inconsistent.

Recommendation: Standardize to updateState(batch store.Batch, newState types.State) in both files for consistency with Go conventions (dependencies first, data second).


4. 🟡 Context Mismatch in Crash Recovery Save (executor.go:359)

Location: block/internal/executing/executor.go:359

// saved early for crash recovery, will be overwritten later with the final signature
b, err := e.store.NewBatch(context.Background())  // ← Using Background instead of e.ctx

Issue: Uses context.Background() instead of e.ctx, which means:

  • Cancellation signals won't propagate
  • Trace/logging context is lost
  • Inconsistent with the rest of the function

Recommendation: Use e.ctx for consistency:

b, err := e.store.NewBatch(e.ctx)

Minor Issues & Improvements

5. Documentation: Batch Interface (types.go:10-29)

Location: pkg/store/types.go:10-29

Observation: The Batch interface is well-designed but lacks usage examples.

Suggestion: Add package-level doc comment explaining the atomicity guarantee:

// Batch provides atomic write operations for the store.
// All operations added to a batch are buffered until Commit() is called.
// If Commit() fails, none of the operations will be persisted.
// Example:
//   batch, err := store.NewBatch(ctx)
//   batch.SaveBlockData(header, data, sig)
//   batch.SetHeight(height)
//   batch.UpdateState(state)
//   batch.Commit()  // All or nothing

6. SetHeight Idempotency (batch.go:35-46)

Location: pkg/store/batch.go:35-46

func (b *DefaultBatch) SetHeight(height uint64) error {
    currentHeight, err := b.store.Height(b.ctx)
    if err != nil {
        return err
    }
    if height <= currentHeight {
        return nil  // Silently ignores attempts to set lower height
    }
    // ...
}

Observation: Silently ignoring height decreases could mask bugs. The comment doesn't explain the rationale.

Suggestion: Either:

  1. Return an error if height < currentHeight (height should only increase)
  2. Add a comment explaining why height == currentHeight is a no-op (likely for idempotency)

7. Syncer Logic Improvement (syncer.go:348-352)

Location: block/internal/syncing/syncer.go:348-352

// Process data (if not already processed by headers)
newDataHeight := s.dataBroadcaster.Store().Height()
if newDataHeight != newHeaderHeight && newDataHeight > currentHeight {
    s.p2pHandler.ProcessDataRange(s.ctx, currentHeight+1, newDataHeight, s.heightInCh)
}

Observation: julienrbrt's comment says "the logic was good" but the change adds newDataHeight != newHeaderHeight check.

Analysis: This is actually an improvement - it prevents duplicate processing when header and data heights are the same (already processed in the header range). The change is correct.


8. Removed Comment Should Be Restored (executor.go:400)

Location: block/internal/executing/executor.go:400

-	// Once the SaveBlockData has been saved we must update the height and the state.
 	if err := batch.SetHeight(newHeight); err != nil {

Observation: julienrbrt's comment indicates this should be deleted. However, the comment provides useful context about the ordering requirement.

Recommendation: Keep the comment deleted as suggested by the reviewer (julienrbrt). The code is self-explanatory with the batch pattern.


Code Quality & Architecture

✅ Strengths

  1. Clean Abstraction: The Batch interface is well-designed and follows the repository pattern.

  2. Comprehensive Testing: Excellent test coverage including:

    • Atomicity tests (TestBatchOperations)
    • Error handling tests (commit errors, put errors, batch creation errors)
    • Rollback tests with DA height validation
    • Edge cases (invalid data, missing metadata)
  3. Error Handling: Consistent error wrapping with context using fmt.Errorf and %w.

  4. Rollback Safety: The Rollback function properly validates DA included height for aggregators vs sync nodes, preventing dangerous rollbacks of finalized data.

  5. Backward Compatibility: Changes are backward compatible - the Store interface extends cleanly.


Security Considerations

✅ Passed

  • No exposure of sensitive data in logs
  • Proper validation of DA included height before rollback
  • Context propagation generally correct (except issue Fraud proofs gossiping #4)
  • No obvious injection vulnerabilities

Performance Considerations

⚡ Observations

  1. Batch Overhead: The batch abstraction adds a small overhead but provides critical atomicity guarantees - good tradeoff.

  2. Crash Recovery: The early save in executor (line 359) adds an extra commit but provides crash recovery - acceptable for production use.

  3. Concurrent Batches: Multiple independent batches can coexist (tested in TestBatchOperations), which is good for flexibility.

  4. No Unnecessary Allocations: The batch reuses the underlying datastore batch efficiently.


Testing Assessment

✅ Excellent Coverage

The test suite covers:

  • ✅ Happy path atomic commits
  • ✅ Batch creation errors
  • ✅ Put/commit errors during batch operations
  • ✅ Rollback edge cases (same height, higher height, DA constraints)
  • ✅ Invalid data handling
  • ✅ Concurrent batch usage
  • ✅ State persistence and retrieval
  • ✅ Metadata errors

Missing Tests:

  • Context cancellation during batch operations (minor)
  • Partial failure recovery scenarios (follow-up as noted)

Recommendations

Must Fix Before Merge

  1. Restore batch.UpdateState(state) in executor initialization (executor.go:207-216)
  2. Standardize updateState parameter order across executor and syncer

Should Fix

  1. Use e.ctx instead of context.Background() in crash recovery save (executor.go:359)

Consider for Follow-up

  1. Add detection and guidance for execution client rollback scenarios (as julienrbrt noted)
  2. Add documentation to Batch interface about atomicity guarantees
  3. Consider returning error for SetHeight with decreasing height

Reviewer Comments Addressed

Responding to julienrbrt's review comments:

  1. "update state isn't needed" (executor.go initialization) - DISAGREE: This state save IS needed for genesis persistence. Should be restored.

  2. "we do want to keep this one" (executor.go:351) - Confirmed, the deletion at line 400 is the one to keep.

  3. "this comment can be deleted too" - Agreed, comment at line 400 can be removed.

  4. "the logic was good" (syncer.go:348) - The new logic is actually better - prevents duplicate processing.

  5. Detection for execution client rollback - Excellent point for follow-up work.


Final Verdict

Status:Approve with Required Changes

This is a well-implemented, critical fix for store atomicity. The code quality is high, testing is comprehensive, and the architecture is sound. The issues identified are straightforward to fix.

Required before merge:

  • Fix executor initialization to persist initial state
  • Standardize parameter order in updateState methods

Recommended:

  • Fix context usage in crash recovery save
  • Track execution client rollback detection as follow-up work

@tac0turtle tac0turtle marked this pull request as ready for review October 8, 2025 19:31
@tac0turtle tac0turtle requested review from alpe and julienrbrt October 8, 2025 19:31
Copy link
Member

@julienrbrt julienrbrt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! It will solve a lot of issue for ev-node.
It would be good to add in a follow-up some detection to see if we are after some Execution client calls. If we are and if we fail to commit the batch afterwards, we should instruct people to rollback. As the execution client rollback is still manual

Copy link
Member

@julienrbrt julienrbrt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

let's wait until the base branch is merged to main, before merging this. or this should target main directly.

@tac0turtle
Copy link
Contributor Author

opening this against main

@tac0turtle tac0turtle closed this Oct 9, 2025
@github-project-automation github-project-automation bot moved this to Done in Evolve Oct 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants