fix: making writing atomic #2745

tac0turtle · 2025-10-08T18:51:26Z

Overview

Closes #2274

github-actions · 2025-10-08T18:51:41Z

The latest Buf updates on your PR. Results from workflow CI and Release / buf-check (pull_request).

Build	Format	Lint	Breaking	Updated (UTC)
`✅ passed`	`⏩ skipped`	`✅ passed`	`✅ passed`	Oct 8, 2025, 8:05 PM

claude · 2025-10-08T18:51:41Z

Claude finished @tac0turtle's task

Code Review Complete ✅

I've performed a comprehensive code review of the atomic writing implementation. This is an excellent refactoring that addresses a critical issue with non-atomic writes.

Summary

This PR successfully implements atomic batch writes for block data, state, and height updates. The new Batch interface and DefaultBatch implementation provide a clean abstraction for atomic operations, preventing partial writes that could corrupt the store. The implementation is well-tested with strong coverage of edge cases.

Key Achievements:

✅ Atomic operations for block + state + height updates
✅ Comprehensive test coverage (306 new lines of tests)
✅ Proper error handling and rollback logic
✅ Clean abstraction with the Batch interface
✅ Crash recovery support in executor

Critical Issues

1. ⚠️ Missing State Persistence in Executor Initialization (executor.go:207-216)

Location: block/internal/executing/executor.go:207-216

Issue: The removed batch.UpdateState(state) call means the initial genesis state is never persisted to disk during chain initialization. This could cause issues on restart.

// Current code (PROBLEMATIC):
batch, err := e.store.NewBatch(e.ctx)
if err != nil {
    return fmt.Errorf("failed to create batch for initialization: %w", err)
}
if err := batch.SetHeight(state.LastBlockHeight); err != nil {
    return fmt.Errorf("failed to set store height: %w", err)
}
// Missing: batch.UpdateState(state) 
if err := batch.Commit(); err != nil {
    return fmt.Errorf("failed to commit initial state: %w", err)
}

Impact: On a fresh chain initialization, the genesis state won't be persisted. If the node restarts before producing the first block, GetState() will fail.

Recommendation: Restore the batch.UpdateState(state) call. Note that this disagrees with julienrbrt's comment that the state update isn't needed here.
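The failure mode described in this issue can be illustrated with a minimal sketch. It assumes a toy in-memory store: `memStore`, `memBatch`, `State`, and `initChain` are hypothetical stand-ins invented for this example, not the real `pkg/store` or executor types. It only shows that a batch which commits the height without the state leaves a later state read with nothing to return.

```go
package main

import (
	"errors"
	"fmt"
)

// State is a hypothetical, trimmed-down stand-in for types.State.
type State struct {
	ChainID         string
	LastBlockHeight uint64
}

// memStore is an illustrative in-memory store, not the real pkg/store type.
type memStore struct {
	height uint64
	state  *State
}

var errNoState = errors.New("no state found")

// GetState mimics a read after restart: it fails if no state was ever committed.
func (s *memStore) GetState() (State, error) {
	if s.state == nil {
		return State{}, errNoState
	}
	return *s.state, nil
}

// memBatch buffers writes until Commit is called.
type memBatch struct {
	store  *memStore
	height *uint64
	state  *State
}

func (s *memStore) NewBatch() *memBatch { return &memBatch{store: s} }

func (b *memBatch) SetHeight(h uint64)   { b.height = &h }
func (b *memBatch) UpdateState(st State) { b.state = &st }

// Commit applies every buffered write at once.
func (b *memBatch) Commit() {
	if b.height != nil {
		b.store.height = *b.height
	}
	if b.state != nil {
		b.store.state = b.state
	}
}

// initChain writes the initial height and, optionally, the genesis state.
func initChain(s *memStore, genesis State, persistState bool) {
	batch := s.NewBatch()
	batch.SetHeight(genesis.LastBlockHeight)
	if persistState {
		batch.UpdateState(genesis)
	}
	batch.Commit()
}

func main() {
	genesis := State{ChainID: "test-chain", LastBlockHeight: 0}

	without := &memStore{}
	initChain(without, genesis, false)
	_, err := without.GetState()
	fmt.Println("without UpdateState:", err)

	with := &memStore{}
	initChain(with, genesis, true)
	st, err := with.GetState()
	fmt.Println("with UpdateState:", st.ChainID, err)
}
```

Whether the real node hits this path depends on whether anything else persists the genesis state before the first restart, which this sketch does not model.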

2. ⚠️ Execution Client Rollback Warning (executor.go:373-376)

Location: block/internal/executing/executor.go:373-376

Context: julienrbrt correctly notes that if execution client calls succeed but the batch commit fails, there's no automatic rollback of the execution client state.

Current Situation:

newState, err := e.applyBlock(e.ctx, header.Header, data)  // Calls exec.ExecuteTxs
if err != nil {
    return fmt.Errorf("failed to apply block: %w", err)
}
// ... later ...
if err := batch.Commit(); err != nil {  // If this fails, exec state is already mutated
    return fmt.Errorf("failed to commit block data: %w", err)
}

Recommendation: Add a follow-up task (as julienrbrt suggested) to:

Detect when we're in this inconsistent state (exec applied but store commit failed)
Instruct operators to manually rollback the execution client
Consider adding rollback detection on restart

This is beyond the scope of this PR but should be tracked as a known limitation.

Major Issues

3. 🔴 Inconsistent Parameter Order in updateState Methods

Locations:

executor.go:630 - updateState(batch store.Batch, newState types.State)
syncer.go:579 - updateState(newState types.State, batch store.Batch)

Issue: The parameter order is reversed between executor and syncer, which is error-prone and inconsistent.

Recommendation: Standardize to updateState(batch store.Batch, newState types.State) in both files, following the common Go style of passing dependencies first and data second.

4. 🟡 Context Mismatch in Crash Recovery Save (executor.go:359)

Location: block/internal/executing/executor.go:359

// saved early for crash recovery, will be overwritten later with the final signature
b, err := e.store.NewBatch(context.Background())  // ← Using Background instead of e.ctx

Issue: Uses context.Background() instead of e.ctx, which means:

Cancellation signals won't propagate
Trace/logging context is lost
Inconsistent with the rest of the function

Recommendation: Use e.ctx for consistency:

b, err := e.store.NewBatch(e.ctx)

Minor Issues & Improvements

5. Documentation: Batch Interface (types.go:10-29)

Location: pkg/store/types.go:10-29

Observation: The Batch interface is well-designed but lacks usage examples.

Suggestion: Add a doc comment on the Batch interface explaining the atomicity guarantee:

// Batch provides atomic write operations for the store.
// All operations added to a batch are buffered until Commit() is called.
// If Commit() fails, none of the operations will be persisted.
// Example:
//   batch, err := store.NewBatch(ctx)
//   batch.SaveBlockData(header, data, sig)
//   batch.SetHeight(height)
//   batch.UpdateState(state)
//   batch.Commit()  // All or nothing
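The all-or-nothing behaviour that such a doc comment would promise can be sketched with a toy in-memory batch. `kvStore`, `kvBatch`, and the `failCommit` flag are assumptions made for this example and do not reflect the DefaultBatch implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// kvStore is an illustrative in-memory key-value store.
type kvStore struct {
	data       map[string]string
	failCommit bool // simulates a failing backend
}

// kvBatch buffers writes; nothing reaches the store before Commit.
type kvBatch struct {
	store   *kvStore
	pending map[string]string
}

func (s *kvStore) NewBatch() *kvBatch {
	return &kvBatch{store: s, pending: map[string]string{}}
}

// Put only records the write in the batch's own buffer.
func (b *kvBatch) Put(key, value string) { b.pending[key] = value }

// Commit applies all buffered writes, or none if the backend fails.
func (b *kvBatch) Commit() error {
	if b.store.failCommit {
		return errors.New("commit failed")
	}
	for k, v := range b.pending {
		b.store.data[k] = v
	}
	return nil
}

func main() {
	s := &kvStore{data: map[string]string{}, failCommit: true}
	b := s.NewBatch()
	b.Put("block/1", "header+data")
	b.Put("height", "1")
	b.Put("state", "s1")

	// Failed commit: the store stays empty.
	fmt.Println(b.Commit(), len(s.data))

	// Successful commit: all three keys land together.
	s.failCommit = false
	fmt.Println(b.Commit(), len(s.data))
}
```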

6. SetHeight Idempotency (batch.go:35-46)

Location: pkg/store/batch.go:35-46

func (b *DefaultBatch) SetHeight(height uint64) error {
    currentHeight, err := b.store.Height(b.ctx)
    if err != nil {
        return err
    }
    if height <= currentHeight {
        return nil  // Silently ignores attempts to set a lower or equal height
    }
    // ...
}

Observation: Silently ignoring a height decrease could mask bugs, and the code doesn't document the rationale for the no-op.

Suggestion: Either:

Return an error if height < currentHeight (height should only increase)
Add a comment explaining why height == currentHeight is a no-op (likely for idempotency)

7. Syncer Logic Improvement (syncer.go:348-352)

Location: block/internal/syncing/syncer.go:348-352

// Process data (if not already processed by headers)
newDataHeight := s.dataBroadcaster.Store().Height()
if newDataHeight != newHeaderHeight && newDataHeight > currentHeight {
    s.p2pHandler.ProcessDataRange(s.ctx, currentHeight+1, newDataHeight, s.heightInCh)
}

Observation: julienrbrt's comment says "the logic was good" but the change adds newDataHeight != newHeaderHeight check.

Analysis: This is actually an improvement - it prevents duplicate processing when header and data heights are the same (already processed in the header range). The change is correct.
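The condition can be read as a pure function of the three heights, as in the sketch below. `shouldProcessData` is a hypothetical helper extracted for illustration; the syncer itself inlines this check.

```go
package main

import "fmt"

// shouldProcessData mirrors the condition quoted above: process the data
// range only if the data height differs from the header height and is ahead
// of the current height.
func shouldProcessData(currentHeight, newHeaderHeight, newDataHeight uint64) bool {
	return newDataHeight != newHeaderHeight && newDataHeight > currentHeight
}

func main() {
	fmt.Println(shouldProcessData(10, 12, 12)) // false: same range as the headers
	fmt.Println(shouldProcessData(10, 12, 14)) // true: data is ahead of headers
	fmt.Println(shouldProcessData(10, 12, 9))  // false: nothing new to process
}
```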

8. Removed Comment Can Stay Removed (executor.go:400)

Location: block/internal/executing/executor.go:400

-	// Once the SaveBlockData has been saved we must update the height and the state.
 	if err := batch.SetHeight(newHeight); err != nil {

Observation: julienrbrt's review indicates this code comment should be deleted. The code comment does give some context about the ordering requirement, but that context is no longer essential.

Recommendation: Keep the comment deleted as suggested by the reviewer (julienrbrt). The code is self-explanatory with the batch pattern.

Code Quality & Architecture

✅ Strengths

Clean Abstraction: The Batch interface is well-designed and follows the repository pattern.
Comprehensive Testing: Excellent test coverage including:
- Atomicity tests (TestBatchOperations)
- Error handling tests (commit errors, put errors, batch creation errors)
- Rollback tests with DA height validation
- Edge cases (invalid data, missing metadata)
Error Handling: Consistent error wrapping with context using fmt.Errorf and %w.
Rollback Safety: The Rollback function properly validates DA included height for aggregators vs sync nodes, preventing dangerous rollbacks of finalized data.
Backward Compatibility: Changes are backward compatible - the Store interface extends cleanly.

Security Considerations

✅ Passed

No exposure of sensitive data in logs
Proper validation of DA included height before rollback
Context propagation generally correct (except issue 4 above)
No obvious injection vulnerabilities

Performance Considerations

⚡ Observations

Batch Overhead: The batch abstraction adds a small overhead but provides critical atomicity guarantees - good tradeoff.
Crash Recovery: The early save in executor (line 359) adds an extra commit but provides crash recovery - acceptable for production use.
Concurrent Batches: Multiple independent batches can coexist (tested in TestBatchOperations), which is good for flexibility.
No Unnecessary Allocations: The batch reuses the underlying datastore batch efficiently.

Testing Assessment

✅ Excellent Coverage

The test suite covers:

✅ Happy path atomic commits
✅ Batch creation errors
✅ Put/commit errors during batch operations
✅ Rollback edge cases (same height, higher height, DA constraints)
✅ Invalid data handling
✅ Concurrent batch usage
✅ State persistence and retrieval
✅ Metadata errors

Missing Tests:

Context cancellation during batch operations (minor)
Partial failure recovery scenarios (follow-up as noted)

Recommendations

Must Fix Before Merge

Restore batch.UpdateState(state) in executor initialization (executor.go:207-216)
Standardize updateState parameter order across executor and syncer

Should Fix

Use e.ctx instead of context.Background() in crash recovery save (executor.go:359)

Consider for Follow-up

Add detection and guidance for execution client rollback scenarios (as julienrbrt noted)
Add documentation to Batch interface about atomicity guarantees
Consider returning error for SetHeight with decreasing height

Reviewer Comments Addressed

Responding to julienrbrt's review comments:

✅ "update state isn't needed" (executor.go initialization) - DISAGREE: This state save IS needed for genesis persistence. Should be restored.
✅ "we do want to keep this one" (executor.go:351) - Confirmed, the deletion at line 400 is the one to keep.
✅ "this comment can be deleted too" - Agreed, comment at line 400 can be removed.
✅ "the logic was good" (syncer.go:348) - The new logic is actually better - prevents duplicate processing.
✅ Detection for execution client rollback - Excellent point for follow-up work.

Final Verdict

Status: ✅ Approve with Required Changes

This is a well-implemented, critical fix for store atomicity. The code quality is high, testing is comprehensive, and the architecture is sound. The issues identified are straightforward to fix.

Required before merge:

Fix executor initialization to persist initial state
Standardize parameter order in updateState methods

Recommended:

Fix context usage in crash recovery save
Track execution client rollback detection as follow-up work

julienrbrt

Nice! It will solve a lot of issues for ev-node.
It would be good to add, in a follow-up, some detection of whether we have already made Execution client calls. If we have and we then fail to commit the batch, we should instruct people to roll back, as the execution client rollback is still manual.

block/internal/executing/executor.go

block/internal/syncing/syncer.go

julienrbrt

lgtm!

Let's wait until the base branch is merged to main before merging this, or this should target main directly.

tac0turtle · 2025-10-09T11:45:12Z

opening this against main

add batching

f709b7c

github-project-automation bot added this to Evolve Oct 8, 2025

github-actions bot assigned tac0turtle Oct 8, 2025

tac0turtle added 2 commits October 8, 2025 21:23

refactor to use batching struct

21e72b3

add back setheight

f0fa3c8

tac0turtle marked this pull request as ready for review October 8, 2025 19:31

Merge branch 'julien/sync-p2p' into marko/atomicity

f4049c7

tac0turtle requested review from alpe and julienrbrt October 8, 2025 19:31

julienrbrt reviewed Oct 8, 2025

tac0turtle added 2 commits October 8, 2025 21:58

add back some changes

6c3df24

remove comment

a23ea84

julienrbrt approved these changes Oct 8, 2025

address to hex instead of bytes

e11886a

tac0turtle closed this Oct 9, 2025

github-project-automation bot moved this to Done in Evolve Oct 9, 2025
