Skip to content

Conversation

LeftHandCold
Copy link
Contributor

@LeftHandCold LeftHandCold commented Jul 9, 2025

User description

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #22129

What this PR does / why we need it:

ForeachMVCCSpecificNodeInRange is used by "do checkpoint".
The purpose is to ensure that the checkpoint contains all data before the end timestamp.


PR Type

Bug fix


Description

  • Fix checkpoint timeout by adjusting MVCC node traversal logic

  • Modify checkpoint entry validation to require only 1 entry instead of 2

  • Add new ForeachMVCCSpecificNodeInRange method for checkpoint operations

  • Ensure checkpoint contains all data before end timestamp


Changes diagram

flowchart LR
  A["Checkpoint Entry Validation"] --> B["MVCC Node Traversal"]
  B --> C["ForeachMVCCSpecificNodeInRange"]
  C --> D["Checkpoint Data Collection"]
  D --> E["Snapshot Restore Fix"]
Loading

Changes walkthrough 📝

Relevant files
Bug fix
db.go
Relax checkpoint entry validation requirements                     

pkg/vm/engine/disttae/db.go

  • Change checkpoint entry validation from requiring 2 entries to 1
  • Modify end timestamp calculation to use last entry instead of
    penultimate
  • +2/-2     
    ckp_writer.go
    Update checkpoint collector traversal method                         

    pkg/vm/engine/tae/logtail/ckp_writer.go

  • Replace ForeachMVCCNodeInRange with ForeachMVCCSpecificNodeInRange
  • Update checkpoint collector to use new traversal method
  • +1/-1     
    Enhancement
    object.go
    Add specialized MVCC traversal for checkpoints                     

    pkg/vm/engine/tae/catalog/object.go

  • Add new ForeachMVCCSpecificNodeInRange method for checkpoint
    operations
  • Implement transaction waiting logic for uncommitted nodes
  • Add specific logic for appendable objects and delete node handling
  • +33/-0   

    Need help?
  • Type /help how to ... in the comments thread for any questions about Qodo Merge usage.
  • Check out the documentation for more information.
  • Copy link

    qodo-merge-pro bot commented Jul 9, 2025

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    🎫 Ticket compliance analysis ✅

    22129 - PR Code Verified

    Compliant requirements:

    • Fix restore from snapshot timeout issue
    • Address checkpoint-related timeout problems during snapshot operations

    Requires further human verification:

    • Ensure snapshot workflow completes successfully without timing out

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Logic Change

    The checkpoint validation logic changed from requiring 2 entries to 1 entry, and end timestamp calculation now uses the last entry instead of penultimate. This fundamental change in checkpoint validation needs careful verification to ensure it doesn't break existing functionality.

    if len(checkpointEntries) < 1 {
    	return false
    }
    
    // The end time of the penultimate checkpoint must not be less than the ts of the snapshot,
    // because the data of the snapshot may exist in the wal and be collected to the next checkpoint
    end := checkpointEntries[len(checkpointEntries)-1].GetEnd()
    Transaction Safety

    The new ForeachMVCCSpecificNodeInRange method includes transaction waiting logic and complex conditional checks for appendable objects and delete nodes. The transaction state handling and the various conditional branches need thorough validation to ensure correctness.

    func (entry *ObjectEntry) ForeachMVCCSpecificNodeInRange(start, end types.TS, f func(*txnbase.TxnMVCCNode) error) error {
    	needWait, txn := entry.GetLastMVCCNode().NeedWaitCommitting(end.Next())
    	if needWait {
    		txn.GetTxnState(true)
    	}
    
    	var createIn, deleteIn bool
    	createIn, _ = entry.CreateNode.PreparedIn(start, end)
    	if createIn {
    		if err := f(&entry.CreateNode); err != nil {
    			return err
    		}
    	}
    	deleteIn, _ = entry.DeleteNode.PreparedIn(start, end)
    	if !deleteIn {
    		if !entry.IsAppendable() {
    			return nil
    		}
    		if !createIn {
    			return nil
    		}
    		if !entry.DeleteNode.IsCommitted() {
    			return nil
    		}
    	}
    	if err := f(&entry.DeleteNode); err != nil {
    		return err
    	}
    	return nil

    Copy link

    qodo-merge-pro bot commented Jul 9, 2025

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Impact
    Possible issue
    Add nil check for transaction

    The code doesn't handle the case where txn might be nil when needWait is true.
    This could lead to a panic when calling txn.GetTxnState(true).

    pkg/vm/engine/tae/catalog/object.go [735-738]

     needWait, txn := entry.GetLastMVCCNode().NeedWaitCommitting(end.Next())
    -if needWait {
    +if needWait && txn != nil {
         txn.GetTxnState(true)
     }
    • Apply / Chat
    Suggestion importance[1-10]: 8

    __

    Why: The suggestion correctly identifies a potential nil pointer dereference that could cause a panic and provides a proper fix, improving the code's robustness.

    Medium
    • Update

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    kind/bug Something isn't working kind/enhancement Review effort 4/5 size/S Denotes a PR that changes [10,99] lines
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    3 participants