-
Notifications
You must be signed in to change notification settings - Fork 18
chain validation and fix command #198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
d0eb75b
to
7ea3dda
Compare
7ea3dda
to
bebe17e
Compare
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
8769ad1
to
0ffae31
Compare
WalkthroughThis update introduces a new Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI (validateCmd)
participant RPC Client
participant Storage
participant ClickHouse
participant Validation
User->>CLI (validateCmd): Run `validate` with batch/fix args
CLI (validateCmd)->>RPC Client: Initialize RPC connection
CLI (validateCmd)->>Storage: Initialize storage connection
CLI (validateCmd)->>ClickHouse: Connect with optimized settings
CLI (validateCmd)->>Validation: Initialize Cursor (get last scanned/max block)
loop For each batch in block range
Validation->>ClickHouse: Remove duplicates in range
Validation->>ClickHouse: Detect and fix gaps in range
Validation->>ClickHouse: Fetch block data for range
Validation->>Validation: Validate blocks (logs bloom, tx root, etc.)
alt Invalid blocks found
Validation->>RPC Client: Repoll invalid blocks
Validation->>Storage: Stage and commit fixed blocks
end
Validation->>Validation: Update cursor with last validated block
end
CLI (validateCmd)->>User: Output validation/fix results
Suggested reviewers
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
🔭 Outside diff range comments (1)
internal/common/trace.go (1)
60-78
:⚠️ Potential issue
BlockTimestamp
is never serialized
TraceModel
includes aBlockTimestamp
field (uint64) butSerialize()
omits it. Downstream consumers will receive zero values.- BlockHash: t.BlockHash, + BlockHash: t.BlockHash, + BlockTimestamp: uint64(t.BlockTimestamp.Unix()),Please verify the desired unit (seconds vs milliseconds) and update tests accordingly.
♻️ Duplicate comments (2)
internal/validation/cursor.go (1)
37-50
: Consider making the cursor file location configurable.The current implementation writes cursor files to the working directory, which might be problematic in containerized environments with restricted filesystem access.
+// CursorFilePath is the directory where cursor files are stored +var CursorFilePath string = "." + func (c *Cursor) Update(blockNumber *big.Int) error { - cursorFile := fmt.Sprintf("validation_cursor_%s.json", c.ChainId.String()) + cursorFile := fmt.Sprintf("%s/validation_cursor_%s.json", CursorFilePath, c.ChainId.String())The same change would need to be applied to the
getLastScannedBlock
function.internal/validation/validation.go (1)
163-166
:isContractCreation
misses the “empty string” caseA contract-creation tx coming from
go-ethereum
hasTo == nil
, which your DB
may serialise as an empty string. The current check only treats the all-zero
address as creation:isContractCreation := tx.ToAddress == "0x0000…0000" && tx.Data != "0x"Consider:
isContractCreation := (tx.ToAddress == "" || - tx.ToAddress == "0x0000000000000000000000000000000000000000") && tx.Data != "0x" + tx.ToAddress == "0x0000000000000000000000000000000000000000") && + len(tx.Data) > 2 // guards against meaningless “0x”
🧹 Nitpick comments (5)
cmd/validate.go (2)
78-86
: Harden TLS configurationThe explicit
tls.Config{}
inherits Go’s default (TLS 1.2 for clients). For a CLI tool that only talks to a modern ClickHouse server you can safely raise the floor:- TLS: &tls.Config{}, + TLS: &tls.Config{ + MinVersion: tls.VersionTLS13, + },Not critical but improves security posture.
50-52
: Off-by-one adjustment is opaque
batchSize = batchSize - 1
relies on readers knowing the inclusive semantics later. Consider renaming the variable or adding a short comment to avoid future errors.// range end is inclusive, adjust once so we fetch exactly N blocks batchSizeInclusive := new(big.Int).Sub(batchSize, big.NewInt(1))internal/validation/validation.go (1)
46-54
: *Misleading debug message when printing big.Int values
log.Debug().Msgf("Processing batch of blocks %d to %d", …)
prints the decimal
representation of the pointers, not the block numbers, because%d
formats
a numeric value, not an object implementingStringer
.
Prefer%s
(or%v
) to leverage(*big.Int).String()
:- log.Debug().Msgf("Processing batch of blocks %d to %d", invalidBlocks[i], invalidBlocks[end-1]) + log.Debug().Msgf("Processing batch of blocks %s to %s", + invalidBlocks[i].String(), invalidBlocks[end-1].String())internal/validation/db.go (1)
120-124
: Potential goroutine leak if one query errors earlyEach goroutine writes exactly once to its channel, but if an early
return
occurs before the send, the receiver blocks forever. Although you currently
return
only after sending, future edits could break the guarantee.A safer pattern is to close the channel once the value is sent or use a
sync.WaitGroup
.
Example with a WaitGroup:var ( blocksResult blockResult … ) wg.Add(3) go func() { defer wg.Done(); blocksResult.blocks, blocksResult.err = … }() … wg.Wait()internal/validation/duplicates.go (1)
161-199
:removeDuplicateTransactions
may exceed ClickHouse parameter limitsThe batched deletion is great, but each loop aggregates all hashes in one
IN
clause. If a batch nears 1000 items, ClickHouse can hit the default
max_query_size
. Consider:
- Lowering
batchSize
(e.g., 100).- Or chunking
hashes
too: execute the same query multiple times with
smaller slices.This keeps memory and network utilisation predictable.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sum
is excluded by!**/*.sum
📒 Files selected for processing (11)
cmd/root.go
(1 hunks)cmd/validate.go
(1 hunks)go.mod
(5 hunks)internal/common/trace.go
(3 hunks)internal/rpc/serializer.go
(2 hunks)internal/storage/clickhouse.go
(4 hunks)internal/validation/cursor.go
(1 hunks)internal/validation/db.go
(1 hunks)internal/validation/duplicates.go
(1 hunks)internal/validation/gaps.go
(1 hunks)internal/validation/validation.go
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
internal/validation/cursor.go (1)
internal/storage/connector.go (1)
IStorage
(67-71)
🪛 ast-grep (0.38.1)
cmd/validate.go
[warning] 71-71: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. Add
MinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures
(missing-ssl-minversion-go)
🪛 golangci-lint (1.64.8)
cmd/validate.go
15-15: could not import github.com/thirdweb-dev/indexer/internal/validation (-: # github.com/thirdweb-dev/indexer/internal/validation
internal/validation/gaps.go:27:14: not enough arguments in call to poller.Poll
have ([]*big.Int)
want (context.Context, []*big.Int)
internal/validation/validation.go:56:15: not enough arguments in call to poller.Poll
have ([]*big.Int)
want (context.Context, []*big.Int))
(typecheck)
internal/validation/cursor.go
1-1: : # github.com/thirdweb-dev/indexer/internal/validation
internal/validation/gaps.go:27:14: not enough arguments in call to poller.Poll
have ([]*big.Int)
want (context.Context, []*big.Int)
internal/validation/validation.go:56:15: not enough arguments in call to poller.Poll
have ([]*big.Int)
want (context.Context, []*big.Int)
(typecheck)
🪛 GitHub Check: Test
internal/validation/validation.go
[failure] 56-56:
not enough arguments in call to poller.Poll
internal/validation/gaps.go
[failure] 27-27:
not enough arguments in call to poller.Poll
🪛 GitHub Actions: Go Unit Tests
internal/validation/gaps.go
[error] 27-27: not enough arguments in call to poller.Poll
🔇 Additional comments (12)
internal/validation/cursor.go (3)
12-16
: Well-structured cursor type for tracking validation progress.The
Cursor
struct provides a good foundation for tracking blockchain validation progress with appropriate fields for maintaining state across validation runs.
18-35
: LGTM - Good error handling and initialization logic.The
InitCursor
function properly initializes a cursor by retrieving the last scanned block and ensuring it doesn't exceed the maximum block number. The null check on line 24 formaxBlockNumber
provides good defensive programming.
52-72
: Good error handling for file operations.The implementation properly handles cases where the cursor file doesn't exist or contains invalid data by defaulting to block 0, allowing the validation to start from the beginning.
go.mod (3)
24-117
: Numerous indirect dependency updates.The PR includes many indirect dependency updates, including new CockroachDB packages, blockchain cryptography libraries, and Go standard library updates. These support the new validation functionality while keeping dependencies current.
7-10
:❓ Verification inconclusive
Significant dependency update for core Ethereum library.
The go-ethereum library has been updated from v1.14.8 to v1.15.11, which is a substantial version jump. Additionally,
holiman/uint256
has been moved from an indirect to a direct dependency, which aligns with the type changes in the serializer.
🏁 Script executed:
#!/bin/bash # Check for breaking changes in the updated go-ethereum version rg -A 5 -B 5 "github.com/ethereum/go-ethereum" --type goLength of output: 7911
Review go-ethereum v1.15.11 & holiman/uint256 Upgrade
The bump from v1.14.8→v1.15.11 spans multiple core packages and could introduce breaking changes to RPC clients, common types, trie handling, etc. We also promoted holiman/uint256 to a direct dependency to match updated serializers.
Please verify:
- All imports of
github.com/ethereum/go-ethereum
(shown below) still compile and behave as expected.- No changed function signatures or removed APIs in v1.15.x affect your code paths.
- The bump in holiman/uint256 aligns with any type conversions in serializers.
Affected files (imports of go-ethereum packages):
- internal/validation/validation.go
- internal/rpc/rpc.go
- internal/rpc/params.go
- internal/rpc/batcher.go
- internal/storage/clickhouse.go
- internal/middleware/authorization.go
- internal/common/transaction.go, abi.go, log.go
- internal/handlers/transactions_handlers.go, logs_handlers.go
- internal/common/log_test.go, transaction_test.go
Actions:
- Review the go-ethereum v1.15.11 release notes for breaking changes.
- Run the full test suite (unit + integration) against a node with v1.15.x.
- Ensure all ABI/account/rpc calls still serialize/deserialize correctly.
3-3
:❌ Incorrect review comment
Verify Go version compatibility.
The Go version has been updated to 1.23.0, which appears to be a future version since the current stable version at the time of this review is 1.22.x.
🌐 Web query:
What is the latest stable version of Go available in May 2025?
💡 Result:
As of May 22, 2025, the latest stable version of Go is 1.24.3, released on April 30, 2025. (go.dev) This release includes security fixes to the
os
package, as well as bug fixes to the runtime, compiler, linker,go
command, and thecrypto/tls
andos
packages. (golang.es)Citations:
- 1: https://go.dev/VERSION?utm_source=openai
- 2: https://golang.es/doc/devel/release?utm_source=openai
Incorrect Go version assumption in go.mod
The requirement
go 1.23.0
is not a future release. As of May 22, 2025 the latest stable Go is 1.24.3 (released April 30, 2025), and Go 1.23.0 was already available prior to that. Yourgo.mod
entry is valid.If you need fixes or features from the newest release, you may optionally bump to:
- go 1.23.0 + go 1.24• File: go.mod (line 3)
Likely an incorrect or invalid review comment.
cmd/root.go (1)
201-201
: Good integration of the new validation command.The new
validateCmd
is properly added to the root command, following the same pattern as existing commands.internal/rpc/serializer.go (2)
382-383
: Type optimization for gas values.Changing from
big.Int
touint64
for gas values is a good optimization as Ethereum gas values typically fit comfortably within uint64 range, reducing memory usage and improving performance.
402-411
:✅ Verification successful
Type change for trace addresses.
The type change from
uint64
toint64
for trace addresses indicates a design decision to allow for potential negative indices or to align with other systems' expectations.
🏁 Script executed:
#!/bin/bash # Check if there are corresponding changes in other files related to trace address types rg -A 3 -B 3 "TraceAddress.*\[\]int64" --type goLength of output: 1593
serializeTraceAddress type change is consistent
serializeTraceAddress now returns []int64, which matches the TraceAddress field defined as []int64 in internal/common/trace.go. No further updates are required.
- internal/common/trace.go:
TraceAddress []int64
confirmed.internal/storage/clickhouse.go (2)
375-377
: Gas fields handled correctly after type change – good job
trace.Gas
andtrace.GasUsed
are now inserted directly asuint64
, matching the new struct definition and saving an unnecessary allocation / dereference.
No further action needed.
936-938
: FINAL modifier ensures consistent readsAdding
FINAL
to the staging-data query is a good call – it prevents race conditions with background merges in ClickHouse.
👍internal/common/trace.go (1)
16-24
: Type migration looks consistentSwitching
TraceAddress
to[]int64
and gas fields touint64
simplifies marshaling and DB inserts. Just ensure callers no longer rely on signedness ofTraceAddress
(previouslyuint64
).
if err := s.StagingStorage.DeleteStagingData(blocksData); err != nil { | ||
log.Error().Err(err).Msgf("Failed to delete staging data: %v", blocksData) | ||
} | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Errors ignored after staging-data cleanup
DeleteStagingData
errors are logged but not propagated. If the cleanup fails the validation job silently continues, leaving orphaned rows and possibly breaking idempotency.
Consider returning the error to the caller so the outer command can decide whether to abort:
- if err := s.StagingStorage.DeleteStagingData(blocksData); err != nil {
- log.Error().Err(err).Msgf("Failed to delete staging data: %v", blocksData)
- }
+ if err := s.StagingStorage.DeleteStagingData(blocksData); err != nil {
+ return fmt.Errorf("failed to delete staging data: %w", err)
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if err := s.StagingStorage.DeleteStagingData(blocksData); err != nil { | |
log.Error().Err(err).Msgf("Failed to delete staging data: %v", blocksData) | |
} | |
return nil | |
if err := s.StagingStorage.DeleteStagingData(blocksData); err != nil { | |
return fmt.Errorf("failed to delete staging data: %w", err) | |
} | |
return nil |
🤖 Prompt for AI Agents
In internal/validation/gaps.go around lines 43 to 46, the error returned by
DeleteStagingData is logged but not propagated, causing the validation job to
continue silently on failure. Modify the code to return the error from
DeleteStagingData instead of just logging it, so the caller can handle the
failure appropriately and maintain idempotency.
chainID, _ := uint256.FromBig(tx.ChainId) | ||
gasTipCap, _ := uint256.FromBig(tx.MaxPriorityFeePerGas) | ||
gasFeeCap, _ := uint256.FromBig(tx.MaxFeePerGas) | ||
value, _ := uint256.FromBig(tx.Value) | ||
blobFeeCap, _ := uint256.FromBig(tx.MaxFeePerBlobGas) | ||
v, _ := uint256.FromBig(tx.V) | ||
r, _ := uint256.FromBig(tx.R) | ||
s, _ := uint256.FromBig(tx.S) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
❓ Verification inconclusive
Nil-pointer risk when converting big.Int → uint256
uint256.FromBig
panics if the pointer is nil
. Fields such as
MaxPriorityFeePerGas
, MaxFeePerBlobGas
, Value
, R
, S
, etc. can be
NULL in the DB for older blocks or special transactions.
Add pre-checks or default zero values to avoid runtime panics:
func safeUint256(b *big.Int) *uint256.Int {
if b == nil {
return new(uint256.Int)
}
out, _ := uint256.FromBig(b)
return out
}
…and use safeUint256()
at each conversion point.
Also applies to: 332-338
*Handle nil big.Int inputs to uint256.FromBig to prevent panics
The fields tx.MaxPriorityFeePerGas
, tx.MaxFeePerBlobGas
, tx.Value
, tx.R
, tx.S
, etc. may be nil
(e.g. legacy or special transactions). Calling uint256.FromBig(nil)
will panic.
Introduce a helper in internal/validation/validation.go
:
func safeUint256(b *big.Int) *uint256.Int {
if b == nil {
return new(uint256.Int)
}
out, _ := uint256.FromBig(b)
return out
}
And replace the direct calls at both locations:
• Lines 286–294
• Lines 332–338
Example diff:
- chainID, _ := uint256.FromBig(tx.ChainId)
+ chainID := safeUint256(tx.ChainId)
- gasTipCap, _ := uint256.FromBig(tx.MaxPriorityFeePerGas)
+ gasTipCap := safeUint256(tx.MaxPriorityFeePerGas)
...and similarly for MaxFeePerGas, Value, MaxFeePerBlobGas, V, R, S...
🤖 Prompt for AI Agents
In internal/validation/validation.go around lines 286 to 294, the code calls
uint256.FromBig on fields like tx.MaxPriorityFeePerGas and tx.Value without
checking for nil, which can cause panics if these fields are nil. To fix this,
define a helper function safeUint256 that returns a zero uint256.Int if the
input *big.Int is nil, otherwise calls uint256.FromBig. Replace all direct calls
to uint256.FromBig with safeUint256 in this block and also in lines 332 to 338
to safely handle nil inputs and prevent runtime panics.
query := fmt.Sprintf(`WITH | ||
to_be_inserted AS ( | ||
SELECT chain_id, block_number, block_hash, block_timestamp, transaction_hash, transaction_index, log_index, address, | ||
data, topic_0, topic_1, topic_2, topic_3, insert_timestamp, -sign as sign | ||
FROM default.logs FINAL | ||
WHERE chain_id = ? AND block_number IN (?) AND (transaction_hash, log_index) IN (%s) | ||
) | ||
INSERT INTO logs ( | ||
chain_id, block_number, block_hash, block_timestamp, transaction_hash, transaction_index, log_index, address, | ||
data, topic_0, topic_1, topic_2, topic_3, insert_timestamp, sign | ||
) SELECT * from to_be_inserted | ||
`, strings.Join(tuples, ",")) | ||
|
||
err := conn.Exec(context.Background(), query, chainId, blockNumbers) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
String-built IN
clause risks gigantic queries & SQL-injection
fmt.Sprintf
embeds a comma-separated list of tuples directly into the SQL
string.
Drawbacks:
- Huge
duplicateLogs
sets can blow the server’s max query length. - Although values come from the DB, handcrafted strings bypass ClickHouse
query-parameter sanitisation.
Prefer passing an array parameter(s) to the driver:
- WHERE chain_id = ? AND block_number IN (?) AND (transaction_hash, log_index) IN (%s)
+ WHERE chain_id = ? AND block_number IN (?) AND (transaction_hash, log_index) IN (?)
Then supply []tuple
(ClickHouse supports arrays of tuples) as the third
parameter. If tuple arrays are awkward, batch the deletion with smaller,
parameterised queries like you did for transactions.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In internal/validation/duplicates.go around lines 218 to 232, the SQL query uses
fmt.Sprintf to embed a comma-separated list of tuples directly into the IN
clause, which risks creating excessively large queries and potential SQL
injection issues. To fix this, refactor the query to use parameterized array
inputs for the tuples instead of string-building the IN clause. Pass the tuples
as an array parameter to the query execution, leveraging ClickHouse's support
for arrays of tuples, or alternatively, batch the inserts with smaller
parameterized queries to avoid oversized queries and ensure proper sanitization.
0ffae31
to
31e733b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
♻️ Duplicate comments (3)
internal/validation/validation.go (2)
286-295
:⚠️ Potential issue*Handle nil big.Int inputs to uint256.FromBig to prevent panics
The code doesn't check for nil values before converting
big.Int
touint256.Int
, which could cause panics for older blocks or special transactions where some fields may be nil.Implement a helper function at the beginning of the file:
// safeUint256 converts a *big.Int to *uint256.Int safely, handling nil values func safeUint256(b *big.Int) *uint256.Int { if b == nil { return new(uint256.Int) } out, _ := uint256.FromBig(b) return out }Then replace all the conversions:
-chainID, _ := uint256.FromBig(tx.ChainId) -gasTipCap, _ := uint256.FromBig(tx.MaxPriorityFeePerGas) -gasFeeCap, _ := uint256.FromBig(tx.MaxFeePerGas) -value, _ := uint256.FromBig(tx.Value) -blobFeeCap, _ := uint256.FromBig(tx.MaxFeePerBlobGas) -v, _ := uint256.FromBig(tx.V) -r, _ := uint256.FromBig(tx.R) -s, _ := uint256.FromBig(tx.S) +chainID := safeUint256(tx.ChainId) +gasTipCap := safeUint256(tx.MaxPriorityFeePerGas) +gasFeeCap := safeUint256(tx.MaxFeePerGas) +value := safeUint256(tx.Value) +blobFeeCap := safeUint256(tx.MaxFeePerBlobGas) +v := safeUint256(tx.V) +r := safeUint256(tx.R) +s := safeUint256(tx.S)Also apply the same pattern to lines 332-338.
56-58
:⚠️ Potential issueCheck error return from poller.Poll
The
Poll
method likely returns an error that is not being checked. This could silently ignore failures in the polling process.poller := orchestrator.NewBoundlessPoller(rpcClient, s) -poller.Poll(context.Background(), batch) +if err := poller.Poll(context.Background(), batch); err != nil { + return fmt.Errorf("failed to poll blocks %v: %w", batch, err) +} log.Debug().Msgf("Batch of invalid blocks polled: %d to %d", invalidBlocks[i], invalidBlocks[end-1])internal/validation/duplicates.go (1)
213-229
: 🛠️ Refactor suggestionString-built
IN
clause risks gigantic queries & SQL-injectionBuilding the
IN
clause directly withfmt.Sprintf
could lead to oversized queries if there are many tuples, and potentially allows SQL injection vulnerabilities even though the values come from the database.Consider one of these approaches:
- Use ClickHouse array parameters instead:
-tuples := make([]string, 0, len(batch)) +txHashes := make([]string, 0, len(batch)) +logIndices := make([]uint64, 0, len(batch)) for _, duplicateLog := range batch { blockNumbers = append(blockNumbers, duplicateLog.BlockNumber) - tuples = append(tuples, fmt.Sprintf("('%s', %d)", duplicateLog.TxHash, duplicateLog.LogIndex)) + txHashes = append(txHashes, duplicateLog.TxHash) + logIndices = append(logIndices, duplicateLog.LogIndex) } -query := fmt.Sprintf(`WITH +query := `WITH to_be_inserted AS ( SELECT chain_id, block_number, block_hash, block_timestamp, transaction_hash, transaction_index, log_index, address, data, topic_0, topic_1, topic_2, topic_3, insert_timestamp, -sign as sign FROM default.logs FINAL - WHERE chain_id = ? AND block_number IN (?) AND (transaction_hash, log_index) IN (%s) + WHERE chain_id = ? AND block_number IN (?) AND (transaction_hash, log_index) IN ZIP(?, ?) ) INSERT INTO logs ( chain_id, block_number, block_hash, block_timestamp, transaction_hash, transaction_index, log_index, address, data, topic_0, topic_1, topic_2, topic_3, insert_timestamp, sign ) SELECT * from to_be_inserted -`, strings.Join(tuples, ",")) +` -err := conn.Exec(context.Background(), query, chainId, blockNumbers) +err := conn.Exec(context.Background(), query, chainId, blockNumbers, txHashes, logIndices)
🧹 Nitpick comments (3)
cmd/validate.go (1)
19-26
: Update the TBD command documentationThe command's
Short
andLong
descriptions are set to "TBD" which should be updated with meaningful descriptions to help users understand the purpose and functionality of the validate command.var ( validateCmd = &cobra.Command{ - Use: "validate", - Short: "TBD", - Long: "TBD", + Use: "validate [batchSize] [fixBatchSize]", + Short: "Validates and fixes blockchain data integrity", + Long: "Validates blockchain data integrity by detecting and removing duplicates, filling gaps, and verifying block data. Takes optional batch size (default: 1000) and fix batch size (default: 0) arguments.", Run: func(cmd *cobra.Command, args []string) { RunValidate(cmd, args) }, } )internal/validation/validation.go (2)
164-166
: Better contract creation detectionThe contract creation detection logic relies on checking for the zero address string and non-empty data, which could be error-prone. Consider using a more robust approach.
-isContractCreation := tx.ToAddress == "0x0000000000000000000000000000000000000000" && tx.Data != "0x" +// Contract creation transactions have empty "to" field, but in the database they might be +// stored with the zero address. Check for both empty address and zero address. +isContractCreation := (tx.ToAddress == "" || tx.ToAddress == "0x0000000000000000000000000000000000000000") && tx.Data != "0x"
98-104
: Add comments explaining transaction type checkingThe code skips transaction root validation for transaction types > 4, but doesn't explain why. Add comments to clarify which transaction types are supported and why others are skipped.
// TODO: remove this once we know how to validate all tx types for _, tx := range blockData.Transactions { - if tx.TransactionType > 4 { // Currently supported types are 0-4 + // Currently supported transaction types: + // 0: Legacy + // 1: Access List (EIP-2930) + // 2: Dynamic Fee (EIP-1559) + // 3: Blob (EIP-4844) + // 4: Set Code (EIP-7702) + // Skip validation for newer, unsupported transaction types + if tx.TransactionType > 4 { log.Warn().Msgf("Skipping transaction root validation for block %s due to unsupported transaction type %d", blockData.Block.Number, tx.TransactionType) return nil } }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sum
is excluded by!**/*.sum
📒 Files selected for processing (11)
cmd/root.go
(1 hunks)cmd/validate.go
(1 hunks)go.mod
(5 hunks)internal/common/trace.go
(3 hunks)internal/rpc/serializer.go
(2 hunks)internal/storage/clickhouse.go
(4 hunks)internal/validation/cursor.go
(1 hunks)internal/validation/db.go
(1 hunks)internal/validation/duplicates.go
(1 hunks)internal/validation/gaps.go
(1 hunks)internal/validation/validation.go
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
- cmd/root.go
- internal/storage/clickhouse.go
- internal/rpc/serializer.go
- internal/common/trace.go
- internal/validation/cursor.go
- internal/validation/gaps.go
- go.mod
- internal/validation/db.go
🧰 Additional context used
🪛 ast-grep (0.38.1)
cmd/validate.go
[warning] 71-71: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. Add
MinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures
(missing-ssl-minversion-go)
🪛 golangci-lint (1.64.8)
internal/validation/duplicates.go
33-33: Error return value is not checked
(errcheck)
44-44: Error return value is not checked
(errcheck)
55-55: Error return value is not checked
(errcheck)
conn, err := clickhouse.Open(&clickhouse.Options{ | ||
Addr: []string{fmt.Sprintf("%s:%d", config.Cfg.Storage.Main.Clickhouse.Host, config.Cfg.Storage.Main.Clickhouse.Port)}, | ||
Protocol: clickhouse.Native, | ||
TLS: &tls.Config{}, | ||
Auth: clickhouse.Auth{ | ||
Username: config.Cfg.Storage.Main.Clickhouse.Username, | ||
Password: config.Cfg.Storage.Main.Clickhouse.Password, | ||
}, | ||
Settings: func() clickhouse.Settings { | ||
settings := clickhouse.Settings{ | ||
"do_not_merge_across_partitions_select_final": "1", | ||
"use_skip_indexes_if_final": "1", | ||
"optimize_move_to_prewhere_if_final": "1", | ||
"async_insert": "1", | ||
"wait_for_async_insert": "1", | ||
} | ||
return settings | ||
}(), | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Set TLS MinVersion for better security
The TLS configuration is missing a MinVersion
setting, which could allow connections with insecure TLS protocol versions.
TLS: &tls.Config{
+ MinVersion: tls.VersionTLS12,
},
Ideally, use TLS 1.3 if your server supports it:
TLS: &tls.Config{
+ MinVersion: tls.VersionTLS13,
},
🧰 Tools
🪛 ast-grep (0.38.1)
[warning] 71-71: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. Add
MinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures
(missing-ssl-minversion-go)
🤖 Prompt for AI Agents
In cmd/validate.go around lines 69 to 87, the TLS configuration for the
Clickhouse connection lacks a MinVersion setting, which may allow insecure TLS
versions. Update the tls.Config struct to include MinVersion set to
tls.VersionTLS13 to enforce using TLS 1.3 if supported by the server, enhancing
connection security.
func validateAndFixRange(rpcClient rpc.IRPCClient, s storage.IStorage, conn clickhouse.Conn, startBlock *big.Int, endBlock *big.Int, fixBatchSize int) error { | ||
chainId := rpcClient.GetChainID() | ||
err := validation.FindAndRemoveDuplicates(conn, chainId, startBlock, endBlock) | ||
if err != nil { | ||
log.Fatal().Err(err).Msg("Failed to find and fix duplicates") | ||
} | ||
|
||
err = validation.FindAndFixGaps(rpcClient, s, conn, chainId, startBlock, endBlock) | ||
if err != nil { | ||
log.Fatal().Err(err).Msg("Failed to find and fix gaps") | ||
} | ||
|
||
err = validation.ValidateAndFixBlocks(rpcClient, s, conn, startBlock, endBlock, fixBatchSize) | ||
if err != nil { | ||
log.Fatal().Err(err).Msg("Failed to validate and fix blocks") | ||
} | ||
|
||
log.Debug().Msgf("Validation complete for range %v-%v", startBlock, endBlock) | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Improve error handling in validateAndFixRange
The function calls log.Fatal()
which terminates the program immediately without allowing cleanup or graceful shutdown. Consider returning errors instead and handling them at a higher level.
func validateAndFixRange(rpcClient rpc.IRPCClient, s storage.IStorage, conn clickhouse.Conn, startBlock *big.Int, endBlock *big.Int, fixBatchSize int) error {
chainId := rpcClient.GetChainID()
err := validation.FindAndRemoveDuplicates(conn, chainId, startBlock, endBlock)
if err != nil {
- log.Fatal().Err(err).Msg("Failed to find and fix duplicates")
+ return fmt.Errorf("failed to find and fix duplicates: %w", err)
}
err = validation.FindAndFixGaps(rpcClient, s, conn, chainId, startBlock, endBlock)
if err != nil {
- log.Fatal().Err(err).Msg("Failed to find and fix gaps")
+ return fmt.Errorf("failed to find and fix gaps: %w", err)
}
err = validation.ValidateAndFixBlocks(rpcClient, s, conn, startBlock, endBlock, fixBatchSize)
if err != nil {
- log.Fatal().Err(err).Msg("Failed to validate and fix blocks")
+ return fmt.Errorf("failed to validate and fix blocks: %w", err)
}
log.Debug().Msgf("Validation complete for range %v-%v", startBlock, endBlock)
return nil
}
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cmd/validate.go around lines 115 to 133, the function validateAndFixRange
uses log.Fatal() on errors, which abruptly terminates the program. Modify the
function to return errors instead of calling log.Fatal(), allowing the caller to
handle errors gracefully and perform any necessary cleanup or shutdown
procedures.
removeDuplicateLogs(conn, chainId, duplicateLogs) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle error return value from removeDuplicateLogs
The error return value from removeDuplicateLogs
is not checked, which could silently ignore failures in the duplicate log removal process.
log.Debug().Msgf("Found %d duplicate logs in range %v-%v: %v", len(duplicateLogs), startBlock, endBlock, duplicateLogs)
-removeDuplicateLogs(conn, chainId, duplicateLogs)
+if err := removeDuplicateLogs(conn, chainId, duplicateLogs); err != nil {
+ return fmt.Errorf("failed to remove duplicate logs: %w", err)
+}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
removeDuplicateLogs(conn, chainId, duplicateLogs) | |
} | |
log.Debug().Msgf("Found %d duplicate logs in range %v-%v: %v", len(duplicateLogs), startBlock, endBlock, duplicateLogs) | |
if err := removeDuplicateLogs(conn, chainId, duplicateLogs); err != nil { | |
return fmt.Errorf("failed to remove duplicate logs: %w", err) | |
} | |
} |
🧰 Tools
🪛 golangci-lint (1.64.8)
55-55: Error return value is not checked
(errcheck)
🤖 Prompt for AI Agents
In internal/validation/duplicates.go around lines 55 to 56, the call to
removeDuplicateLogs returns an error that is currently not checked. Modify the
code to capture the error returned by removeDuplicateLogs and handle it
appropriately, such as logging the error or returning it up the call stack, to
ensure failures in the duplicate log removal process are not silently ignored.
removeDuplicateTransactions(conn, chainId, duplicateTransactions) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle error return value from removeDuplicateTransactions
The error return value from removeDuplicateTransactions
is not checked, which could silently ignore failures in the duplicate transaction removal process.
log.Debug().Msgf("Found %d duplicate transactions in range %v-%v: %v", len(duplicateTransactions), startBlock, endBlock, duplicateTransactions)
-removeDuplicateTransactions(conn, chainId, duplicateTransactions)
+if err := removeDuplicateTransactions(conn, chainId, duplicateTransactions); err != nil {
+ return fmt.Errorf("failed to remove duplicate transactions: %w", err)
+}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
removeDuplicateTransactions(conn, chainId, duplicateTransactions) | |
} | |
log.Debug().Msgf("Found %d duplicate transactions in range %v-%v: %v", len(duplicateTransactions), startBlock, endBlock, duplicateTransactions) | |
if err := removeDuplicateTransactions(conn, chainId, duplicateTransactions); err != nil { | |
return fmt.Errorf("failed to remove duplicate transactions: %w", err) | |
} | |
} |
🧰 Tools
🪛 golangci-lint (1.64.8)
44-44: Error return value is not checked
(errcheck)
🤖 Prompt for AI Agents
In internal/validation/duplicates.go around lines 44 to 45, the call to
removeDuplicateTransactions returns an error that is currently ignored. Modify
the code to capture this error and handle it appropriately, such as logging the
error or returning it up the call stack, to ensure failures in removing
duplicate transactions are not silently ignored.
removeDuplicateBlocks(conn, chainId, duplicateBlockNumbers) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle error return value from removeDuplicateBlocks
The error return value from removeDuplicateBlocks
is not checked, which could silently ignore failures in the duplicate block removal process.
log.Debug().Msgf("Found %d duplicate blocks in range %v-%v: %v", len(duplicateBlockNumbers), startBlock, endBlock, duplicateBlockNumbers)
-removeDuplicateBlocks(conn, chainId, duplicateBlockNumbers)
+if err := removeDuplicateBlocks(conn, chainId, duplicateBlockNumbers); err != nil {
+ return fmt.Errorf("failed to remove duplicate blocks: %w", err)
+}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
removeDuplicateBlocks(conn, chainId, duplicateBlockNumbers) | |
} | |
log.Debug().Msgf( | |
"Found %d duplicate blocks in range %v-%v: %v", | |
len(duplicateBlockNumbers), startBlock, endBlock, duplicateBlockNumbers, | |
) | |
if err := removeDuplicateBlocks(conn, chainId, duplicateBlockNumbers); err != nil { | |
return fmt.Errorf("failed to remove duplicate blocks: %w", err) | |
} | |
} |
🧰 Tools
🪛 golangci-lint (1.64.8)
33-33: Error return value is not checked
(errcheck)
🤖 Prompt for AI Agents
In internal/validation/duplicates.go around lines 33 to 34, the call to
removeDuplicateBlocks returns an error that is currently ignored. Modify the
code to capture this error and handle it appropriately, such as logging the
error or returning it up the call stack, to ensure failures in duplicate block
removal are not silently ignored.
31e733b
to
0433596
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (8)
cmd/validate.go (2)
69-87
: Set TLS MinVersion for better securityThe TLS configuration is missing a
MinVersion
setting, which could allow connections with insecure TLS protocol versions. This was previously identified and should be addressed by addingMinVersion: tls.VersionTLS13
to enhance connection security.🧰 Tools
🪛 ast-grep (0.38.1)
[warning] 71-71: MinVersion
is missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. Add
MinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures(missing-ssl-minversion-go)
115-133
: Improve error handling in validateAndFixRangeThe function uses
log.Fatal()
which terminates the program immediately without allowing cleanup or graceful shutdown. As previously suggested, consider returning errors instead and handling them at a higher level to allow for proper cleanup and graceful error handling.internal/validation/validation.go (2)
287-294
: *Handle nil big.Int inputs to uint256.FromBig to prevent panicsThe fields
tx.MaxPriorityFeePerGas
,tx.MaxFeePerBlobGas
,tx.Value
,tx.R
,tx.S
, etc. may benil
for certain transaction types. Callinguint256.FromBig(nil)
will panic. A helper functionsafeUint256
should be introduced to handle nil inputs safely as previously suggested.
332-338
: *Handle nil big.Int inputs to uint256.FromBig to prevent panicsSame issue as lines 287-294 - the conversion to uint256 needs null-safety to prevent runtime panics when big.Int fields are nil.
internal/validation/duplicates.go (4)
32-34
: Error fromremoveDuplicateBlocks
is silently ignoredThe call to
removeDuplicateBlocks
does not check the returned error.
If the deduplication query fails you will continue as if the data were clean, potentially cascading bad state into gap-filling and validation steps.- removeDuplicateBlocks(conn, chainId, duplicateBlockNumbers) + if err := removeDuplicateBlocks(conn, chainId, duplicateBlockNumbers); err != nil { + return fmt.Errorf("failed to remove duplicate blocks: %w", err) + }🧰 Tools
🪛 golangci-lint (1.64.8)
33-33: Error return value is not checked
(errcheck)
43-45
: Error fromremoveDuplicateTransactions
is silently ignoredSame issue as above for transaction duplicates. Propagate or at least log the failure so that the caller can abort or retry.
- removeDuplicateTransactions(conn, chainId, duplicateTransactions) + if err := removeDuplicateTransactions(conn, chainId, duplicateTransactions); err != nil { + return fmt.Errorf("failed to remove duplicate transactions: %w", err) + }🧰 Tools
🪛 golangci-lint (1.64.8)
44-44: Error return value is not checked
(errcheck)
54-56
: Error fromremoveDuplicateLogs
is silently ignoredPlease handle the error to avoid continuing with an inconsistent database state.
- removeDuplicateLogs(conn, chainId, duplicateLogs) + if err := removeDuplicateLogs(conn, chainId, duplicateLogs); err != nil { + return fmt.Errorf("failed to remove duplicate logs: %w", err) + }🧰 Tools
🪛 golangci-lint (1.64.8)
55-55: Error return value is not checked
(errcheck)
218-230
: String-built tuple list risks gigantic queries & SQL-injection
fmt.Sprintf
concatenates every(txHash, logIndex)
tuple straight into the SQL.
Drawbacks:
- Large duplicate batches can blow ClickHouse’s max query length.
- Although data originates from the DB, handcrafted SQL bypasses driver-level sanitisation.
Prefer parameterised arrays or smaller, parameterised batches, e.g.:
- WHERE chain_id = ? AND block_number IN (?) AND (transaction_hash, log_index) IN (%s) + WHERE chain_id = ? AND block_number IN (?) AND (transaction_hash, log_index) IN (?)Then pass
[]tuple(String, UInt64)
as the third argument, or fall back to the existing 1 000-row batching but build the tuples slice as a proper driver parameter instead of raw SQL.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sum
is excluded by!**/*.sum
📒 Files selected for processing (12)
cmd/root.go
(1 hunks)cmd/validate.go
(1 hunks)go.mod
(5 hunks)internal/common/trace.go
(3 hunks)internal/rpc/rpc.go
(1 hunks)internal/rpc/serializer.go
(2 hunks)internal/storage/clickhouse.go
(4 hunks)internal/validation/cursor.go
(1 hunks)internal/validation/db.go
(1 hunks)internal/validation/duplicates.go
(1 hunks)internal/validation/gaps.go
(1 hunks)internal/validation/validation.go
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- cmd/root.go
🚧 Files skipped from review as they are similar to previous changes (6)
- internal/validation/gaps.go
- internal/common/trace.go
- internal/validation/cursor.go
- internal/rpc/serializer.go
- internal/storage/clickhouse.go
- go.mod
🧰 Additional context used
🧬 Code Graph Analysis (1)
internal/validation/db.go (1)
internal/storage/clickhouse.go (1)
ZERO_BYTES_42
(35-35)
🪛 ast-grep (0.38.1)
cmd/validate.go
[warning] 71-71: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. Add
MinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures
(missing-ssl-minversion-go)
🪛 golangci-lint (1.64.8)
internal/validation/duplicates.go
33-33: Error return value is not checked
(errcheck)
44-44: Error return value is not checked
(errcheck)
55-55: Error return value is not checked
(errcheck)
🔇 Additional comments (8)
internal/rpc/rpc.go (1)
111-114
: LGTM! Improves consistency between initialization functions.The addition of chain ID initialization to
InitializeSimpleRPCWithUrl
ensures consistency with the mainInitialize()
function and provides the necessary chain ID context for validation workflows.cmd/validate.go (1)
29-51
: LGTM! Well-structured argument parsing and validation.The command setup with batch size configuration and argument validation is well implemented. The inclusive range adjustment (line 51) is a nice touch for user-friendly batch processing.
internal/validation/validation.go (3)
164-165
: LGTM! Improved contract creation handling.The contract creation detection using
isContractCreation := tx.ToAddress == ""
and subsequent conditional address handling properly addresses the previous feedback about nilTo
addresses in contract creation transactions.
118-151
: LGTM! Robust logs bloom calculation.The logs bloom calculation correctly converts internal log structures to go-ethereum types and handles optional topics properly. The use of go-ethereum's
CreateBloom
ensures correctness.
56-57
:✅ Verification successful
Verify poller.Poll method signature and error handling
The call to
poller.Poll(context.Background(), batch)
needs verification to ensure it matches the expected signature and properly handles returned errors.
🏁 Script executed:
#!/bin/bash # Check the Poll method signature in the orchestrator package ast-grep --pattern 'func $_($$$) Poll($$$) $$$'Length of output: 49
🏁 Script executed:
#!/bin/bash # Locate any Go files in the orchestrator package pkg_files=$(rg -l "^package orchestrator" -n) # Print out where NewBoundlessPoller is defined echo "🔍 Searching for NewBoundlessPoller definitions..." rg -n "func NewBoundlessPoller" -n $pkg_files # Search for the BoundlessPoller type echo "🔍 Locating type BoundlessPoller..." file=$(rg -l "^type BoundlessPoller" -n $pkg_files) echo "Found BoundlessPoller in: $file" # Inspect the Poll method signature on BoundlessPoller echo "🔍 Inspecting Poll method signature..." rg -n "^func .*BoundlessPoller\\) Poll" -n "$file"Length of output: 1605
🏁 Script executed:
#!/bin/bash # Locate the Poller type definition echo "🔍 Searching for the Poller type in poller.go..." rg -n "^type Poller" internal/orchestrator/poller.go # Inspect the Poll method signature on Poller echo "🔍 Inspecting the Poll method signature..." rg -n "^func .*Poll" internal/orchestrator/poller.goLength of output: 1141
Signature verified:
Poll
returns only a block number, no error to handle
- internal/orchestrator/poller.go:149 defines
func (p *Poller) Poll(ctx context.Context, blockNumbers []*big.Int) (lastPolledBlock *big.Int)
- There is no error return, so no error handling is required here.
- If you need the returned
lastPolledBlock
, assign it; otherwise discarding the value is acceptable.internal/validation/db.go (3)
64-153
: LGTM! Excellent concurrent data fetching design.The concurrent fetching of blocks, logs, and transactions using goroutines and channels is well-implemented. The error handling, result organization by block number, and final assembly into
BlockData
structs demonstrates good engineering practices.
257-259
: LGTM! Proper handling of contract creation addresses.The special case handling for
ZERO_BYTES_42
correctly converts the zero-byte representation to an empty string for contract creation transactions, which aligns with the constant defined ininternal/storage/clickhouse.go
.
156-165
: LGTM! Optimized ClickHouse queries.The use of
FINAL
clause in the SELECT queries ensures data consistency when reading from ClickHouse tables, which is appropriate for validation operations that need the most recent, deduplicated data.
query := `SELECT block_number | ||
FROM default.blocks FINAL WHERE chain_id = ? AND block_number >= ? AND block_number <= ? | ||
GROUP BY block_number | ||
HAVING sum(sign) != 1 | ||
ORDER BY block_number; | ||
` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
HAVING sum(sign) != 1
may miss certain duplicate scenarios
If rows exist with signs (+1, +1, -1)
the sum equals +1
, therefore the group is not flagged, yet you still have two live (+1) rows – a duplicate situation.
Using count() > 1
(or abs(sum(sign)) != 1
) is safer.
HAVING count() > 1 -- catches any group with more than one live row
Consider adjusting the logic or at least adding a comment clarifying the assumption that “duplicate cancellation” never yields +1
with more than one positive row.
🤖 Prompt for AI Agents
In internal/validation/duplicates.go around lines 62 to 67, the current SQL
query uses HAVING sum(sign) != 1, which can miss duplicate cases like (+1, +1,
-1) where the sum is 1 but duplicates exist. To fix this, change the HAVING
clause to use count() > 1 or abs(sum(sign)) != 1 to correctly identify
duplicates. Alternatively, add a comment explaining the assumption behind the
current logic if you choose not to change it.
0433596
to
4b567ef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (6)
cmd/validate.go (2)
69-87
: TLS configuration missing MinVersion setting.This security issue has already been identified in previous reviews. The TLS configuration should specify a minimum version for enhanced security.
🧰 Tools
🪛 ast-grep (0.38.1)
[warning] 71-71: MinVersion
is missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. Add
MinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures(missing-ssl-minversion-go)
115-134
: Error handling uses log.Fatal() preventing graceful shutdown.This error handling issue has already been identified in previous reviews. The function should return errors instead of calling log.Fatal() to allow proper cleanup and graceful error handling.
internal/validation/validation.go (1)
286-296
: *Handle nil big.Int inputs to prevent panics.This nil pointer risk with uint256.FromBig() has already been identified in previous reviews. Fields like MaxPriorityFeePerGas, MaxFeePerBlobGas, Value, R, S can be nil and need safe handling to prevent runtime panics.
Also applies to: 332-340
internal/validation/duplicates.go (3)
33-34
: Error return values not handled.The calls to
removeDuplicateBlocks
,removeDuplicateTransactions
, andremoveDuplicateLogs
return errors that are not being checked, as already identified in previous reviews. These should be handled to prevent silent failures.Also applies to: 44-45, 55-56
🧰 Tools
🪛 golangci-lint (1.64.8)
33-33: Error return value is not checked
(errcheck)
62-67
: HAVING clause may miss certain duplicate scenarios.The
HAVING sum(sign) != 1
condition has already been identified as potentially problematic in previous reviews, as it could miss cases where duplicates exist but sum to 1.
218-232
: SQL injection risk with string interpolation.The use of
fmt.Sprintf
to build the IN clause has already been identified as a security risk in previous reviews. This should use parameterized queries instead.
🧹 Nitpick comments (1)
internal/validation/validation.go (1)
97-103
: Consider removing or refining the transaction type check.The current check skips validation for transaction types > 4, but this hardcoded limit may become outdated as new transaction types are introduced. Consider either:
- Making this configurable
- Adding a more explicit comment about which types are supported
- Implementing a more robust unsupported type detection
- for _, tx := range blockData.Transactions { - if tx.TransactionType > 4 { // Currently supported types are 0-4 - log.Warn().Msgf("Skipping transaction root validation for block %s due to unsupported transaction type %d", blockData.Block.Number, tx.TransactionType) - return nil - } - } + supportedTypes := map[uint8]bool{0: true, 1: true, 2: true, 3: true, 4: true} + for _, tx := range blockData.Transactions { + if !supportedTypes[tx.TransactionType] { + log.Warn().Msgf("Skipping transaction root validation for block %s due to unsupported transaction type %d", blockData.Block.Number, tx.TransactionType) + return nil + } + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sum
is excluded by!**/*.sum
📒 Files selected for processing (13)
cmd/root.go
(1 hunks)cmd/validate.go
(1 hunks)go.mod
(5 hunks)internal/common/abi.go
(2 hunks)internal/common/trace.go
(3 hunks)internal/rpc/rpc.go
(1 hunks)internal/rpc/serializer.go
(2 hunks)internal/storage/clickhouse.go
(4 hunks)internal/validation/cursor.go
(1 hunks)internal/validation/db.go
(1 hunks)internal/validation/duplicates.go
(1 hunks)internal/validation/gaps.go
(1 hunks)internal/validation/validation.go
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- internal/common/abi.go
🚧 Files skipped from review as they are similar to previous changes (9)
- cmd/root.go
- internal/rpc/rpc.go
- internal/storage/clickhouse.go
- internal/rpc/serializer.go
- internal/validation/cursor.go
- internal/validation/gaps.go
- internal/common/trace.go
- go.mod
- internal/validation/db.go
🧰 Additional context used
🪛 golangci-lint (1.64.8)
internal/validation/duplicates.go
33-33: Error return value is not checked
(errcheck)
44-44: Error return value is not checked
(errcheck)
55-55: Error return value is not checked
(errcheck)
🪛 ast-grep (0.38.1)
cmd/validate.go
[warning] 71-71: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. Add
MinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures
(missing-ssl-minversion-go)
4b567ef
to
734272c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (9)
cmd/validate.go (2)
69-87
:⚠️ Potential issueSet TLS MinVersion for enhanced security.
The TLS configuration lacks a
MinVersion
setting, potentially allowing insecure TLS protocol versions. This was flagged in previous reviews and by static analysis.conn, err := clickhouse.Open(&clickhouse.Options{ Addr: []string{fmt.Sprintf("%s:%d", config.Cfg.Storage.Main.Clickhouse.Host, config.Cfg.Storage.Main.Clickhouse.Port)}, Protocol: clickhouse.Native, - TLS: &tls.Config{}, + TLS: &tls.Config{ + MinVersion: tls.VersionTLS12, + }, Auth: clickhouse.Auth{ Username: config.Cfg.Storage.Main.Clickhouse.Username, Password: config.Cfg.Storage.Main.Clickhouse.Password, },🧰 Tools
🪛 ast-grep (0.38.1)
[warning] 71-71: MinVersion
is missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. Add
MinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures(missing-ssl-minversion-go)
115-133
: 🛠️ Refactor suggestionReplace log.Fatal() with proper error handling.
The function uses
log.Fatal()
which terminates the program immediately, preventing graceful error handling and cleanup. This was flagged in previous reviews.-func validateAndFixRange(rpcClient rpc.IRPCClient, s storage.IStorage, conn clickhouse.Conn, startBlock *big.Int, endBlock *big.Int, fixBatchSize int) error { +func validateAndFixRange(rpcClient rpc.IRPCClient, s storage.IStorage, conn clickhouse.Conn, startBlock *big.Int, endBlock *big.Int, fixBatchSize int) error { chainId := rpcClient.GetChainID() err := validation.FindAndRemoveDuplicates(conn, chainId, startBlock, endBlock) if err != nil { - log.Fatal().Err(err).Msg("Failed to find and fix duplicates") + return fmt.Errorf("failed to find and fix duplicates: %w", err) } err = validation.FindAndFixGaps(rpcClient, s, conn, chainId, startBlock, endBlock) if err != nil { - log.Fatal().Err(err).Msg("Failed to find and fix gaps") + return fmt.Errorf("failed to find and fix gaps: %w", err) } err = validation.ValidateAndFixBlocks(rpcClient, s, conn, startBlock, endBlock, fixBatchSize) if err != nil { - log.Fatal().Err(err).Msg("Failed to validate and fix blocks") + return fmt.Errorf("failed to validate and fix blocks: %w", err) } log.Debug().Msgf("Validation complete for range %v-%v", startBlock, endBlock) return nil }internal/validation/validation.go (3)
288-296
:⚠️ Potential issue*Prevent panics from nil big.Int inputs to uint256.FromBig.
Fields like
tx.MaxPriorityFeePerGas
,tx.MaxFeePerBlobGas
,tx.Value
, etc. may benil
for certain transaction types, causinguint256.FromBig(nil)
to panic.Add a helper function and use it for safe conversions:
+func safeUint256(b *big.Int) *uint256.Int { + if b == nil { + return new(uint256.Int) + } + out, _ := uint256.FromBig(b) + return out +}Then replace the direct calls:
- chainID, _ := uint256.FromBig(tx.ChainId) - gasTipCap, _ := uint256.FromBig(tx.MaxPriorityFeePerGas) - gasFeeCap, _ := uint256.FromBig(tx.MaxFeePerGas) - value, _ := uint256.FromBig(tx.Value) - blobFeeCap, _ := uint256.FromBig(tx.MaxFeePerBlobGas) - v, _ := uint256.FromBig(tx.V) - r, _ := uint256.FromBig(tx.R) - s, _ := uint256.FromBig(tx.S) + chainID := safeUint256(tx.ChainId) + gasTipCap := safeUint256(tx.MaxPriorityFeePerGas) + gasFeeCap := safeUint256(tx.MaxFeePerGas) + value := safeUint256(tx.Value) + blobFeeCap := safeUint256(tx.MaxFeePerBlobGas) + v := safeUint256(tx.V) + r := safeUint256(tx.R) + s := safeUint256(tx.S)
333-340
:⚠️ Potential issueApply the same nil-safety fix for EIP-7702 transactions.
The same nil pointer risk exists here for the EIP-7702 transaction type conversion.
- chainID, _ := uint256.FromBig(tx.ChainId) - gasTipCap, _ := uint256.FromBig(tx.MaxPriorityFeePerGas) - gasFeeCap, _ := uint256.FromBig(tx.MaxFeePerGas) - value, _ := uint256.FromBig(tx.Value) - v, _ := uint256.FromBig(tx.V) - r, _ := uint256.FromBig(tx.R) - s, _ := uint256.FromBig(tx.S) + chainID := safeUint256(tx.ChainId) + gasTipCap := safeUint256(tx.MaxPriorityFeePerGas) + gasFeeCap := safeUint256(tx.MaxFeePerGas) + value := safeUint256(tx.Value) + v := safeUint256(tx.V) + r := safeUint256(tx.R) + s := safeUint256(tx.S)
56-58
:⚠️ Potential issueHandle error return from poller.Poll.
The error returned by
poller.Poll
is not checked, which could silently ignore polling failures for invalid blocks.poller := orchestrator.NewBoundlessPoller(rpcClient, s) -poller.Poll(context.Background(), batch) +if err := poller.Poll(context.Background(), batch); err != nil { + log.Error().Err(err).Msgf("Failed to poll batch of blocks %s-%s", batch[0], batch[len(batch)-1]) + return fmt.Errorf("failed to poll invalid blocks: %w", err) +}internal/validation/duplicates.go (4)
33-34
:⚠️ Potential issueHandle error return from removeDuplicateBlocks.
The error returned by
removeDuplicateBlocks
is not checked, potentially ignoring failures in duplicate removal.log.Debug().Msgf("Found %d duplicate blocks in range %v-%v: %v", len(duplicateBlockNumbers), startBlock, endBlock, duplicateBlockNumbers) -removeDuplicateBlocks(conn, chainId, duplicateBlockNumbers) +if err := removeDuplicateBlocks(conn, chainId, duplicateBlockNumbers); err != nil { + return fmt.Errorf("failed to remove duplicate blocks: %w", err) +}🧰 Tools
🪛 golangci-lint (1.64.8)
33-33: Error return value is not checked
(errcheck)
44-45
:⚠️ Potential issueHandle error return from removeDuplicateTransactions.
The error returned by
removeDuplicateTransactions
is not checked, potentially ignoring failures in duplicate removal.log.Debug().Msgf("Found %d duplicate transactions in range %v-%v: %v", len(duplicateTransactions), startBlock, endBlock, duplicateTransactions) -removeDuplicateTransactions(conn, chainId, duplicateTransactions) +if err := removeDuplicateTransactions(conn, chainId, duplicateTransactions); err != nil { + return fmt.Errorf("failed to remove duplicate transactions: %w", err) +}🧰 Tools
🪛 golangci-lint (1.64.8)
44-44: Error return value is not checked
(errcheck)
55-56
:⚠️ Potential issueHandle error return from removeDuplicateLogs.
The error returned by
removeDuplicateLogs
is not checked, potentially ignoring failures in duplicate removal.log.Debug().Msgf("Found %d duplicate logs in range %v-%v: %v", len(duplicateLogs), startBlock, endBlock, duplicateLogs) -removeDuplicateLogs(conn, chainId, duplicateLogs) +if err := removeDuplicateLogs(conn, chainId, duplicateLogs); err != nil { + return fmt.Errorf("failed to remove duplicate logs: %w", err) +}🧰 Tools
🪛 golangci-lint (1.64.8)
55-55: Error return value is not checked
(errcheck)
218-229
: 🛠️ Refactor suggestionAvoid string interpolation in SQL queries for security.
The SQL query uses
fmt.Sprintf
to embed tuples directly, which poses security risks and could create oversized queries.Consider using parameterized queries or batching with smaller queries. If ClickHouse supports tuple arrays as parameters, use those instead:
- query := fmt.Sprintf(`WITH - to_be_inserted AS ( - SELECT chain_id, block_number, block_hash, block_timestamp, transaction_hash, transaction_index, log_index, address, - data, topic_0, topic_1, topic_2, topic_3, insert_timestamp, -sign as sign - FROM default.logs FINAL - WHERE chain_id = ? AND block_number IN (?) AND (transaction_hash, log_index) IN (%s) - ) - INSERT INTO logs ( - chain_id, block_number, block_hash, block_timestamp, transaction_hash, transaction_index, log_index, address, - data, topic_0, topic_1, topic_2, topic_3, insert_timestamp, sign - ) SELECT * from to_be_inserted - `, strings.Join(tuples, ",")) + // Alternative approach: process each log individually or use smaller batches + for _, duplicateLog := range batch { + query := `WITH + to_be_inserted AS ( + SELECT chain_id, block_number, block_hash, block_timestamp, transaction_hash, transaction_index, log_index, address, + data, topic_0, topic_1, topic_2, topic_3, insert_timestamp, -sign as sign + FROM default.logs FINAL + WHERE chain_id = ? AND block_number = ? AND transaction_hash = ? AND log_index = ? + ) + INSERT INTO logs ( + chain_id, block_number, block_hash, block_timestamp, transaction_hash, transaction_index, log_index, address, + data, topic_0, topic_1, topic_2, topic_3, insert_timestamp, sign + ) SELECT * from to_be_inserted` + + err := conn.Exec(context.Background(), query, chainId, duplicateLog.BlockNumber, duplicateLog.TxHash, duplicateLog.LogIndex) + if err != nil { + return err + } + }
🧹 Nitpick comments (1)
cmd/validate.go (1)
19-26
: Complete the command descriptions.The
Short
andLong
descriptions are marked as "TBD". Please provide meaningful descriptions for the validate command to improve CLI usability.var ( validateCmd = &cobra.Command{ Use: "validate", - Short: "TBD", - Long: "TBD", + Short: "Validate and fix blockchain data integrity", + Long: `Validate and fix blockchain data integrity by checking for duplicates, gaps, and block validation errors. Supports custom batch sizes for processing.`, Run: func(cmd *cobra.Command, args []string) { RunValidate(cmd, args) }, } )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sum
is excluded by!**/*.sum
📒 Files selected for processing (13)
cmd/root.go
(1 hunks)cmd/validate.go
(1 hunks)go.mod
(5 hunks)internal/common/abi.go
(2 hunks)internal/common/trace.go
(3 hunks)internal/rpc/rpc.go
(1 hunks)internal/rpc/serializer.go
(2 hunks)internal/storage/clickhouse.go
(4 hunks)internal/validation/cursor.go
(1 hunks)internal/validation/db.go
(1 hunks)internal/validation/duplicates.go
(1 hunks)internal/validation/gaps.go
(1 hunks)internal/validation/validation.go
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- internal/common/abi.go
🚧 Files skipped from review as they are similar to previous changes (9)
- internal/rpc/rpc.go
- cmd/root.go
- internal/rpc/serializer.go
- internal/validation/gaps.go
- internal/validation/cursor.go
- internal/common/trace.go
- internal/storage/clickhouse.go
- go.mod
- internal/validation/db.go
🧰 Additional context used
🪛 ast-grep (0.38.1)
cmd/validate.go
[warning] 71-71: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. Add
MinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures
(missing-ssl-minversion-go)
🪛 golangci-lint (1.64.8)
internal/validation/duplicates.go
33-33: Error return value is not checked
(errcheck)
44-44: Error return value is not checked
(errcheck)
55-55: Error return value is not checked
(errcheck)
TL;DR
Added a new
validate
command to the indexer that verifies and fixes data integrity issues in the blockchain data.What changed?
validate
command to the CLI that performs data validation and repairFindAndRemoveDuplicates
: Identifies and removes duplicate blocks, transactions, and logsFindAndFixGaps
: Detects and fills in missing blocks in a specified rangeValidateAndFixBlocks
: Verifies block integrity by checking transaction counts, logs bloom filters, and transaction rootsHow to test?
Run the validate command with optional batch size and fix batch size parameters:
Why make this change?
This validation tool helps maintain data integrity in the blockchain indexer by:
These validation capabilities are essential for ensuring the reliability and accuracy of the indexed blockchain data.
Summary by CodeRabbit
New Features
Bug Fixes
Dependency Updates