HDDS-12655. Optimize Recon Container Mismatch API while getting containerInfos & ContainerMetadata. - ( OM Changes ) #8855
What changes were proposed in this pull request?
The existing mismatch API was inefficient because it scanned the containerKeyPrefix table for every container, reading far more keys than necessary. This was especially wasteful when the check only needed to establish container presence rather than fetch full container metadata. The original approach also relied on dual-iterator logic with sorting and comparison operations, which added unnecessary deserialization and hurt performance.
Proposed Changes
The pull request removes the peekNextKey() method from the SeekableIterator interface and updates the ContainerMetadataIterator to remove peek functionality. The ContainerEndpoint logic was modified to use simple next() calls instead of peek operations, and the corresponding test method testContainerIteratorPeekNextKey() was removed.
The "missing in SCM" case (a data-loss scenario) was completely rewritten following reviewer feedback. The new approach loads all SCM containers into a HashMap for O(1) lookups, then iterates over OM containers using only the containerKeyCountTable to avoid unnecessary deserialization. This replaces the old dual-iterator logic and removes the iterator comparisons and seek operations it required.
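The lookup pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this patch: the class `MissingInScmSketch`, the method `findMissingInScm`, and the use of plain `Map` types to stand in for the SCM container list and the containerKeyCountTable are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: names and types here are illustrative only,
// not Recon's actual classes or table APIs.
public class MissingInScmSketch {

  /**
   * Returns the IDs of containers that OM has key counts for but that
   * SCM does not know about.
   *
   * @param scmContainers containerId -> state, loaded once from SCM
   * @param omKeyCounts   iterator over (containerId, keyCount) entries,
   *                      standing in for containerKeyCountTable
   */
  static List<Long> findMissingInScm(
      Map<Long, String> scmContainers,
      Iterator<Map.Entry<Long, Long>> omKeyCounts) {
    List<Long> missing = new ArrayList<>();
    // Single forward pass using only next(); no peeking or seeking.
    while (omKeyCounts.hasNext()) {
      long containerId = omKeyCounts.next().getKey();
      // Expected O(1) hash lookup instead of a sorted dual-iterator merge.
      if (!scmContainers.containsKey(containerId)) {
        missing.add(containerId);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    Map<Long, String> scm = new HashMap<>();
    scm.put(1L, "CLOSED");
    scm.put(5L, "OPEN");

    // TreeMap keeps container IDs in ascending order for the example.
    Map<Long, Long> omCounts = new TreeMap<>();
    omCounts.put(1L, 10L);
    omCounts.put(3L, 4L);
    omCounts.put(5L, 0L);
    omCounts.put(7L, 2L);

    // Containers 3 and 7 exist in OM but not in SCM.
    System.out.println(findMissingInScm(scm, omCounts.entrySet().iterator()));
    // prints [3, 7]
  }
}
```

The trade-off in this sketch is memory for simplicity: holding every SCM container in a map costs space proportional to the container count, but each OM entry is then resolved with one lookup and only the key of each count entry is read.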
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-12655
How was this patch tested?
Unit tests