HDDS-11721. Fix container export failure when container scanner is running on a schema v2 container #8841

Draft
wants to merge 6 commits into
base: master
from

Conversation

ptlrs
Contributor

@ptlrs ptlrs commented Jul 21, 2025

What changes were proposed in this pull request?

Container export operations for Schema v2 containers can fail when a scan is simultaneously in progress. This occurs because the scanner maintains an active database (Db) handle, preventing the export process from clearing the Db handle cache.

Previously, the export process attempted to clear this cache (behavior introduced in HDDS-3363). With the Db handle still held by the scanner, the cache-clearing call failed, which in turn caused the whole export to fail.

To address this, the export process has been updated to no longer attempt to clear the Db handle cache, allowing for successful container exports even when a scan is ongoing.
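
The failure mode and the fix can be illustrated with a toy model. This is only a sketch under stated assumptions: `HandleCacheSketch`, its methods, and the `export` helper are hypothetical names, not Ozone's actual classes, and the sketch assumes that evicting a cache entry throws while any caller still holds a reference to it.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a reference-counted Db handle cache. All names are hypothetical. */
final class HandleCacheSketch {
  // Number of outstanding references per container Db path.
  private final Map<String, Integer> refCounts = new HashMap<>();

  /** Acquire a handle for a container Db, as a scanner would during a scan. */
  synchronized void acquire(String dbPath) {
    refCounts.merge(dbPath, 1, Integer::sum);
  }

  /** Release a previously acquired handle; the entry disappears at zero references. */
  synchronized void release(String dbPath) {
    refCounts.computeIfPresent(dbPath, (k, v) -> v > 1 ? v - 1 : null);
  }

  /** Evict the entry. Assumption: eviction fails while the handle is referenced. */
  synchronized void remove(String dbPath) {
    Integer refs = refCounts.get(dbPath);
    if (refs != null && refs > 0) {
      throw new IllegalStateException("Db handle still referenced: " + dbPath);
    }
    refCounts.remove(dbPath);
  }

  /**
   * Simplified export. With clearCacheFirst=true it mimics the old flow and
   * aborts when eviction fails; with false it mimics the proposed flow.
   */
  static boolean export(HandleCacheSketch cache, String dbPath, boolean clearCacheFirst) {
    if (clearCacheFirst) {
      try {
        cache.remove(dbPath);
      } catch (IllegalStateException e) {
        return false; // export aborted because the scanner holds the handle
      }
    }
    // Packing the container data would happen here.
    return true;
  }
}
```

In this model, an export with `clearCacheFirst = true` fails while a scan holds the handle and succeeds once the handle is released, whereas an export with `clearCacheFirst = false` succeeds in both cases.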

A unit test has been added which reproduces the failure conditions.

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-11721

How was this patch tested?

CI: https://github.com/ptlrs/ozone/actions/runs/16429140937

Flaky test 10x10 for:
TestKeyValueContainer
TestDecommissionAndMaintenance

@ptlrs ptlrs marked this pull request as draft July 22, 2025 18:04
@ptlrs
Contributor Author

ptlrs commented Jul 22, 2025

Hi @errose28 @Tejaskriya @adoroszlai can you please review this PR?

@errose28 errose28 added the scanners Changes related to datanode container and volume scanners label Jul 22, 2025
@Tejaskriya Tejaskriya self-requested a review July 23, 2025 08:30
@errose28 errose28 self-requested a review July 23, 2025 18:45
@errose28 errose28 changed the title HDDS-11721. Fix container export failure when running a scanner HDDS-11721. Fix container export failure when container scanner is running on a schema v2 container Jul 23, 2025
Contributor

@Tejaskriya Tejaskriya left a comment

Thanks for the patch and the extensive test cases added @ptlrs !
I was wondering whether we could introduce another utility method in BlockUtils that tries to clear the cache and ignores any failure. That way, in the majority of cases, where export and scanning are not happening simultaneously, we would still clear the cache. If they are happening simultaneously, the cache clearing would fail but the export operation itself would not, because the new utility method would not propagate the exception.
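
A minimal sketch of that idea, for discussion only: `BestEffortCacheClear` and `tryRemove` are hypothetical names rather than existing BlockUtils methods, and the eviction is passed in as a `Runnable` on the assumption that it fails with an unchecked exception (the real call may throw a checked one and would need a matching functional interface).

```java
import java.util.function.Consumer;

/** Hypothetical best-effort wrapper around a cache eviction call. */
final class BestEffortCacheClear {
  private BestEffortCacheClear() {
  }

  /**
   * Runs the eviction and swallows any runtime failure so the caller
   * (for example, container export) can continue.
   *
   * @param evict the cache-clearing action to attempt
   * @param log   sink for a diagnostic message when eviction is skipped
   * @return true if the cache entry was evicted, false if eviction failed
   */
  static boolean tryRemove(Runnable evict, Consumer<String> log) {
    try {
      evict.run();
      return true;
    } catch (RuntimeException e) {
      // Do not propagate: a concurrent scan may legitimately hold the handle.
      log.accept("Skipping Db handle cache eviction: " + e.getMessage());
      return false;
    }
  }
}
```

The boolean return value lets the caller decide whether a skipped eviction needs any follow-up, without turning it into an export failure.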
What do you think?
