HADOOP-19569. S3A: stream write/close fails badly once FS is closed #7700

Draft: wants to merge 4 commits into base: trunk


steveloughran
Contributor

HADOOP-19569.

Executors in hadoop-common to

  • pick up shutdown of inner executor and shut themselves down.
  • semaphore executor to decrement counters in this process so that queue state is updated.

This stops callers being able to submit work when the inner executor has shut down.
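
The permit accounting described above can be illustrated with a minimal sketch. It is not the `SemaphoredDelegatingExecutor` code from this patch: the class `PermitGuardedExecutor` and its methods are hypothetical names, and the sketch only assumes that a rejected submission must hand its permit back so the counter keeps matching the real queue state.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;

/** Hypothetical sketch; not the hadoop-common implementation. */
class PermitGuardedExecutor {
  private final ExecutorService inner;
  private final Semaphore permits;

  PermitGuardedExecutor(ExecutorService inner, int maxOutstanding) {
    this.inner = inner;
    this.permits = new Semaphore(maxOutstanding);
  }

  /** Blocks until a permit is free, then hands the task to the inner executor. */
  void submit(Runnable task) {
    permits.acquireUninterruptibly();
    try {
      inner.execute(() -> {
        try {
          task.run();
        } finally {
          // Normal path: the permit is returned when the task finishes.
          permits.release();
        }
      });
    } catch (RejectedExecutionException e) {
      // The inner executor has shut down, so the task will never run.
      // Return the permit here, otherwise the count leaks and later
      // callers could block forever waiting for it.
      permits.release();
      throw e;
    }
  }

  int availablePermits() {
    return permits.availablePermits();
  }
}
```

Submitting after the inner executor has shut down then fails fast with `RejectedExecutionException` and leaves the permit count unchanged.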

S3A code

  • StoreImpl to raise IllegalStateException on method invocation when the service isn't running. Some methods are kept open as they do seem to be needed.
  • WriteOperationHelper callbacks to raise IllegalStateException when invoked when FS is closed.
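
As a rough illustration of the guard pattern in these two bullets, here is a minimal sketch. `GuardedStore`, `putObject()` and `isRunning()` are hypothetical names and do not reflect the real S3AStoreImpl API; only the idea of a `checkRunning()` call at the start of each guarded method is taken from this PR.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical store used only to show the guard; not S3AStoreImpl. */
class GuardedStore implements AutoCloseable {
  private final AtomicBoolean running = new AtomicBoolean(true);

  /** Fails fast when the store has been closed. */
  private void checkRunning() {
    if (!running.get()) {
      throw new IllegalStateException("Store is not running");
    }
  }

  /** Guarded operation: rejected once the store is closed. */
  String putObject(String key) {
    checkRunning();
    return "stored:" + key;
  }

  /** Deliberately left unguarded, so callers can still probe state after close. */
  boolean isRunning() {
    return running.get();
  }

  @Override
  public void close() {
    running.set(false);
  }
}
```

Which methods stay unguarded is a per-method decision; the sketch leaves only the state probe open.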

This is complex.

TODO:

  • WriteOperationHelper MUST make all calls to the FS through its callback interface, rather than being given a ref to S3AFS. This makes it easy to identify and lock down the methods.
  • What is the correct exception to raise in write/close() failures? IOE or illegal state?

How was this patch tested?

New ITests which close the FS while simple and multipart writes are in progress.

S3 London.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

Took the opportunity to move all of WriteOperationHelper to calling store,
not FS, which then required all multipart IO to go there too.

Next stage of store design: multipart IO is its own service underneath
the store, so keeping the size of Store interface and Impl under control.
@steveloughran force-pushed the s3/HADOOP-19569-stream-write-fs-close branch from cf32b54 to fea3e89 on May 27, 2025 11:17
Continue migration

* move nearly all MPU ops out of S3AFS, shaving code out of that class
* Multipart service retrieved and invoked as appropriate
* StoreImpl stores a map of ServiceName -> service.
  With a lookupService() method in the S3AStore interface, it's possible to
  retrieve services through the API just by knowing their name and type
* registering all current services this way
* fixing up the mock tests to work
* rate limiting api to hadoop-aws
* semaphored delegating executor ITest to hadoop-common unit test
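
The name-and-type lookup described in the list above might look roughly like the following sketch. Only the method name `lookupService()` comes from this PR; the signature, the `ServiceRegistry` class and `registerService()` are assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry; the real S3AStore signatures may differ. */
class ServiceRegistry {
  private final Map<String, Object> services = new ConcurrentHashMap<>();

  /** Registers a service under a name, replacing any previous entry. */
  void registerService(String name, Object service) {
    services.put(name, service);
  }

  /**
   * Looks a service up by name and casts it to the expected type.
   * Throws IllegalStateException if nothing is registered under the name,
   * and ClassCastException if the registered service has another type.
   */
  <T> T lookupService(String name, Class<T> type) {
    Object service = services.get(name);
    if (service == null) {
      throw new IllegalStateException("No service registered as " + name);
    }
    return type.cast(service);
  }
}
```

A caller would then write something like `registry.lookupService("multipart", MultipartService.class)`, where both the name and the class are placeholders.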
@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
| --- | --- | --- | --- | --- |
| +0 🆗 | reexec | 0m 56s | | Docker mode activated. |
| | | | _ Prechecks _ | |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 1s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 1s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 17 new or modified test files. |
| | | | _ trunk Compile Tests _ | |
| +0 🆗 | mvndep | 6m 44s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 36m 20s | | trunk passed |
| +1 💚 | compile | 17m 37s | | trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 💚 | compile | 15m 12s | | trunk passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| +1 💚 | checkstyle | 4m 43s | | trunk passed |
| +1 💚 | mvnsite | 2m 32s | | trunk passed |
| +1 💚 | javadoc | 2m 9s | | trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 💚 | javadoc | 1m 41s | | trunk passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| +1 💚 | spotbugs | 3m 54s | | trunk passed |
| +1 💚 | shadedclient | 40m 39s | | branch has no errors when building and testing our client artifacts. |
| -0 ⚠️ | patch | 41m 8s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | | | _ Patch Compile Tests _ | |
| +0 🆗 | mvndep | 0m 33s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 1m 27s | | the patch passed |
| +1 💚 | compile | 16m 52s | | the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 💚 | javac | 16m 52s | | the patch passed |
| +1 💚 | compile | 15m 13s | | the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| +1 💚 | javac | 15m 13s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 ⚠️ | checkstyle | 4m 39s | /results-checkstyle-root.txt | root: The patch generated 12 new + 13 unchanged - 4 fixed = 25 total (was 17) |
| +1 💚 | mvnsite | 2m 33s | | the patch passed |
| -1 ❌ | javadoc | 0m 52s | /patch-javadoc-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04.txt | hadoop-aws in the patch failed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04. |
| -1 ❌ | javadoc | 0m 48s | /patch-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.txt | hadoop-aws in the patch failed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09. |
| +1 💚 | spotbugs | 4m 15s | | the patch passed |
| +1 💚 | shadedclient | 40m 8s | | patch has no errors when building and testing our client artifacts. |
| | | | _ Other Tests _ | |
| +1 💚 | unit | 15m 16s | | hadoop-common in the patch passed. |
| +1 💚 | unit | 4m 45s | | hadoop-aws in the patch passed. |
| +1 💚 | asflicense | 1m 6s | | The patch does not generate ASF License warnings. |
| | | 247m 44s | | |

| Subsystem | Report/Notes |
| --- | --- |
| Docker | ClientAPI=1.50 ServerAPI=1.50 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7700/3/artifact/out/Dockerfile |
| GITHUB PR | #7700 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 235fdff83467 5.15.0-136-generic #147-Ubuntu SMP Sat Mar 15 15:53:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / c347def |
| Default Java | Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7700/3/testReport/ |
| Max. process+thread count | 1253 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7700/3/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

@steveloughran requested a review from Copilot on May 30, 2025 10:48
@apache deleted a comment from hadoop-yetus on May 30, 2025
@apache deleted a comment from hadoop-yetus on May 30, 2025
@Copilot Copilot AI left a comment

Pull Request Overview

This PR addresses HADOOP-19569 by enhancing the S3A filesystem’s behavior when closed, ensuring that write/close operations fail gracefully, and improves internal service registration and executor shutdown behavior. Key changes include enforcing FS state checks via added checkRunning() calls, refactoring helper and callback methods (e.g. renaming getWriteOperationHelper to createWriteOperationHelperWithinActiveSpan and replacing direct FS calls with getStore() invocations), and cleaning up legacy utilities such as MultipartUtils.

Reviewed Changes

Copilot reviewed 32 out of 32 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/MockS3AFileSystem.java | Adjusted WriteOperationHelper construction with updated callback parameters and method renaming. |
| hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AMiscOperations.java<br>hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AClientSideEncryption.java | Updated test calls to use getStore() for invoking putObjectDirect. |
| hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/S3AStoreImpl.java<br>hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AStore.java | Inserted checkRunning() calls across API methods and updated service lookup/register methods. |
| hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/WriteOperationHelper.java | Refactored to use a new “callbacks” field and improved null validations with requireNonNull. |
| hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java | Exposed getStore() publicly and updated multipart upload handling to leverage the new MultipartIOService. |
| hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestBlockingThreadPoolExecutorService.java<br>hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/SemaphoredDelegatingExecutor.java<br>hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/BlockingThreadPoolExecutorService.java | Enhanced executor service tests and added rejection-handling logic to ensure proper shutdown on task rejection. |

Comments suppressed due to low confidence (3)

hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/WriteOperationHelper.java:127

  • [nitpick] Update the Javadoc in WriteOperationHelper to reflect the change from using the variable name 'writeOperationHelperCallbacks' to 'callbacks', ensuring that the documentation clearly explains its role and usage.
private final WriteOperationHelperCallbacks callbacks;

hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AStore.java:290

  • The completeMultipartUpload method has been removed from the S3AStore interface. Ensure that all external consumers have been updated accordingly or consider adding a deprecation warning to ease the transition.
/* Removed completeMultipartUpload method */

hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:1357

  • [nitpick] Making the getStore() method public exposes internal implementation details. Confirm that this exposure is intended, or consider providing a more restricted accessor if external access is not required.
public S3AStore getStore() {

@@ -130,21 +130,20 @@ public static BlockingThreadPoolExecutorService newInstance(
slower than enqueueing. */
final BlockingQueue<Runnable> workQueue =
new LinkedBlockingQueue<>(waitingTasks + activeTasks);
final InnerExecutorRejection rejection = new InnerExecutorRejection();
Copilot AI May 30, 2025

[nitpick] The InnerExecutorRejection handler now shuts down the service upon rejection. Consider enhancing the error handling logic or adding more detailed documentation to explain the shutdown behavior in case of task rejection.
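
For readers unfamiliar with the pattern under review, a rejection handler that propagates shutdown could be sketched as below. This is not the `InnerExecutorRejection` class from the patch: `ShutdownOnRejection` and `setOuter()` are hypothetical names, and the sketch assumes the handler is given a reference to the wrapping service after construction.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;

/** Hypothetical handler; not the class added by this PR. */
class ShutdownOnRejection implements RejectedExecutionHandler {
  // Set after construction, because the outer service usually wraps
  // the very executor this handler is installed on.
  private volatile ExecutorService outer;

  void setOuter(ExecutorService outer) {
    this.outer = outer;
  }

  @Override
  public void rejectedExecution(Runnable task, ThreadPoolExecutor inner) {
    ExecutorService target = outer;
    // Only propagate when the rejection is caused by shutdown,
    // not by a full work queue.
    if (target != null && inner.isShutdown()) {
      target.shutdown();
    }
    throw new RejectedExecutionException("Inner executor rejected task " + task);
  }
}
```

The `isShutdown()` check is a design choice of this sketch: without it, a rejection caused by a full bounded queue would also shut the outer service down.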

private static final int BLOCKING_THRESHOLD_MSEC = 50;

private static final Integer SOME_VALUE = 1337;

- private static BlockingThreadPoolExecutorService tpe;
+ private BlockingThreadPoolExecutorService tpe;
Copilot AI May 30, 2025

[nitpick] Since the thread pool executor is now an instance variable with setup/teardown methods, ensure that each test properly initializes and destroys the executor to avoid interference between tests.
