XCOMMONS-3250: Make as easy as possible for various features to use an object store #1416

michitux · 2025-08-05T16:25:10Z

Jira URL

https://jira.xwiki.org/browse/XCOMMONS-3250

Changes

Description

Introduce a blob storage API with implementations for file system and Amazon S3.

Clarifications

This is an initial draft; the S3 version is probably missing some essential configuration options for third-party providers.
The code is currently untested and also missing unit tests.
Before this is merged/released, I think we should migrate at least one use case in XWiki to this new blob storage to see if the API actually works.
This PR exists to provide visibility into the ongoing work and to get feedback on the API.

Screenshots & Video

No UI changes.

Executed Tests

None so far.

Expected merging strategy

Prefers squash: Yes
Backport on branches:
- None

tmortagne · 2025-08-07T11:35:05Z

...commons-store-blob/xwiki-commons-store-blob-api/src/main/java/org/xwiki/store/blob/Blob.java

+ * A Blob is a piece of data stored in a BlobStore.
+ *
+ * @version $Id$
+ * @since 17.7.0RC1


I guess it's not going to be the case, but anyway, hard to be sure what to put yet.

tmortagne · 2025-08-07T13:46:33Z

...commons-store-blob/xwiki-commons-store-blob-api/src/main/java/org/xwiki/store/blob/Blob.java

+     * @throws BlobStoreException if the InputStream cannot be read or the blob cannot be written, for example because
+     * its name is invalid.
+     */
+    void writeFromStream(InputStream inputStream, WriteCondition... conditions) throws BlobStoreException;


Do you think there are cases where getOutputStream would not be enough, or where passing the InputStream to write would allow some optimizations? Otherwise, I don't see much point in exposing this API; even as a helper, IOUtils#copy should be good enough.

I'm not sure either if we need it. S3 has special support for efficiently uploading large input streams in the async API (that the current implementation isn't using) so we could avoid that output stream with manual chunking. So from that point of view, it would actually be better to get rid of getOutputStream.

I forgot to add a comment, but I had already thought that we should get rid of one of the two methods, depending on which one we actually use.

Well, getOutputStream is more generic, and I think we need it for some use cases where it might not be easy to provide an InputStream for what we need to write. But if you have a case where passing an InputStream is better for performance in some implementation, then +1 to keep both (and probably recommend using #writeFromStream, when possible, in the javadoc).

For now, I've added a comment to reconsider the method.
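To make the trade-off above concrete, here is a minimal sketch of keeping both methods, with writeFromStream as a default helper built on getOutputStream. `SketchBlob` and `InMemorySketchBlob` are hypothetical stand-ins, not the interfaces from this PR; the sketch throws `IOException` instead of `BlobStoreException` and omits the `WriteCondition` varargs to stay self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical, simplified stand-in for the Blob interface discussed above. */
interface SketchBlob
{
    /** Generic write access, usable when the caller produces data incrementally. */
    OutputStream getOutputStream() throws IOException;

    /**
     * Default helper that simply copies the stream. A backend with native support for
     * uploading streams could override this (assumption, not shown here).
     */
    default void writeFromStream(InputStream inputStream) throws IOException
    {
        try (OutputStream out = getOutputStream()) {
            // InputStream#transferTo is available since Java 9.
            inputStream.transferTo(out);
        }
    }
}

/** In-memory implementation, only here to make the sketch runnable. */
final class InMemorySketchBlob implements SketchBlob
{
    private byte[] content = new byte[0];

    @Override
    public OutputStream getOutputStream()
    {
        // The content becomes visible only once the stream is closed.
        return new ByteArrayOutputStream()
        {
            @Override
            public void close() throws IOException
            {
                super.close();
                InMemorySketchBlob.this.content = toByteArray();
            }
        };
    }

    String asString()
    {
        return new String(this.content, StandardCharsets.UTF_8);
    }
}
```

With this shape, callers that already hold an InputStream use writeFromStream, and everything else falls back to getOutputStream.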


tmortagne · 2025-08-07T13:51:52Z

...ns-store-blob/xwiki-commons-store-blob-api/src/main/java/org/xwiki/store/blob/BlobStore.java

+ * @since 17.7.0RC1
+ */
+@Unstable
+public interface BlobStore


A #moveBlob(BlobPath source, BlobPath target) could be interesting.

Indeed, though for S3 this will be just copy + delete from what I can see. For the filesystem backend it probably makes sense to have a better implementation.

> for S3 this will be just copy + delete from what I can see

I figured that was why it was not there, but even for S3 maybe there is something we can do to make this operation as atomic as possible. Anyway, it's good to have it since it's a basic operation in many stores.
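For the filesystem side, a "better implementation" could look roughly like the sketch below: try an atomic rename first and fall back to a regular move when the file system cannot do that. `FileMoveSketch` is a hypothetical helper working on plain `java.nio.file.Path` values, not the PR's actual code, and it says nothing about how S3 would handle the operation.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper illustrating a filesystem move; not part of the PR. */
final class FileMoveSketch
{
    private FileMoveSketch()
    {
    }

    static void move(Path source, Path target) throws IOException
    {
        // Make sure the target "directory" exists before moving.
        Path parent = target.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }

        try {
            // Preferred: a single atomic rename. Whether an existing target is replaced
            // is implementation specific for ATOMIC_MOVE.
            Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            // Fallback (for example across file stores): not atomic anymore.
            Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```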


tmortagne · 2025-08-07T14:05:13Z

...ons-store-blob/xwiki-commons-store-blob-api/src/main/java/org/xwiki/store/blob/BlobPath.java

+    }
+
+    /**
+     * Create a BlobPath by splitting a slash-delimited string.


If this path is to be used as a generic syntax, we probably need to support any path (for example, the current parsing/serializing code does not seem to have any way to escape /). But I guess you have that in mind already.

It's probably not clear enough, but the idea wasn't that this API would be used with user-supplied paths. The idea was really that this is an internal API that is used with already "safe" paths and just replaces using raw filesystem access.

Well, right now it's not internal at all (it's a public API). I also think that it could be interesting to have a generic syntax to allow us to serialize the BlobPath in case we need to refer to it somewhere else (the database, etc).

I meant internal in the sense that it shouldn't be exposed to users/user-controlled strings in the same way as regular file access.

In a very first version, I used Path instead of the BlobPath class, but that was terrible from a readability point of view as you had no idea if a Path was an actual native filesystem path or a virtual path that is rooted at the base directory of the store. The idea of BlobPath is to offer something in between a simple string and a full Path, something that is easier to use than just concatenating strings to, e.g., append something to the path, or get the parent "directory". I would also want to try actually using it a bit before we finalize that design.

I think what the current code is missing is a validation that segments of the BlobPath don't contain / or \. We should also forbid . and .. as segments and maybe some other sanity checks.

I've added very basic validation for segments for now. I'll revisit this when trying to use the API to see how useful the methods are in actual use.
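As an illustration of the kind of segment checks mentioned above (no `/` or `\`, no `.` or `..`), here is a minimal sketch. `BlobPathSegments` is a hypothetical helper, not the validation that was actually added to BlobPath; ignoring empty segments from leading or repeated slashes is also an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper sketching segment validation for a slash-delimited path. */
final class BlobPathSegments
{
    private BlobPathSegments()
    {
    }

    /** Returns the segment unchanged, or throws if it could escape or confuse the store. */
    static String validate(String segment)
    {
        if (segment == null || segment.isEmpty()) {
            throw new IllegalArgumentException("Segment must not be empty");
        }
        if (segment.equals(".") || segment.equals("..")) {
            throw new IllegalArgumentException("Relative segment not allowed: " + segment);
        }
        if (segment.indexOf('/') >= 0 || segment.indexOf('\\') >= 0) {
            throw new IllegalArgumentException("Separator not allowed in segment: " + segment);
        }
        return segment;
    }

    /** Splits a slash-delimited string and validates every non-empty segment. */
    static List<String> parse(String path)
    {
        List<String> segments = new ArrayList<>();
        for (String segment : path.split("/")) {
            // Assumption: empty segments (leading or repeated slashes) are ignored.
            if (!segment.isEmpty()) {
                segments.add(validate(segment));
            }
        }
        return Collections.unmodifiableList(segments);
    }
}
```

For example, `BlobPathSegments.parse("a/b/c.txt")` yields the three segments `a`, `b` and `c.txt`, while `BlobPathSegments.parse("a/../b")` throws an `IllegalArgumentException`.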


tmortagne · 2025-08-07T14:24:48Z

...e-blob/xwiki-commons-store-blob-api/src/main/java/org/xwiki/store/blob/BlobStoreManager.java

+ */
+@Role
+@Unstable
+public interface BlobStoreManager


I'm wondering if we would need a way to dispose/close a BlobStore, when we are sure it won't be needed anymore (for example when the component which needs it has been unloaded), in case it holds some resources (memory, Cache to dispose, some open connection, etc.).

Good point, though at the moment it's not really needed. The only important thing regarding disposing/closing right now is to close all input streams that are returned from an S3Blob in a timely manner as each of them will hold an open HTTP connection and thus blocks one connection of the connection pool.

For now, I've added very generic dispose support in DefaultBlobStoreManager to dispose all blob stores that implement Disposable when the blob store manager is disposed.
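The generic dispose support described above could be sketched as follows. `SketchDisposable`, `SketchStore`, `PooledSketchStore` and `SketchStoreManager` are hypothetical stand-ins defined here so the example is self-contained; the real code relies on XWiki's own Disposable role and DefaultBlobStoreManager, whose details may differ.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a "can be disposed" marker interface. */
interface SketchDisposable
{
    void dispose();
}

/** Hypothetical stand-in for a blob store. */
interface SketchStore
{
    String getName();
}

/** Example store holding a resource (for example a connection pool) that must be released. */
final class PooledSketchStore implements SketchStore, SketchDisposable
{
    private final String name;

    private boolean disposed;

    PooledSketchStore(String name)
    {
        this.name = name;
    }

    @Override
    public String getName()
    {
        return this.name;
    }

    @Override
    public void dispose()
    {
        this.disposed = true;
    }

    boolean isDisposed()
    {
        return this.disposed;
    }
}

/** Manager that disposes every store implementing SketchDisposable when it is disposed. */
final class SketchStoreManager implements SketchDisposable
{
    private final Map<String, SketchStore> stores = new ConcurrentHashMap<>();

    void register(SketchStore store)
    {
        this.stores.put(store.getName(), store);
    }

    @Override
    public void dispose()
    {
        for (SketchStore store : this.stores.values()) {
            if (store instanceof SketchDisposable) {
                ((SketchDisposable) store).dispose();
            }
        }
        this.stores.clear();
    }
}
```

This only covers store-level resources. Callers would still need to close each input stream they obtain from a blob themselves, for example with try-with-resources.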

socket-security · 2025-10-08T09:52:28Z

Review the following changes in direct dependencies.

Packages:
- software.amazon.awssdk/s3@2.33.1
- software.amazon.awssdk/apache-client@2.33.1

…n object store * First draft of module structure, API and filesystem implementation.

…n object store * Extended the blob store API a bit. * Added an S3 implementation, the initial draft was by Claude Sonnet 4. Untested.

…n object store * Fix since versions and component metadata. * Remove duplicate instance caching from S3 blob store manager.

…n object store * Add TODOs to the S3BlobStore implementation.

…n object store * Update module versions after rebase on master.

…n object store * Introduce AbstractBlob to reduce code duplication between Blob implementations. * Remove redundant copy/delete methods in Blob. * Add a comment to reconsider writeFromStream. * Add path segment validation to BlobPath. * Rename the raw path to canonical and directly call "hashCode()" on it. * Move the S3 blob store configuration to its own component. * Add the possibility that a blob store can be disposed.

…n object store * Allow empty BlobPath objects and make getParent easier to use. * Add a dedicated method to append a suffix to the last segment of a BlobPath. * Add support for moving blob in and between stores. * Fix copying blobs in S3 between different stores. * Implement moving blobs with the file system and S3. * Update the S3 library version.

…n object store * Add directory-related APIs to BlobStore. * Simplify the stream implementation of the S3BlobStore. * Add filter-stream output/input for blobs. * Let getStream() only throw BlobStoreException for consistency. * Rebase version to 17.9.0-SNAPSHOT.

…n object store * Add static ROOT path

…n object store * Add support for deleting blobs with a prefix.

…n object store * Fix S3BlobStore#isEmptyDirectory returning the opposite result.

…n object store * Add support for migrating between stores.

…n object store * Add some tests for the blob store API. * Add extensive tests for the filesystem blob store. * Add equals and hashCode to all blob stores. * Some fixes and a lot of bulletproofing.

michitux self-assigned this Aug 6, 2025

michitux force-pushed the XCOMMONS-3250 branch from e267097 to 962c18c Compare August 7, 2025 10:07