-
Notifications
You must be signed in to change notification settings - Fork 136
XCOMMONS-3250: Make as easy as possible for various features to use an object store #1416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
* A Blob is a piece of data stored in a BlobStore. | ||
* | ||
* @version $Id$ | ||
* @since 17.7.0RC1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it's not going to be the case, but anyway, hard to be sure what to put yet.
* @throws BlobStoreException if the InputStream cannot be read or the blob cannot be written, for example because | ||
* its name is invalid. | ||
*/ | ||
void writeFromStream(InputStream inputStream, WriteCondition... conditions) throws BlobStoreException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You think there are cases when getOutputStream
would not be enough or that passing the inputstream to write would allow some optimizations ? Otherwise, I don't see much point of exposing this API, even as a helper IOUtils#copy
should be good enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure either if we need it. S3 has special support for efficiently uploading large input streams in the async API (that the current implementation isn't using) so we could avoid that output stream with manual chunking. So from that point of view, it would actually be better to get rid of getOutputStream
.
I forgot adding a comment, but I already thought that we should get rid of one of the two methods, depending on which one we actually use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, getOutputStream
is more generic, and I think we need it for some use cases where it might not be easy to provide an InputStream
for what we need to write. But if you have a case where passing an InputStream
is better for performance in some implementation, then +1 to keep both (and probably recommend using #writeFromStream, when possible, in the javadoc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, I've added a comment to reconsider the method.
...commons-store-blob/xwiki-commons-store-blob-api/src/main/java/org/xwiki/store/blob/Blob.java
Outdated
Show resolved
Hide resolved
* @since 17.7.0RC1 | ||
*/ | ||
@Unstable | ||
public interface BlobStore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A #moveBlob(BlobPath source, BlobPath target)
could be interesting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, though for S3 this will be just copy + delete from what I can see. For the filesystem backend it probably makes sense to have a better implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for S3 this will be just copy + delete from what I can see
I figured it was why it was not there, but even for S3 maybe there is something we can do to make this operation as atomic as possible. Anyway, it's good to have it since it's a basic operation in many stores.
...ons-store-blob/xwiki-commons-store-blob-api/src/main/java/org/xwiki/store/blob/BlobPath.java
Outdated
Show resolved
Hide resolved
...ons-store-blob/xwiki-commons-store-blob-api/src/main/java/org/xwiki/store/blob/BlobPath.java
Outdated
Show resolved
Hide resolved
} | ||
|
||
/** | ||
* Create a BlobPath by splitting a slash-delimited string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this path is to be used as a generic syntax, we probably need to support any path (for example, the current parsing/serializing code does not seem to have any way to escape /
). But I guess you have that in mind already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not probably not clear enough, but the idea wasn't that this API would be used with user-supplied paths. The idea was really that this is an internal API that is used with already "safe" paths and just replaces using raw filesystem access.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, right now it's not internal at all (it's a public API). I also think that it could be interesting to have a generic syntax to allow us to serialize the BlobPath
in case we need to refer to it somewhere else (the database, etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant internal in the sense that it shouldn't be exposed to users/user-controlled strings in the same way as regular file access.
In a very first version, I used Path
instead of the BlobPath
class, but that was terrible from a readability point of view as you had no idea if a Path
was an actual native filesystem path or a virtual path that is rooted at the base directory of the store. The idea of BlobPath
is to offer something in between a simple string and a full Path
, something that is easier to use than just concatenating strings to, e.g., append something to the path, or get the parent "directory". I would also want to try actually using it a bit before we finalize that design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what the current code is missing is a validation that segments of the BlobPath
don't contain /
or \
. We should also forbid .
and ..
as segments and maybe some other sanity checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added very basic validation for segments for now. I'll revisit this when trying to use the API to see how useful the methods are in actual use.
...-blob-filesystem/src/main/java/org/xwiki/store/blob/internal/FileSystemBlobStoreManager.java
Show resolved
Hide resolved
...ore-blob/xwiki-commons-store-blob-s3/src/main/java/org/xwiki/store/blob/internal/S3Blob.java
Outdated
Show resolved
Hide resolved
...mmons-store-blob-api/src/main/java/org/xwiki/store/blob/internal/BlobStoreConfiguration.java
Outdated
Show resolved
Hide resolved
*/ | ||
@Role | ||
@Unstable | ||
public interface BlobStoreManager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we would need a way to dispose/close a BlobStore
, when we are sure it won't be needed anymore (for example when the component which needs it has been unloaded), in case it holds some resources (memory, Cache to dispose, some open connection, etc.).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, though at the moment it's not really needed. The only important thing regarding disposing/closing right now is to close all input streams that are returned from an S3Blob
in a timely manner as each of them will hold an open HTTP connection and thus blocks one connection of the connection pool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, I've added very generic dispose support in DefaultBlobStoreManager
to dispose all blob stores that implement Disposable
when the blob store manager is disposed.
dc53080
to
4213088
Compare
4213088
to
b0f75a5
Compare
b0f75a5
to
eaf275a
Compare
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
…n object store * First draft of module structure, API and filesystem implementation.
…n object store * Extended the blog store API a bit. * Added an S3 implementation, the initial draft was by Claude Sonnet 4. Untested.
…n object store * Fix since versions and component metadata. * Remove duplicate instance caching from S3 blob store manager.
…n object store * Add TODOs to the S3BlobStore implementation.
…n object store * Update module versions after rebase on master.
…n object store * Introduce AbstractBlob to reduce code duplication between Blob implementations. * Remove redundant copy/delete methods in Blob. * Add a comment to reconsider writeFromStream. * Add path segment validation to BlobPath. * Rename the raw path to canonical and directly call "hashCode()" on it. * Move the S3 blob store configuration to its own component. * Add the possibility that a blob store can be disposed.
…n object store * Allow empty BlobPath objects and make getParent easier to use. * Add a dedicated method to append a suffix to the last segment of a BlobPath. * Add support for moving blob in and between stores. * Fix copying blobs in S3 between different stores. * Implement moving blobs with the file system and S3. * Update the S3 library version.
…n object store * Add directory-related APIs to BlobStore. * Simplify the stream implementation of the S3BlobStore. * Add filter-stream output/input for blobs. * Let getStream() only throw BlobStoreException for consistency * Rebase version to 17.9.0-SNAPSHOT
…n object store * Add static ROOT path
…n object store * Add support for deleting blobs with a prefix.
…n object store * Fix S3BlobStore#isEmptyDirectory returning the opposite result.
…n object store * Add support for migrating between stores.
…n object store * Add some tests for the blob store API. * Add extensive tests for the filesystem blob store. * Add equals and hashCode to all blob stores. * Some fixes and a lot of bulletproofing.
eaf275a
to
f14dcbe
Compare
Jira URL
https://jira.xwiki.org/browse/XCOMMONS-3250
Changes
Description
Clarifications
Screenshots & Video
No UI changes.
Executed Tests
None so far.
Expected merging strategy