Skip to content

GH-765: Do not close/free imported BaseStruct objects #766

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 28, 2025

Conversation

pepijnve
Copy link
Contributor

@pepijnve pepijnve commented May 23, 2025

What's Changed

This PR removes the direct and indirect calls to BaseStruct#close from org.apache.arrow.c.Data. By not eagerly closing/freeing these objects callers can reuse instances multiple times.

This contains breaking changes.: a quick check in, for instance, org.apache.arrow.c.StreamTest shows that a lot of existing code is already written using try-with-resources. This change will not have any impact there. Code that does not use a try-with-resources or try-finally-close pattern and instead counts on Data to call close will report memory leaks after this change.
The second version of this PR removes the breaking change.

Closes #765.

This comment has been minimized.

@pepijnve
Copy link
Contributor Author

#763 (comment) suggests adding new or overloaded methods that do not call close instead of introducing a breaking change. Is there any preference wrt naming these?

@lidavidm lidavidm added the enhancement PRs that add or improve features. label May 23, 2025
@github-actions github-actions bot added this to the 18.4.0 milestone May 23, 2025
@lidavidm
Copy link
Member

Perhaps by analogy with the base library, load and unload, instead of import and export? That won't tell you which method is newer, but then we can add Deprecated annotations

@pepijnve
Copy link
Contributor Author

Just FYI, I'm experimenting a bit, and neither the load variant nor the boolean flag variant are particularly appealing.

load feels a bit asymmetric. There's no need for any changes in the export methods as far as I can tell, so then you have three sets of methods: export, import, and load.

The overload of import with a boolean flag is an option, but a bit ugly for importVectorSchemaRoot(BufferAllocator, ArrowArray, ArrowSchema, CDataDictionaryProvider) where the behavior is controlled for both the array and the schema. Having two boolean flags seems even worser though.

@lidavidm
Copy link
Member

Hmm.

A single boolean flag seems fine for importVectorSchemaRoot, IMO.

Alternatively: add a constructor boolean parameter for ArrayImporter, then create a new NonOwningData or something?

@pepijnve
Copy link
Contributor Author

I've pushed a second attempt that introduces overloaded import methods with a boolean argument. I've moved all the close calls to the Data class for clarity. ArrayImporter and ArrowArrayStreamReader are both package visible classes, so the change there is not a breaking change.

Comment on lines 566 to 570
ArrowArrayStreamReader reader = new ArrowArrayStreamReader(allocator, stream);
if (closeImportedStructs) {
stream.close();
}
return reader;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(1) the indentation seems off?
(2) should we put this in a try-with-resources to at least fix the edge case noted (failure to free in case of exception)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation fixed has been corrected.
Regarding the stream#close call, I went for maintaining the exact current behavior. The ArrowArrayStreamReader constructor did not call close in a finally block. Let me know if you would prefer to change this.

@lidavidm
Copy link
Member

Looks reasonable to me, thanks.

@lidavidm
Copy link
Member

checkstyle is still unhappy:

Warning:  src/main/java/org/apache/arrow/c/Data.java:[361] (javadoc) SummaryJavadoc: First sentence of Javadoc is missing an ending period.
Warning:  src/main/java/org/apache/arrow/c/Data.java:[543] (javadoc) AtclauseOrder: Javadoc comment at column 36 has parse error. Details: mismatched input '#' expecting MEMBER while parsing REFERENCE
Warning:  src/main/java/org/apache/arrow/c/Data.java:[543] (javadoc) JavadocTagContinuationIndentation: Javadoc comment at column 36 has parse error. Details: mismatched input '#' expecting MEMBER while parsing REFERENCE
Warning:  src/main/java/org/apache/arrow/c/Data.java:[543] (javadoc) NonEmptyAtclauseDescription: Javadoc comment at column 36 has parse error. Details: mismatched input '#' expecting MEMBER while parsing REFERENCE
Warning:  src/main/java/org/apache/arrow/c/Data.java:[543] (javadoc) SummaryJavadoc: Javadoc comment at column 36 has parse error. Details: mismatched input '#' expecting MEMBER while parsing REFERENCE

@lidavidm
Copy link
Member

Additionally, could we add a small unit test if possible? Otherwise LGTM

@pepijnve
Copy link
Contributor Author

Checkstyle issues should be fixed. I'll add some unit tests next.

@pepijnve pepijnve force-pushed the issue_765 branch 2 times, most recently from 18b3aff to 6c093db Compare May 27, 2025 11:49
@pepijnve
Copy link
Contributor Author

pepijnve commented May 27, 2025

I've added a couple of tests. mvn verify says 👍 locally.

@lidavidm lidavidm merged commit c6f608e into apache:main May 28, 2025
25 of 27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement PRs that add or improve features.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Avoid closing/freeing struct-like object instances on import
2 participants