Skip to content

Data import methods free memory #763

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
pepijnve opened this issue May 22, 2025 · 4 comments
Closed

Data import methods free memory #763

pepijnve opened this issue May 22, 2025 · 4 comments
Labels
Type: usage usage question

Comments

@pepijnve
Copy link
Contributor

pepijnve commented May 22, 2025

The various import methods in the Data class are documented to transfer ownership of the Arrow data from the input object to internal values as per the C FFI. If I'm reading https://arrow.apache.org/docs/format/CDataInterface.html#moving-an-array correctly, implementations are supposed to hollow out the input object in that case by nulling the release field. In practice a class like ArrayImporter does this

void importArray(ArrowArray src) {
  ...
  // Move imported array
  ArrowArray ownedArray = ArrowArray.allocateNew(allocator);
  ownedArray.save(snapshot);
  src.markReleased();
  src.close();
  ...
}

So not only is the array marked as released, it is also closed so the ArrowArray instance is no longer usable. This may or may not happen when you call Data#importIntoVector for instance, since that might abort early when given a null Allocator which makes the post conditions of Data#importIntoVector a bit unclear. You kind of expect the ArrowArray to have been closed, but you still need to call close yourself as well or risk having a memory leak.

What's the rationale behind the choice to also call close on objects passed to Data rather than expecting people to use try-with-resources/try-finally themselves?

@pepijnve pepijnve added the Type: usage usage question label May 22, 2025
@lidavidm
Copy link
Member

Basically, the idea is that the importer should not free the struct ArrowArray itself, so that the caller code can use its own try-with-resources? I think that makes sense. (Well, either that, or the importer internally should free the struct even if it errors, which I think is less clean but would be less of a breaking change.)

@pepijnve
Copy link
Contributor Author

Basically, the idea is that the importer should not free the struct ArrowArray itself, so that the caller code can use its own try-with-resources?

I think that would make more sense personally, but it would be a serious backwards incompatible change. The alternative would be to add additional methods (perhaps an overloaded method signature with a boolean parameter, although that's not really pretty either) that let the caller control what happens.

I would be happy to put together a PR for this if you like. Any preference on what this should look like exactly?

@lidavidm
Copy link
Member

New methods is probably cleanest

@pepijnve
Copy link
Contributor Author

pepijnve commented May 23, 2025

I took a first stab at this in #766. The PR crossed your comment, so the initial version is the breaking change approach. I'll close this discussion issue and continue in the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: usage usage question
Projects
None yet
Development

No branches or pull requests

2 participants