Skip to content

make fsspec.asyn._get_batch_size() public #1327

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
pmrowla opened this issue Aug 4, 2023 · 5 comments
Open

make fsspec.asyn._get_batch_size() public #1327

pmrowla opened this issue Aug 4, 2023 · 5 comments

Comments

@pmrowla
Copy link

pmrowla commented Aug 4, 2023

Being able to get the configured or system default batch size from fsspec is useful when using filesystems that support an additional level of concurrency (beyond the existing fs methods that support file batching like get()/put()).

See: adlfs, which supports setting concurrency for chunked/multipart uploads and downloads (and in this case concurrency is handled by sending a batch size to the underlying Azure SDK, scheduling the individual chunk upload/download in adlfs with _run_coros_in_chunks() is not an option).

_get_batch_size() is currently marked as protected (with the leading _) and internally only gets called in asyn._run_coros_in_chunks(), but it would be better if fsspec exposed a public method for getting the configured batch size

@efiop
Copy link
Member

efiop commented Aug 4, 2023

I suspect that there was simply no use case for this before to make it public. Probably can just contribute a PR.

@martindurant
Copy link
Member

Agree with @efiop

I will comment that fsspec is an unusual library, because it has a definite "end user" API of public functions/methods (as in the docs) and a whole load of other stuff that is used by other libraries or implementations that build on fsspec. In addition, the async methods of an async implementation also start with "_" to show they are special, but not hidden/private. Perhaps a better convention could have been chosen.

@pmrowla
Copy link
Author

pmrowla commented Aug 4, 2023

In addition, the async methods of an async implementation also start with "_" to show they are special, but not hidden/private. Perhaps a better convention could have been chosen.

I'm aware of this convention for the sync vs async AbstractFileSystem implementation methods (and it's documented that it works this way).

But fsspec.asyn also contains a mix of module level methods that look intentionally public (like fsspec.asyn.get_loop()) and ones that don't (like fsspec.asyn._get_batch_size()) which is why the concerns about using _get_batch_size() were raised in the adlfs PR

@pmrowla
Copy link
Author

pmrowla commented Aug 8, 2023

If it's safe to assume _get_batch_size() is relatively stable and is safe for fs implementations to use, this issue can probably just be closed.

@martindurant
Copy link
Member

The best way to make sure that it is safe is to write a test against it and add docstrings/comments to the code itself. I would say it is not planned for any changes, in its existence or the signature, however.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants