Fabric: Enable `auto` for `devices` and `accelerator` CLI arguments #20913

Open: wants to merge 11 commits into base: master

@fnhirwa (Contributor) commented Jun 17, 2025

What does this PR do?

Fixes #20451

This PR adds support for the `auto` value for the `--accelerator` and `--devices` arguments in the Fabric CLI.

  • When `auto` is passed to `--accelerator`, we now detect whether the machine has `mps` or `cuda` available, fall back to `cpu` otherwise, and run on the detected device. The same happens when no accelerator is specified.

  • When `auto` is passed to `--devices`, the number of devices defaults to 1, as specified in the docstring of the argument.

No breaking changes were introduced.
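
The behaviour described in the two bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `resolve_accelerator` and `resolve_devices` are hypothetical helper names, hardware availability is passed in as plain booleans rather than queried from the machine, the order in which `mps` and `cuda` are checked is an assumption, and `--devices` is simplified to an integer count.

```python
from typing import Optional


def resolve_accelerator(
    accelerator: Optional[str], *, cuda_available: bool, mps_available: bool
) -> str:
    """Hypothetical helper: map `auto` (or no value) to a concrete accelerator name."""
    if accelerator is not None and accelerator != "auto":
        # An explicit choice such as "cpu" or "cuda" is passed through unchanged.
        return accelerator
    # Assumed priority order: mps, then cuda, then the cpu fallback.
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


def resolve_devices(devices: str) -> int:
    """Hypothetical helper: `auto` falls back to a single device."""
    if devices == "auto":
        return 1
    # Simplification: only integer counts are handled here.
    return int(devices)
```

Passing the availability flags as arguments keeps the sketch testable on any machine; a real implementation would query the accelerator classes instead.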

Before submitting
  • Was this discussed/agreed via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

📚 Documentation preview 📚: https://pytorch-lightning--20913.org.readthedocs.build/en/20913/

@github-actions github-actions bot added the fabric lightning.fabric.Fabric label Jun 17, 2025
@fnhirwa fnhirwa marked this pull request as draft June 17, 2025 10:12
codecov bot commented Jun 17, 2025

Codecov Report

Attention: Patch coverage is 81.25000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 87%. Comparing base (cb1afbe) to head (e77cedf).

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #20913   +/-   ##
=======================================
- Coverage      87%      87%   -0%     
=======================================
  Files         268      268           
  Lines       23442    23456   +14     
=======================================
+ Hits        20394    20402    +8     
- Misses       3048     3054    +6     

@fnhirwa fnhirwa marked this pull request as ready for review June 17, 2025 10:59
@fnhirwa fnhirwa marked this pull request as draft June 17, 2025 15:23
@@ -187,6 +187,19 @@ def _set_env_variables(args: Namespace) -> None:

def _get_num_processes(accelerator: str, devices: str) -> int:
    """Parse the `devices` argument to determine how many processes need to be launched on the current machine."""
    if accelerator == "auto" or accelerator is None:
Member

Let's not duplicate the logic; use the already-written functions, like:

@fnhirwa fnhirwa marked this pull request as ready for review June 19, 2025 10:33
_SUPPORTED_ACCELERATORS = ("cpu", "gpu", "cuda", "mps", "tpu", "auto")


def _choose_auto_accelerator() -> str:
Member

So why do we now have the same function?

Contributor Author

I tried to import from


But there were some dependency errors, including one for torchmetrics on some runners.

Labels
fabric lightning.fabric.Fabric
Development

Successfully merging this pull request may close these issues.

Error: Invalid value for '--accelerator': 'auto' is not one of 'cpu', 'gpu', 'cuda', 'mps', 'tpu'.
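
The error in the linked issue comes from validating `--accelerator` against a fixed set of allowed values. The sketch below reproduces that kind of rejection with the standard library's `argparse` as a stand-in; it is not the Fabric CLI itself, and `build_parser`, `accepts`, `OLD_CHOICES`, and `NEW_CHOICES` are hypothetical names used only for illustration.

```python
import argparse

# Allowed values before and after adding "auto" (the second tuple mirrors
# the `_SUPPORTED_ACCELERATORS` value shown in the diff).
OLD_CHOICES = ("cpu", "gpu", "cuda", "mps", "tpu")
NEW_CHOICES = OLD_CHOICES + ("auto",)


def build_parser(choices):
    """Build a tiny parser whose --accelerator option is restricted to `choices`."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--accelerator", choices=choices, default="cpu")
    return parser


def accepts(choices, value):
    """Return True if `--accelerator value` parses with the given choices."""
    try:
        build_parser(choices).parse_args(["--accelerator", value])
    except SystemExit:
        # argparse reports an invalid choice on stderr and exits.
        return False
    return True
```

With `OLD_CHOICES`, `accepts(OLD_CHOICES, "auto")` is `False`; once `"auto"` is part of the allowed tuple, the same call with `NEW_CHOICES` succeeds.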
2 participants