DOC: Clarify DeviceStatsMonitor logged metrics #20895

MrAnayDongre · 2025-06-11T23:08:26Z (edited by github-actions bot)

What does this PR do?

This PR addresses issue #20807 by adding detailed documentation for the metrics logged by DeviceStatsMonitor.

Key changes:

Documents the sources of the metrics: CPU via psutil, CUDA GPUs via torch.cuda.memory_stats(), and other accelerators via accelerator.get_device_stats().
Documents the naming convention for logged keys: DeviceStatsMonitor.{hook_name}/{base_metric_name}.
Explicitly states that GPU compute utilization is not logged by default for CUDA devices, with a pointer to torch.cuda.memory_stats() for the full list of memory metrics.
Provides examples for common CPU, CUDA, and other accelerator (TPU, MPS) metrics.
Includes a minor update to profiler_basic.rst to align with these clarifications and link to the API docs.
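The naming convention for logged keys can be sketched in a few lines of plain Python. This is a minimal, hypothetical illustration, not Lightning code: the helpers device_stats_key and prefix_device_stats are invented for this example, "/" is assumed as the separator between the hook prefix and the metric name, and the raw metric names and values are made up.

```python
from __future__ import annotations


# Hypothetical helper (not part of Lightning): builds a key following the
# DeviceStatsMonitor.{hook_name}/{base_metric_name} convention.
# The "/" default separator is an assumption for this sketch.
def device_stats_key(hook_name: str, base_metric_name: str, separator: str = "/") -> str:
    """Return the key under which a single device statistic would be logged."""
    return f"DeviceStatsMonitor.{hook_name}{separator}{base_metric_name}"


# Hypothetical helper: prefixes every raw statistic collected in a given hook.
def prefix_device_stats(hook_name: str, stats: dict[str, float], separator: str = "/") -> dict[str, float]:
    """Map raw metric names to their fully prefixed logged keys."""
    return {device_stats_key(hook_name, name, separator): value for name, value in stats.items()}


# Illustrative raw CPU stats; names and values are placeholders.
raw_cpu_stats = {"cpu_percent": 12.5, "cpu_vm_percent": 48.0}
logged = prefix_device_stats("on_train_batch_start", raw_cpu_stats)
print(logged)
```

With these inputs, logged maps DeviceStatsMonitor.on_train_batch_start/cpu_percent to 12.5 and DeviceStatsMonitor.on_train_batch_start/cpu_vm_percent to 48.0.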

This documentation aims to help users understand what statistics to expect when using DeviceStatsMonitor with different hardware configurations.

Fixes #20807

No breaking changes are introduced by this documentation update.

Before submitting

Was this discussed/agreed via a GitHub issue? (not for typos and docs)
Did you read the contributor guideline, Pull Request section?
Did you make sure your PR does only one thing, instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests? (not for typos and docs)
Did you verify new and existing tests pass locally with your changes?
Did you list all the breaking changes introduced by this pull request?
Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet list:

Reviewer checklist

Is this pull request ready for review? (if not, please submit in draft mode)
Check that all items from Before submitting are resolved
Make sure the title is self-explanatory and the description concisely explains the PR
Add labels and milestones (and optionally projects) to the PR so it can be classified

📚 Documentation preview 📚: https://pytorch-lightning--20895.org.readthedocs.build/en/20895/

Borda · 2025-06-12T08:04:44Z

src/lightning/pytorch/callbacks/device_stats_monitor.py

@@ -45,6 +45,23 @@ class DeviceStatsMonitor(Callback):
        ModuleNotFoundError:
            If ``psutil`` is not installed and CPU stats are monitored.

+    Logged Metrics:


Raises: and Args: are Sphinx-specific section keywords, unlike Logged Metrics:, so please move it to the top of this docstring.

MrAnayDongre · 2025-06-12

Thanks, @Borda! I've moved the 'Logged Metrics' section to the top of the docstring as requested in commit dcd1042.

Borda · 2025-06-13

Well, I still see it unchanged.
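The reviewer's suggestion about docstring layout can be illustrated with a hypothetical class: the free-form Logged Metrics text sits in the description near the top, while the Args: and Raises: sections, which Sphinx docstring parsers such as the Napoleon extension recognize, stay at the end. ExampleMonitor and all placeholder wording are assumptions for illustration; only the Raises entry is taken from the diff above, and this is not the actual DeviceStatsMonitor docstring.

```python
class ExampleMonitor:
    """Monitor and log device stats (placeholder summary).

    Logged Metrics:
        Placeholder text: describe the logged keys here, e.g. the
        ``DeviceStatsMonitor.{hook_name}/{base_metric_name}`` pattern.

    Args:
        cpu_stats: Placeholder description of the ``cpu_stats`` argument.

    Raises:
        ModuleNotFoundError:
            If ``psutil`` is not installed and CPU stats are monitored.

    """


# The free-form section comes before the parser-recognized sections.
doc = ExampleMonitor.__doc__
print(doc.index("Logged Metrics:") < doc.index("Args:") < doc.index("Raises:"))
```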

MrAnayDongre requested review from lantiga, Borda, tchaton, justusschock and ethanwharris as code owners June 11, 2025 23:08

github-actions bot added the pl label (Generic label for PyTorch Lightning package) Jun 11, 2025

Borda changed the title from "DOC: Clarify DeviceStatsMonitor logged metrics (#20807)" to "DOC: Clarify DeviceStatsMonitor logged metrics" Jun 12, 2025

Borda reviewed Jun 12, 2025

Commit dcd1042: DOC: Clarify DeviceStatsMonitor logged metrics (Lightning-AI#20807)

MrAnayDongre force-pushed the docs/fix-20807-device-stats-metrics branch from 6f17537 to dcd1042 June 12, 2025 18:10