Enable Loss Fn in Graph PP #247
Conversation
autoparallel/api.py
Outdated
- def _export(model: torch.nn.Module, inputs: tuple[Any]) -> torch.nn.Module:
+ def _export(
+     model: torch.nn.Module, model_wrapper: Callable, inputs: tuple[Any]
Please add a note to the docstring explaining what model_wrapper is for.
Added a comprehensive docstring.
autoparallel/api.py
Outdated
  model_wrapper: Callable
  if self.loss_fn is not None:

      def model_with_loss(inputs, target) -> Any:
When we call model_wrapper in export, we just pass it *inputs, which apparently expands to (inputs, target)? That part is a little confusing for UX.
Added some comments and made the format consistent now.
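The wrapper pattern under discussion can be sketched roughly as follows. This is illustrative only, not the PR's actual code: the class and method names are invented, and `model` is typed as a plain callable (in the real code it is a `torch.nn.Module`) so the sketch runs without torch:

```python
from typing import Any, Callable, Optional


class AutoParallelSketch:
    """Illustrative stand-in: shows how one model_wrapper entrypoint can
    cover both the with-loss and no-loss cases uniformly."""

    def __init__(self, model: Callable, loss_fn: Optional[Callable] = None):
        self.model = model
        self.loss_fn = loss_fn

    def build_model_wrapper(self) -> Callable:
        model_wrapper: Callable
        if self.loss_fn is not None:
            # Export calls model_wrapper(*inputs). With a loss, "inputs" is
            # the pair (inputs, target), so the star-expansion lands one
            # element on each parameter.
            def model_with_loss(inputs, target) -> Any:
                return self.loss_fn(self.model(inputs), target)

            model_wrapper = model_with_loss
        else:
            # Without a loss the wrapper is just the forward pass.
            def model_forward(inputs) -> Any:
                return self.model(inputs)

            model_wrapper = model_forward
        return model_wrapper
```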
autoparallel/graph_pp_runner.py
Outdated
  self._accumulate_stage_grads_and_clear_states(stage)
  if stage.is_last and has_targets_and_loss:
      losses = kwargs["losses"]
      losses.clear()
This is a bit confusing. I am expecting that we take the losses from kwargs and use them, so why do we immediately clear them and replace them with "schedule internal losses"? What are those?
In the last stage of PP, the loss is actually appended to an internally maintained list called schedule._internal_losses. At the end of the pipeline step, the user-provided list of losses is extended with schedule._internal_losses. And yeah, the losses.clear() was wrong, so I removed it; we shouldn't manage the user-provided losses list, we should just extend it.
examples/example_ds3_pp.py
Outdated
  # Tracing input functions
  tracing_input_fn = make_input_fn(spmd_batch_size, "tokens", device)
  tracing_input_fn_after_first_stage = make_input_fn(
Isn't the last-stage output a different shape than the embeddings for the middle layers?
Yeah, for that I have:
shape_inference_output_fn_last_stage = ...
      )
      return target_fn


  # Tracing input functions
Remind me, why do we have to make our own tracing functions? Are we not using shape inference inside the pipelining runtime? Oh, AutoP needs this, I guess.
- We need the tracing functions for AutoP.
- We need the shape_inference functions for PP to run with fake_pg.
- We need the runtime functions for generating inputs/targets for the actual run.
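A minimal sketch of how such an input-function factory could look. The `make_input_fn(batch_size, key, device)` call shape is taken from the example above, but the body, the `seq_len`/`vocab_size`/`hidden_dim` parameters, and the dtype choices are all assumptions:

```python
def make_input_fn(batch_size, key, device, seq_len=128, vocab_size=32000,
                  hidden_dim=1024):
    """Return a zero-arg function that produces fresh inputs for tracing,
    shape inference, or the actual run. Sketch only; real shapes may differ."""

    def input_fn():
        import torch  # deferred so the factory itself works without torch

        if key == "tokens":
            # First stage consumes token ids.
            return torch.randint(
                0, vocab_size, (batch_size, seq_len), device=device
            )
        # After the first stage, inputs are hidden states, not token ids.
        return torch.randn(batch_size, seq_len, hidden_dim, device=device)

    return input_fn
```

Having one factory per role (tracing, shape inference, runtime) keeps the three call sites uniform even though they need tensors on different devices or with fake backing.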
autoparallel/api.py
Outdated
          print_output=False, include_stride=True, include_device=True
      ),
  )
  print(gm.graph)
:( use tlparse to get your graphs
Sorry! This was for my own debugging.
- def _export(model: torch.nn.Module, inputs: tuple[Any]) -> torch.nn.Module:
+ def _export(
+     model: torch.nn.Module, model_wrapper: Callable, inputs: tuple[Any, ...]
Nit: could model_wrapper be optional? If not provided, just use the model itself as the tracing entrypoint?
The model wrapper is never None or Optional. Even if we don't use a loss function, we still construct a model wrapper. I feel this is a good API even for future use, e.g. when we decide to fold in an optimizer.