@yushangdi yushangdi commented Nov 4, 2025

TORCH_ENRICH_RPOFILER_STACK_TRACE=1  NGPU=8 CONFIG_FILE=./torchtitan/models/llama3/train_configs/debug_model.toml ./run_train.sh --model.name compiler_toolkit.llama3 --parallelism.data_parallel_shard_degree=2 --parallelism.tensor_parallel_degree=4 --model.flavor=debugmodel_flex_attn 

Requires pytorch/pytorch#167171 and pytorch/pytorch#167114

[Screenshot, 2025-11-04 at 10:17:03 AM]

You can check the augmented trace in manifold/explorer/pytorch/tree/shangdiy/rank0_trace_augmented.json
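
For anyone who wants to poke at the augmented trace locally, here is a minimal sketch, assuming the file follows the Chrome trace layout that the PyTorch profiler exports (a top-level `traceEvents` list whose entries may carry `cat` and `args`). The helper names `load_trace` and `summarize_trace` are hypothetical and not part of this PR. The sketch deliberately does not assume which keys the augmentation adds; it only lists the `args` keys seen per category so they can be compared against a non-augmented trace.

```python
import json
from collections import Counter


def load_trace(path):
    """Read a trace JSON file from a local path (hypothetical helper)."""
    with open(path) as f:
        return json.load(f)


def summarize_trace(trace):
    """Count events per category and collect the `args` keys seen in each.

    Accepts either a dict with a `traceEvents` list or a bare list of events,
    both of which are valid Chrome trace layouts.
    """
    events = trace if isinstance(trace, list) else trace.get("traceEvents", [])
    counts = Counter()
    arg_keys = {}
    for event in events:
        cat = event.get("cat", "<none>")
        counts[cat] += 1
        # Some events have no `args`; treat that as an empty mapping.
        arg_keys.setdefault(cat, set()).update((event.get("args") or {}).keys())
    return counts, arg_keys


# Example usage, assuming a local copy of the trace file:
# counts, arg_keys = summarize_trace(load_trace("rank0_trace_augmented.json"))
# for cat, n in counts.most_common():
#     print(cat, n, sorted(arg_keys[cat]))
```

Running the same summary on a trace captured without the environment variable set should make any extra per-event keys stand out.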

cc @SherlockNoMad

@meta-cla meta-cla bot added the "CLA Signed" label Nov 4, 2025
@yushangdi yushangdi marked this pull request as draft November 4, 2025 18:17
@yushangdi yushangdi changed the title from "Example augmenting GPU profiler trace with model stack traces" to "[do not land] Example augmenting GPU profiler trace with model stack traces" Nov 4, 2025
2 participants