
Pull requests: aws-samples/awsome-distributed-training


Pull requests list

Adding llamav3 FSDP sample-slurm
#728 opened Jun 10, 2025 by allela-roy
Updating Parallelcluster deployment guide [enhancement]
#721 opened Jun 9, 2025 by KeitaW
HyperPod EKS Helper Script Fixes
#709 opened Jun 5, 2025 by bluecrayon52
new fsdp dataset: nvidia-deeplearningexamples
#706 opened May 30, 2025 by mvinci12
adding nemo2.0 eks test case
#688 opened May 21, 2025 by KeitaW Draft
Lustre mount via Ansible for SMHP Slurm LCS
#682 opened May 15, 2025 by amanshanbhag
Enable 1click for SageMaker HyperPod
#670 opened May 8, 2025 by mhuguesaws
Feat/picotron resume from checkpoint [enhancement]
#656 opened Apr 29, 2025 by KeitaW
Feat/ddp mlflow [enhancement]
#655 opened Apr 28, 2025 by KeitaW
Feature/slinkly slurm hyperpod eks [enhancement]
#651 opened Apr 25, 2025 by bluecrayon52 Draft
refactoring megatron-lm test case [enhancement]
#637 opened Apr 9, 2025 by KeitaW Draft
add tips to force NCCL comm to go through EFA
#531 opened Jan 23, 2025 by KeitaW
easy smhp slurm and eks
#514 opened Dec 10, 2024 by gmgtamz
add GPU accounting for SMHP
#462 opened Oct 21, 2024 by KeitaW
Update bionemo test case + propose to subdirectories per orchastrator [documentation]
#396 opened Aug 5, 2024 by KeitaW Draft
Esm2 on Sagemaker Hyperpod
#387 opened Jul 25, 2024 by awsankur
update dependencies of PyTorch base image
#375 opened Jul 15, 2024 by KeitaW
Neuron distributed
#359 opened Jun 13, 2024 by KeitaW
End-to-End LLM Model Development with Torchtitan and Torchtune [enhancement]
#341 opened May 20, 2024 by KeitaW
Llama training with FP8
#331 opened May 15, 2024 by pbelevich Draft