Resolve DDP error by disabling gradients for lm_head #351
Fixes #164
Description
This PR addresses a `RuntimeError` that occurs during multi-GPU fine-tuning with DistributedDataParallel (DDP).

Problem:
When the training script runs on multiple GPUs, DDP raises a `RuntimeError` during gradient synchronization. DDP expects every parameter with `requires_grad=True` to participate in the backward pass. However, during our fine-tuning process the `language_model.lm_head` is not used in the loss calculation, so its parameters never receive gradients. DDP waits for these gradients indefinitely, causing the synchronization to fail and throw the error. A minimal illustration of this failure mode is sketched below.
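For context, here is a minimal, self-contained sketch of the failure mode. The toy model below is purely illustrative (it is not the actual EAGLE code): it has a trainable head that never contributes to the loss, which is exactly the situation DDP cannot handle with its default settings.

```python
import torch
import torch.nn as nn

# Hypothetical model for illustration: a trainable head that never
# contributes to the loss.
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(16, 16)
        self.lm_head = nn.Linear(16, 100)  # requires_grad=True, but unused below

    def forward(self, x):
        # Only the backbone output reaches the loss; lm_head is never called,
        # so its parameters never receive gradients in backward().
        return self.backbone(x)

model = ToyModel()
loss = model(torch.randn(4, 16)).sum()
loss.backward()
print(model.lm_head.weight.grad)  # None: no gradient was ever produced

# When the same model is wrapped in DistributedDataParallel (with the default
# find_unused_parameters=False), the reducer expects a gradient for every
# parameter that requires grad. Because lm_head never produces one, gradient
# synchronization cannot complete and DDP raises a RuntimeError.
# ddp_model = nn.parallel.DistributedDataParallel(model.cuda(), device_ids=[rank])
```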
Solution:
The fix is to explicitly disable gradient calculation for the `lm_head` by calling `self.eagle_model.language_model.lm_head.requires_grad_(False)`. This tells DDP not to expect gradients for these parameters, which resolves the synchronization issue (see the sketch below).

This approach is more efficient than setting `find_unused_parameters=True` in the DDP wrapper, because it avoids the overhead of searching the autograd graph for unused parameters on every iteration.
Changes proposed in this pull request:
- In `eagle_backbone.py`, set `requires_grad=False` for the `language_model.lm_head` to prevent DDP errors during multi-GPU training.
Before submitting
- […] section of the CONTRIBUTING docs.