Don't allow EOS until 4 frames have been generated #14761


Merged

rfejgin merged 13 commits into NVIDIA-NeMo:magpietts_2508 from rfejgin:magpietts_2508_forbid_eos_near_start

Sep 27, 2025

Collaborator

rfejgin commented Sep 19, 2025 (edited)

Problem: We have observed a very rare but persistent issue where Magpie generates a zero-length output for a standard-length input text. Here's what was happening: for particular utterances, the EOS logit at the very first timestep became very large after classifier-free guidance (CFG) was applied. Interestingly, before CFG neither the conditional nor the unconditional EOS logit was particularly large, but CFG amplified the difference between them, which resulted in the post-CFG EOS logit being the maximum among all logits.
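To make the amplification effect concrete, here is a minimal sketch with made-up numbers. It assumes the common CFG combination `uncond + scale * (cond - uncond)`; the exact formula, the guidance scale, and the three-token vocabulary are illustrative assumptions, not values taken from Magpie.

```python
# Toy illustration: EOS is not the top logit in either branch,
# but becomes the top logit once the two branches are combined with CFG.
# All numbers and the CFG formula below are assumptions for illustration.

EOS = 2  # hypothetical index of the EOS token in a 3-token vocabulary


def apply_cfg(cond, uncond, scale):
    """Combine logits as uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


cond = [2.0, 1.5, 1.0]     # conditional logits: EOS is the smallest
uncond = [2.0, 1.6, -1.0]  # unconditional logits: EOS is the smallest
mixed = apply_cfg(cond, uncond, scale=2.5)

print(argmax(cond), argmax(uncond), argmax(mixed))
```

In this toy case the conditional and unconditional EOS logits differ by 2.0, and the guidance scale stretches that gap until EOS overtakes every other token.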

Fix: This PR avoids this early-termination issue by disallowing EOS in the first 4 timesteps, corresponding to about 186 ms (with a 21.5 Hz codec). That helps the model avoid first-frame termination, and it then generates the rest of the utterance correctly. The value of 4 steps was chosen because the codec requires a minimum of 4 frames to decode (albeit at a batch level). This way we sample those 4 steps rather than potentially force-replacing them with zero-index tokens.
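The constraint can be sketched as masking the EOS logit until enough frames exist. This is a simplified illustration, not the code in this PR: `mask_early_eos`, `greedy_decode`, and `min_frames` are hypothetical names, and greedy selection stands in for the model's actual sampling.

```python
import math


def mask_early_eos(logits, step, eos_id, min_frames=4):
    """Return a copy of logits with EOS disabled while step < min_frames."""
    masked = list(logits)
    if step < min_frames:
        masked[eos_id] = -math.inf
    return masked


def greedy_decode(logits_per_step, eos_id, min_frames=4):
    """Pick the top token at each step, stopping at the first allowed EOS."""
    frames = []
    for step, logits in enumerate(logits_per_step):
        masked = mask_early_eos(logits, step, eos_id, min_frames)
        token = max(range(len(masked)), key=masked.__getitem__)
        if token == eos_id:
            break
        frames.append(token)
    return frames


# Toy input in which EOS (index 2) dominates at every step.
steps = [[0.1, 0.2, 9.0]] * 6
unconstrained = greedy_decode(steps, eos_id=2, min_frames=0)
constrained = greedy_decode(steps, eos_id=2, min_frames=4)

# Duration covered by 4 frames at a 21.5 Hz codec frame rate.
CODEC_FRAME_RATE_HZ = 21.5
min_duration_ms = 1000 * 4 / CODEC_FRAME_RATE_HZ
```

With the toy input, the unconstrained run stops immediately with no frames, while the constrained run emits 4 frames before EOS is allowed. The last line reproduces the "about 186 ms" figure: 4 frames divided by 21.5 frames per second.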

Note that I also examined the logits for an instance of mid-sentence termination and the logits did not follow the pattern of EOS becoming large only after CFG in that case. So CFG doesn't appear to be at the root of all early-termination issues, just the start-of-utterance one.

(Still, it's probably worthwhile thinking about how to improve the underlying CFG mechanism; but for now we need a safeguard so that these zero-length generations don't happen.)

I ran evaluations and got similar metrics with and without this constraint, which should kick in very rarely.


          Don't allow EOS until 4 frames have been generated

1782d67

The number of frames is configurable via a parameter to infer_batch().

This is a workaround for the observation that when CFG is on
we sometimes terminate after zero tokens. It appears to be an artifact
of CFG, since the EOS logit is not particularly large for the conditional
logits; only post-CFG.

Signed-off-by: Fejgin, Roy <[email protected]>

github-actions bot added the TTS label

rfejgin added 2 commits

September 19, 2025 11:11


          Formatting

a0e58d0

Signed-off-by: Fejgin, Roy <[email protected]>


          Command line option to set minimum number of frames to generate

57df90c

Signed-off-by: Fejgin, Roy <[email protected]>

rfejgin requested review from paarthneekhara and blisc

September 24, 2025 01:24

rfejgin added the Run CICD label

rfejgin marked this pull request as ready for review

September 24, 2025 01:25

chtruong814 added Run CICD and removed Run CICD labels


          formatting

1005d1b

Signed-off-by: Fejgin, Roy <[email protected]>

rfejgin force-pushed the magpietts_2508_forbid_eos_near_start branch from 925661f to 1005d1b Compare

September 24, 2025 03:11

chtruong814 added Run CICD and removed Run CICD labels


          Merge remote-tracking branch 'nemo/magpietts_2508' into magpietts_2508_forbid_eos_near_start

4da4dbf

Signed-off-by: Fejgin, Roy <[email protected]>

chtruong814 added Run CICD and removed Run CICD labels

chtruong814 temporarily deployed to test

September 24, 2025 03:19

— with

GitHub Actions Inactive


          Show extreme values in violin plots

b99620a

(to aid in debugging rare issues)

Signed-off-by: Fejgin, Roy <[email protected]>

chtruong814 added Run CICD and removed Run CICD labels

chtruong814 temporarily deployed to test

September 24, 2025 17:06

— with

GitHub Actions Inactive

paarthneekhara approved these changes

Collaborator

paarthneekhara left a comment

Looks good to me.


          Fix merge issues

378a852

Signed-off-by: Fejgin, Roy <[email protected]>

chtruong814 added Run CICD and removed Run CICD labels

chtruong814 had a problem deploying to test

September 24, 2025 18:26

— with

GitHub Actions Error


          More merge fixes

279d7bd

Signed-off-by: Fejgin, Roy <[email protected]>

chtruong814 added Run CICD and removed Run CICD labels

chtruong814 temporarily deployed to test

September 24, 2025 18:28

— with

GitHub Actions Inactive


          Remove temporary changes in infer_and_evaluate.py

f9a68ac

Signed-off-by: Fejgin, Roy <[email protected]>

chtruong814 added Run CICD and removed Run CICD labels

chtruong814 had a problem deploying to test

September 25, 2025 00:57

— with

GitHub Actions Error


          Comments

Signed-off-by: Fejgin, Roy <[email protected]>

chtruong814 added Run CICD and removed Run CICD labels

chtruong814 had a problem deploying to test

September 25, 2025 01:11

— with

GitHub Actions Error


          Comments

Signed-off-by: Fejgin, Roy <[email protected]>

chtruong814 added Run CICD and removed Run CICD labels


          Fix typo

699fb57

Signed-off-by: Fejgin, Roy <[email protected]>

chtruong814 added Run CICD and removed Run CICD labels

rfejgin enabled auto-merge (squash)

September 25, 2025 01:23

chtruong814 temporarily deployed to test

September 25, 2025 01:23

— with

GitHub Actions Inactive

rfejgin disabled auto-merge

September 25, 2025 03:49

rfejgin enabled auto-merge (squash)

September 26, 2025 20:45


          Merge branch 'magpietts_2508' into magpietts_2508_forbid_eos_near_start

427a53c

chtruong814 added Run CICD and removed Run CICD labels

chtruong814 had a problem deploying to test

September 27, 2025 00:49

— with

GitHub Actions Error

rfejgin added Run CICD and removed Run CICD labels

rfejgin had a problem deploying to test

September 27, 2025 01:15

— with

GitHub Actions Error

rfejgin added Run CICD and removed Run CICD labels

rfejgin temporarily deployed to test

September 27, 2025 01:20

— with

GitHub Actions Inactive

rfejgin merged commit f3878d7 into NVIDIA-NeMo:magpietts_2508

95 of 103 checks passed
