

@tjkemp tjkemp commented Jul 31, 2025

Summary

Batch inference in hunyuan_video_usp_example.py used only the first item's mask (encoder_attention_mask[0]). That works for bs=1, but with several prompts it cuts off the masks of longer prompts and produces artifacts. This PR fixes the mask handling.
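
To illustrate the failure mode, here is a toy sketch in plain Python rather than the actual example code. The token lists, mask values, and both helper functions are made up for illustration, and the "batch-wide" variant is only one possible way to avoid the truncation, not necessarily what this PR does.

```python
# Toy illustration in plain Python; masks and token names are hypothetical.
# 1 marks a real prompt token, 0 marks padding.
encoder_attention_mask = [
    [1, 1, 1, 0, 0, 0],  # short prompt: 3 real tokens
    [1, 1, 1, 1, 1, 0],  # longer prompt: 5 real tokens
]
tokens = [
    ["a0", "a1", "a2", "pad", "pad", "pad"],
    ["b0", "b1", "b2", "b3", "b4", "pad"],
]


def select_with_first_mask(tokens, masks):
    """Apply the first item's mask to every item (fine only for bs=1)."""
    keep = [i for i, m in enumerate(masks[0]) if m]
    return [[row[i] for i in keep] for row in tokens]


def select_with_batch_mask(tokens, masks):
    """Keep every position that is valid for at least one item in the batch."""
    keep = [i for i in range(len(masks[0])) if any(m[i] for m in masks)]
    return [[row[i] for i in keep] for row in tokens]


truncated = select_with_first_mask(tokens, encoder_attention_mask)
full = select_with_batch_mask(tokens, encoder_attention_mask)

print(truncated[1])  # ['b0', 'b1', 'b2'] -- b3 and b4 are lost
print(full[1])       # ['b0', 'b1', 'b2', 'b3', 'b4']
```

With the first item's mask, the second prompt loses its last two real tokens, which matches the symptom of the longer prompt being cut off.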

How to reproduce

  1. Apply this patch so the script saves all videos:
diff --git a/examples/hunyuan_video_usp_example.py b/examples/hunyuan_video_usp_example.py
index 03856f1..a9041ac 100644
--- a/examples/hunyuan_video_usp_example.py
+++ b/examples/hunyuan_video_usp_example.py
@@ -297,7 +291,7 @@ def main():
         guidance_scale=input_config.guidance_scale,
         generator=torch.Generator(device="cuda").manual_seed(
             input_config.seed),
-    ).frames[0]
+    )
 
     end_time = time.time()
     elapsed_time = end_time - start_time
@@ -311,9 +305,10 @@ def main():
     )
     if is_dp_last_group():
         resolution = f"{input_config.width}x{input_config.height}"
-        output_filename = f"results/hunyuan_video_{parallel_info}_{resolution}.mp4"
-        export_to_video(output, output_filename, fps=15)
-        print(f"output saved to {output_filename}")
+        for idx, frames in enumerate(output.frames, start=1):
+            output_filename = f"results/hunyuan_video_{idx:02d}_{parallel_info}_{resolution}.mp4"
+            export_to_video(frames, output_filename, fps=15)
+            print(f"output saved to {output_filename}")
 
     if get_world_group().rank == get_world_group().world_size - 1:
         print(
  2. Run the example with a second prompt added:
mkdir -p results && torchrun --nproc_per_node=2 examples/hunyuan_video_usp_example.py --model tencent/HunyuanVideo --ulysses_degree 2 --num_inference_steps 30 --warmup_steps 0 --prompt "A husky puppy plays with its own tail." "Two Siamese cats eat sushi from a plate." --height 320 --width 512 --num_frames 61 --enable_tiling --enable_model_cpu_offload
  3. Compare the results.

The second video before this fix:
hunyuan_video_02

The second video after applying this fix:
hunyuan_video_02

Notes

Unrelated to this change, I’m seeing:

[rank1]:   File "/app/xDiT/examples/hunyuan_video_usp_example.py", line 297, in main
[rank1]:     guidance_scale=input_config.guidance_scale,
[rank1]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: AttributeError: 'InputConfig' object has no attribute 'guidance_scale'

@tjkemp tjkemp changed the title from "Fix mask handling for batch generation" to "Fix mask handling for batch generation in HunyuanVideo example" on Jul 31, 2025
@feifeibear
Collaborator

The guidance_scale error probably comes from a compatibility issue with the installed diffusers version.

@tjkemp
Author

tjkemp commented Sep 2, 2025

Is there anything else I could adjust to help get this PR ready for merge?
