
Conversation

@daphne-cornelisse (Contributor) commented Apr 5, 2025

Description

Implements advantage filtering from "Robust Autonomy Emerges from Self-Play" (details in Appendix C, Algorithm 1).

The key idea is to discard transitions with low-magnitude advantages so that training focuses on the most informative samples.

Added config options:

  • apply_advantage_filter
  • beta (default: 0.25)
  • initial_th_factor (default: 0.01)
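
A minimal sketch of how these options could fit together, assuming beta is an exponential-moving-average coefficient for the threshold and initial_th_factor seeds the threshold from the advantage scale (everything beyond the two config names is illustrative, not the PR's actual implementation):

import numpy as np

beta = 0.25               # EMA coefficient for the adaptive threshold (assumed meaning)
initial_th_factor = 0.01  # seeds the threshold relative to the advantage scale (assumed meaning)

advantages = np.random.randn(4096).astype(np.float32)
threshold = initial_th_factor * np.abs(advantages).mean()

# Keep only transitions whose advantage magnitude clears the threshold.
keep = np.abs(advantages) >= threshold
kept_indices = np.flatnonzero(keep)

# Adapt the threshold toward the current advantage scale via an
# exponential moving average (assumed update rule).
threshold = (1 - beta) * threshold + beta * np.abs(advantages).mean()

print(f"Advantage filtering: kept {100 * keep.mean():.1f}% of transitions")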

Todo

  • Implement advantage filtering on top of the existing Experience buffer, preserving tensor shapes: zero out all transitions whose advantage magnitude falls below the threshold (see the sketch after this list).
  • Restructure the Experience buffer to actually drop filtered transitions, for memory and training efficiency.
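
A minimal sketch of the first item, assuming a torch-based Experience buffer with an advantages tensor (the helper name and field are hypothetical, not the PR's code):

import torch

def zero_out_filtered(advantages: torch.Tensor, threshold: float):
    # Hypothetical helper, not the PR's actual implementation.
    # Zeroes advantages whose magnitude is below the threshold while
    # preserving the tensor's shape, so the existing Experience buffer
    # and minibatching logic stay unchanged; filtered transitions then
    # contribute nothing to the policy-gradient loss.
    keep = advantages.abs() >= threshold      # bool mask, same shape
    filtered = advantages * keep              # zero the filtered entries
    kept_frac = keep.float().mean().item()    # fraction kept, for logging
    return filtered, kept_frac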

Logging

Adds a short message to the training dashboard:

╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│  🐡 PufferLib 1.0.0                                           CPU: 3.2%               GPU: 98.0%               DRAM: 0.3%                      VRAM: 0.0%  │
│                                                                                                                                                            │
│  Summary                                    Value    Evaluate                            42s       84%    Losses                                    Value  │
│  Environment                             gpudrive      Forward                            0s        1%    policy_loss                              -0.002  │
│  Agent Steps                               393.7k      Env                               41s       82%    value_loss                                3.634  │
│  SPS                                         8.0k      Misc                               0s        0%    entropy                                   4.487  │
│  Epoch                                          3    Train                                4s        9%    old_approx_kl                             0.004  │
│  Uptime                                       50s      Forward                            2s        5%    approx_kl                                 0.004  │
│  Remaining                             34h 50m 7s      Learn                              4s        8%    clipfrac                                  0.020  │
│                                                        Misc                               0s        0%    explained_variance                        0.297  │
│                                                                                                                                                            │
│  User Stats                                                           Value    User Stats                                                           Value  │
│                                                                                                                                                            │
│  Message: Advantage filtering: kept 87.6% of transitions                                                                                                   │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

# Use only as many kept transitions as divide evenly across the minibatches.
transitions_to_use = transitions_per_mb * self.num_minibatches

# Shuffle first so the truncation drops a random subset rather than a biased one.
np.random.shuffle(kept_indices)
kept_indices = kept_indices[:transitions_to_use]
Contributor

What is this line supposed to do?

Contributor Author

It ensures that the number of kept transitions is evenly divisible by the number of minibatches, so every minibatch receives the same number of transitions.
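
For concreteness, a toy run with hypothetical numbers:

import numpy as np

transitions_per_mb = 1024        # hypothetical values
num_minibatches = 4
kept_indices = np.arange(4500)   # say 4500 transitions survived the filter

transitions_to_use = transitions_per_mb * num_minibatches  # 4096

np.random.shuffle(kept_indices)                  # drop a random subset
kept_indices = kept_indices[:transitions_to_use]
assert len(kept_indices) % num_minibatches == 0  # splits evenly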
