1- # Rollout Importance Sampling - Migration Guide
1+ # Rollout Importance Sampling
22
33Last updated: 10/11/2025.
44
5- This document provides a comprehensive overview of the Rollout Importance Sampling (IS) implementation merged from aiic_verl into verl.
5+ This document provides a comprehensive overview of the Rollout Importance Sampling (IS) implementation in verl.
66
77## References
88
@@ -17,26 +17,10 @@ Rollout Importance Sampling corrects for distribution mismatch between:
1717
1818This mismatch can lead to biased gradient estimates and unstable training. Rollout IS applies importance sampling weights to correct these biases.
1919
20- ## What Changed
21-
22- ### **Removed (Old Implementation)**
23-
24- ```yaml
25- # Old TIS configuration (REMOVED)
26- actor:
27-   tis_imp_ratio_cap: 2.0 # ❌ No longer supported
28- ```
29-
30- The old implementation:
31- - Only supported token-level truncate mode
32- - Had no metrics tracking
33- - Lacked numerical stability safeguards
34- - No configurability for different scenarios
35-
36- ### **Added (New Implementation)**
20+ ## Configuration
3721
3822```yaml
39- # New Rollout IS configuration (all in algorithm config)
23+ # Rollout IS configuration (all in algorithm config)
4024algorithm:
4125  # Main control: set threshold to enable (null = disabled)
4226  rollout_is_threshold: 2.0
@@ -53,7 +37,7 @@ actor_rollout_ref:
5337    calculate_log_probs: true
5438```
5539
56- The new implementation:
40+ Key features:
5741- ✅ Three aggregation levels: token, sequence, geometric
5842- ✅ Two bounding modes: truncate, mask
5943- ✅ Dual threshold support (upper/lower)
@@ -62,64 +46,35 @@ The new implementation:
6246- ✅ Log-space computation for numerical stability
6347- ✅ Memory-efficient implementation
6448
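To illustrate how the aggregation levels and bounding modes listed above could fit together, here is a minimal pure-Python sketch for a single sequence. It is not the code in `mismatch_helper.py`. The helper names `aggregate_log_ratios` and `bound_weights`, their arguments, and the exact semantics of each mode (truncate caps at the upper threshold, mask zeroes weights outside the lower/upper range) are assumptions made for illustration.

```python
import math


def aggregate_log_ratios(train_logp, rollout_logp, level="token"):
    """Aggregate per-token log ratios log(pi_train / pi_rollout) for one sequence.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Work in log space: a ratio of probabilities is a difference of log-probs.
    log_ratios = [t - r for t, r in zip(train_logp, rollout_logp)]
    if level == "token":
        return log_ratios  # one independent weight per token
    if level == "sequence":
        total = sum(log_ratios)  # log of the product of token ratios
        return [total] * len(log_ratios)
    if level == "geometric":
        mean = sum(log_ratios) / len(log_ratios)  # log of the geometric mean
        return [mean] * len(log_ratios)
    raise ValueError(f"unknown level: {level}")


def bound_weights(log_weights, upper, lower, mode="truncate"):
    """Exponentiate log weights and apply a bounding mode (assumed semantics)."""
    weights = [math.exp(lw) for lw in log_weights]
    if mode == "truncate":
        # Assumption: truncate caps each weight at the upper threshold.
        return [min(w, upper) for w in weights]
    if mode == "mask":
        # Assumption: mask zeroes any weight outside [lower, upper].
        return [w if lower <= w <= upper else 0.0 for w in weights]
    raise ValueError(f"unknown mode: {mode}")


train_logp = [-1.0, -2.0]    # log-probs under the training policy
rollout_logp = [-1.5, -1.0]  # log-probs under the rollout policy

token_lw = aggregate_log_ratios(train_logp, rollout_logp, "token")
truncated = bound_weights(token_lw, upper=1.5, lower=0.5, mode="truncate")
masked = bound_weights(token_lw, upper=1.5, lower=0.5, mode="mask")
```

In this toy input the first token's ratio exceeds the upper threshold and the second falls below the lower one, so truncate caps only the first weight while mask zeroes both. Summing log ratios before exponentiating is what makes the sequence and geometric levels stable for long responses.
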
65- ## Files Modified
49+ ## Files
6650
6751### **Core Implementation**
6852
69- 1. **NEW**: `verl/trainer/ppo/mismatch_helper.py`
70-    - Contains `compute_rollout_importance_weights()` - main function
71-    - Contains `compute_is_metrics()` - comprehensive metrics
72-
73- 2. **MODIFIED**: `verl/trainer/ppo/core_algos.py` (lines 962-991)
74-    - Replaced old TIS implementation (lines 962-967)
75-    - Added new rollout IS with metrics support
76-
77- 3. **MODIFIED**: `verl/workers/actor/dp_actor.py`
78-    - Updated to use `rollout_is_threshold` instead of `tis_imp_ratio_cap`
79-    - Collects and logs all rollout IS metrics
53+ - `verl/trainer/ppo/mismatch_helper.py` - Contains `compute_rollout_importance_weights()` and `compute_is_metrics()`
54+ - `verl/trainer/ppo/core_algos.py` - Rollout IS integration with PPO
55+ - `verl/workers/actor/dp_actor.py` - Metrics collection and logging
8056
8157### **Configuration Files**
8258
83- 4. **MODIFIED**: `verl/trainer/config/algorithm.py` (lines 95-100)
84-    - Added 6 new rollout IS parameters to `AlgoConfig`
85-
86- 5. **MODIFIED**: `verl/workers/config/actor.py` (lines 110-115)
87-    - Added 6 new rollout IS parameters to `ActorConfig`
88-
89- 6. **MODIFIED**: `verl/trainer/config/actor/actor.yaml` (lines 77-89)
90-    - Added rollout IS configuration section
91-
92- 7. **MODIFIED**: `verl/trainer/config/ppo_trainer.yaml` (lines 116-133)
93-    - Added rollout IS to algorithm config
59+ - `verl/trainer/config/algorithm.py` - Rollout IS parameters in `AlgoConfig`
60+ - `verl/workers/config/actor.py` - Rollout IS parameters in `ActorConfig`
61+ - `verl/trainer/config/actor/actor.yaml` - Rollout IS configuration section
62+ - `verl/trainer/config/ppo_trainer.yaml` - Algorithm config with rollout IS
9463
9564### **Documentation**
9665
97- 8. **MODIFIED**: `docs/examples/config.rst`
98-    - Updated actor config with rollout IS parameters
99-    - Updated algorithm config with rollout IS parameters
100-    - Added detailed parameter descriptions
66+ - `docs/examples/config.rst` - Configuration parameter descriptions
10167
10268### **Example Scripts**
10369
104- 9. **MODIFIED**: `recipe/dapo/run_dapo_qwen2.5_32b_tis.sh`
105-    - Updated from `tis_imp_ratio_cap` to rollout IS parameters
106-    - Added comprehensive comments
107-
108- 10. **NEW**: `examples/rollout_importance_sampling/README.md`
109-     - Comprehensive guide with usage patterns
110-     - Troubleshooting section
111-     - Performance considerations
112-
113- 11. **NEW**: `examples/rollout_importance_sampling/run_with_rollout_is.sh`
114-     - Basic example with token-level truncate
70+ - `recipe/dapo/run_dapo_qwen2.5_32b_rollout_is.sh` - DAPO example with rollout IS
71+ - `examples/rollout_importance_sampling/README.md` - Comprehensive usage guide
72+ - `examples/rollout_importance_sampling/run_with_rollout_is.sh` - Basic example
11573
11674### **Tests**
11775
118- 12. **NEW**: `tests/trainer/ppo/test_rollout_is.py`
119-     - Unit tests for rollout IS functionality
120-
121- 13. **NEW**: `tests/trainer/ppo/test_rollout_is_integration.py`
122-     - Integration tests with PPO
76+ - `tests/trainer/ppo/test_rollout_is.py` - Unit tests
77+ - `tests/trainer/ppo/test_rollout_is_integration.py` - Integration tests
12378
12479## Configuration Parameters
12580
@@ -156,20 +111,10 @@ Bounding mode:
156111Per-token veto threshold. If any token's ratio falls below this value, the entire sequence is rejected.
157112Default: `1e-4` (a ratio off by a factor of 10,000)
158113
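The veto rule can be sketched in a few lines of plain Python. This is an illustration only: `sequence_is_vetoed` is a hypothetical helper rather than a verl function, and it assumes both inputs hold per-token log-probabilities for the response tokens of a single sequence, with no padding.

```python
import math


def sequence_is_vetoed(train_logp, rollout_logp, veto_threshold=1e-4):
    """Return True if any token ratio pi_train / pi_rollout is below the threshold.

    Hypothetical helper for illustration; the comparison is done in log space
    so that very small ratios do not underflow.
    """
    log_veto = math.log(veto_threshold)
    return any((t - r) < log_veto for t, r in zip(train_logp, rollout_logp))


# Second token: log ratio = -12.0 - (-2.0) = -10.0, below log(1e-4) ≈ -9.21.
vetoed = sequence_is_vetoed([-1.0, -12.0], [-1.0, -2.0])

# Second token: log ratio = -1.0, well above the threshold.
kept = sequence_is_vetoed([-1.0, -3.0], [-1.0, -2.0])
```

A single catastrophic token is enough to reject the whole sequence here, regardless of how well the other tokens match.
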
159- ## Migration Steps
114+ ## Usage
160115
161- ### Step 1: Update Your Configuration
116+ ### Basic Setup
162117
163- **Before (Old):**
164- ```yaml
165- actor_rollout_ref:
166-   actor:
167-     tis_imp_ratio_cap: 2.0
168-   rollout:
169-     calculate_log_probs: true
170- ```
171-
172- **After (New):**
173118```yaml
174119algorithm:
175120  rollout_is_threshold: 2.0 # Main control
@@ -179,10 +124,10 @@ algorithm:
179124
180125actor_rollout_ref:
181126  rollout:
182-     calculate_log_probs: true # Still required!
127+     calculate_log_probs: true # Required!
183128```
184129
185- ### Step 2: Monitor New Metrics
130+ ### Metrics
186131
187132All metrics are prefixed with `mismatch/`. For example, `rollout_is_mean` appears as `mismatch/rollout_is_mean` in logs.
188133
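As a small illustration of the naming convention, the sketch below maps raw metric names to the keys that would appear in logs. `add_mismatch_prefix` is a hypothetical helper and the metric value is made up. verl's own logging code may apply the prefix differently.

```python
def add_mismatch_prefix(metrics, prefix="mismatch/"):
    """Return a copy of `metrics` with every key prefixed (illustrative helper)."""
    return {f"{prefix}{name}": value for name, value in metrics.items()}


raw_metrics = {"rollout_is_mean": 1.02}  # illustrative value, not a real measurement
logged = add_mismatch_prefix(raw_metrics)
```

With this input, `logged` contains the single key `mismatch/rollout_is_mean`.
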
@@ -416,7 +361,7 @@ if not is_healthy:
416361    print(" - Checking if rollout and training policies are too different")
417362```
418363
419- ### Step 3: Test Your Training
364+ ### Running Examples
420365
421366Start with the basic token-level truncate configuration:
422367```bash
@@ -582,11 +527,6 @@ for step in range(num_steps):
582527- **Computational overhead**: 1-3% depending on level
583528- **Training stability**: Significantly improved when mismatch exists
584529
585- ## Backward Compatibility
586-
587- **The old `tis_imp_ratio_cap` parameter is completely removed.** There is no backward compatibility mode.
588-
589- All scripts and configurations must be updated to use the new rollout IS parameters.
590530
591531## Testing
592532
@@ -606,15 +546,13 @@ Expected output: All tests pass ✓
606546
607547- **Implementation**: `verl/trainer/ppo/mismatch_helper.py`
608548- **Examples**: `examples/rollout_importance_sampling/`
609- - **DAPO Example**: `recipe/dapo/run_dapo_qwen2.5_32b_tis.sh`
549+ - **DAPO Example**: `recipe/dapo/run_dapo_qwen2.5_32b_rollout_is.sh`
610550
611551## Summary
612552
613- The new Rollout Importance Sampling implementation provides:
614- - ✅ More robust handling of distribution mismatch
615- - ✅ Better numerical stability
553+ Rollout Importance Sampling provides:
554+ - ✅ Robust handling of distribution mismatch
555+ - ✅ Numerical stability
616556- ✅ Comprehensive metrics for monitoring
617557- ✅ Flexibility for different scenarios
618558- ✅ Memory-efficient computation
619-
620- Migration is straightforward: replace `tis_imp_ratio_cap` with the new `rollout_is_*` parameters in the `algorithm` config section.