Commit e924de2

refactor(docs): rename rollout_is_migration to rollout_is
- Update all references to use rollout_is naming consistently
1 parent 30247fc commit e924de2

6 files changed: +41 -103 lines changed

docs/advance/rollout_is_migration.md renamed to docs/advance/rollout_is.md

Lines changed: 28 additions & 90 deletions
@@ -1,8 +1,8 @@
-# Rollout Importance Sampling - Migration Guide
+# Rollout Importance Sampling

 Last updated: 10/11/2025.

-This document provides a comprehensive overview of the Rollout Importance Sampling (IS) implementation merged from aiic_verl into verl.
+This document provides a comprehensive overview of the Rollout Importance Sampling (IS) implementation in verl.

 ## References

@@ -17,26 +17,10 @@ Rollout Importance Sampling corrects for distribution mismatch between:

 This mismatch can lead to biased gradient estimates and unstable training. Rollout IS applies importance sampling weights to correct these biases.

-## What Changed
-
-### **Removed (Old Implementation)**
-
-```yaml
-# Old TIS configuration (REMOVED)
-actor:
-  tis_imp_ratio_cap: 2.0 # ❌ No longer supported
-```
-
-The old implementation:
-- Only supported token-level truncate mode
-- Had no metrics tracking
-- Lacked numerical stability safeguards
-- No configurability for different scenarios
-
-### **Added (New Implementation)**
+## Configuration

 ```yaml
-# New Rollout IS configuration (all in algorithm config)
+# Rollout IS configuration (all in algorithm config)
 algorithm:
   # Main control: set threshold to enable (null = disabled)
   rollout_is_threshold: 2.0
@@ -53,7 +37,7 @@ actor_rollout_ref:
     calculate_log_probs: true
 ```

-The new implementation:
+Key features:
 - ✅ Three aggregation levels: token, sequence, geometric
 - ✅ Two bounding modes: truncate, mask
 - ✅ Dual threshold support (upper/lower)
@@ -62,64 +46,35 @@ The new implementation:
 - ✅ Log-space computation for numerical stability
 - ✅ Memory-efficient implementation

-## Files Modified
+## Files

 ### **Core Implementation**

-1. **NEW**: `verl/trainer/ppo/mismatch_helper.py`
-   - Contains `compute_rollout_importance_weights()` - main function
-   - Contains `compute_is_metrics()` - comprehensive metrics
-
-2. **MODIFIED**: `verl/trainer/ppo/core_algos.py` (lines 962-991)
-   - Replaced old TIS implementation (lines 962-967)
-   - Added new rollout IS with metrics support
-
-3. **MODIFIED**: `verl/workers/actor/dp_actor.py`
-   - Updated to use `rollout_is_threshold` instead of `tis_imp_ratio_cap`
-   - Collects and logs all rollout IS metrics
+- `verl/trainer/ppo/mismatch_helper.py` - Contains `compute_rollout_importance_weights()` and `compute_is_metrics()`
+- `verl/trainer/ppo/core_algos.py` - Rollout IS integration with PPO
+- `verl/workers/actor/dp_actor.py` - Metrics collection and logging

 ### **Configuration Files**

-4. **MODIFIED**: `verl/trainer/config/algorithm.py` (lines 95-100)
-   - Added 6 new rollout IS parameters to `AlgoConfig`
-
-5. **MODIFIED**: `verl/workers/config/actor.py` (lines 110-115)
-   - Added 6 new rollout IS parameters to `ActorConfig`
-
-6. **MODIFIED**: `verl/trainer/config/actor/actor.yaml` (lines 77-89)
-   - Added rollout IS configuration section
-
-7. **MODIFIED**: `verl/trainer/config/ppo_trainer.yaml` (lines 116-133)
-   - Added rollout IS to algorithm config
+- `verl/trainer/config/algorithm.py` - Rollout IS parameters in `AlgoConfig`
+- `verl/workers/config/actor.py` - Rollout IS parameters in `ActorConfig`
+- `verl/trainer/config/actor/actor.yaml` - Rollout IS configuration section
+- `verl/trainer/config/ppo_trainer.yaml` - Algorithm config with rollout IS

 ### **Documentation**

-8. **MODIFIED**: `docs/examples/config.rst`
-   - Updated actor config with rollout IS parameters
-   - Updated algorithm config with rollout IS parameters
-   - Added detailed parameter descriptions
+- `docs/examples/config.rst` - Configuration parameter descriptions

 ### **Example Scripts**

-9. **MODIFIED**: `recipe/dapo/run_dapo_qwen2.5_32b_tis.sh`
-   - Updated from `tis_imp_ratio_cap` to rollout IS parameters
-   - Added comprehensive comments
-
-10. **NEW**: `examples/rollout_importance_sampling/README.md`
-    - Comprehensive guide with usage patterns
-    - Troubleshooting section
-    - Performance considerations
-
-11. **NEW**: `examples/rollout_importance_sampling/run_with_rollout_is.sh`
-    - Basic example with token-level truncate
+- `recipe/dapo/run_dapo_qwen2.5_32b_rollout_is.sh` - DAPO example with rollout IS
+- `examples/rollout_importance_sampling/README.md` - Comprehensive usage guide
+- `examples/rollout_importance_sampling/run_with_rollout_is.sh` - Basic example

 ### **Tests**

-12. **NEW**: `tests/trainer/ppo/test_rollout_is.py`
-    - Unit tests for rollout IS functionality
-
-13. **NEW**: `tests/trainer/ppo/test_rollout_is_integration.py`
-    - Integration tests with PPO
+- `tests/trainer/ppo/test_rollout_is.py` - Unit tests
+- `tests/trainer/ppo/test_rollout_is_integration.py` - Integration tests

 ## Configuration Parameters

@@ -156,20 +111,10 @@ Bounding mode:
 Per-token veto threshold. If any token ratio < this, entire sequence is rejected.
 Default: `1e-4` (ratio 10,000x off)

-## Migration Steps
+## Usage

-### Step 1: Update Your Configuration
+### Basic Setup

-**Before (Old):**
-```yaml
-actor_rollout_ref:
-  actor:
-    tis_imp_ratio_cap: 2.0
-  rollout:
-    calculate_log_probs: true
-```
-
-**After (New):**
 ```yaml
 algorithm:
   rollout_is_threshold: 2.0 # Main control
@@ -179,10 +124,10 @@ algorithm:

 actor_rollout_ref:
   rollout:
-    calculate_log_probs: true # Still required!
+    calculate_log_probs: true # Required!
 ```

-### Step 2: Monitor New Metrics
+### Metrics

 All metrics are prefixed with `mismatch/`. For example, `rollout_is_mean` appears as `mismatch/rollout_is_mean` in logs.

@@ -416,7 +361,7 @@ if not is_healthy:
     print(" - Checking if rollout and training policies are too different")
 ```

-### Step 3: Test Your Training
+### Running Examples

 Start with the basic token-level truncate configuration:
 ```bash
@@ -582,11 +527,6 @@ for step in range(num_steps):
 - **Computational overhead**: 1-3% depending on level
 - **Training stability**: Significantly improved when mismatch exists

-## Backward Compatibility
-
-**The old `tis_imp_ratio_cap` parameter is completely removed.** There is no backward compatibility mode.
-
-All scripts and configurations must be updated to use the new rollout IS parameters.

 ## Testing

@@ -606,15 +546,13 @@ Expected output: All tests pass ✓

 - **Implementation**: `verl/trainer/ppo/mismatch_helper.py`
 - **Examples**: `examples/rollout_importance_sampling/`
-- **DAPO Example**: `recipe/dapo/run_dapo_qwen2.5_32b_tis.sh`
+- **DAPO Example**: `recipe/dapo/run_dapo_qwen2.5_32b_rollout_is.sh`

 ## Summary

-The new Rollout Importance Sampling implementation provides:
-- More robust handling of distribution mismatch
-- Better numerical stability
+Rollout Importance Sampling provides:
+- Robust handling of distribution mismatch
+- Numerical stability
 - ✅ Comprehensive metrics for monitoring
 - ✅ Flexibility for different scenarios
 - ✅ Memory-efficient computation
-
-Migration is straightforward: replace `tis_imp_ratio_cap` with the new `rollout_is_*` parameters in the `algorithm` config section.

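For readers skimming the renamed document, the three aggregation levels it lists (token, sequence, geometric) and its log-space computation can be illustrated with a minimal sketch. This is not the code in `mismatch_helper.py`: the function name `aggregate_is_weights` is hypothetical, plain Python lists stand in for tensors, and the ratio direction (training policy over rollout policy, i.e. `exp(old_log_prob - rollout_log_prob)`) is an assumption.

```python
import math


def aggregate_is_weights(old_log_prob, rollout_log_prob, level="token"):
    """Return one importance weight per token for a single sequence.

    Hypothetical helper for illustration only. Assumes the ratio is
    pi_train / pi_rollout, computed as exp(old_log_prob - rollout_log_prob).
    """
    if not old_log_prob or len(old_log_prob) != len(rollout_log_prob):
        raise ValueError("log-prob lists must be non-empty and equal in length")

    # Stay in log space: a difference of log-probs is a log-ratio.
    log_ratios = [o - r for o, r in zip(old_log_prob, rollout_log_prob)]

    if level == "token":
        # Each token keeps its own ratio.
        return [math.exp(lr) for lr in log_ratios]
    if level == "sequence":
        # Product of ratios == exp of the summed log-ratios.
        seq_log = sum(log_ratios)
    elif level == "geometric":
        # Geometric mean of ratios == exp of the mean log-ratio.
        seq_log = sum(log_ratios) / len(log_ratios)
    else:
        raise ValueError(f"unknown level: {level}")

    # Sequence-wide levels broadcast one weight to every token.
    return [math.exp(seq_log)] * len(log_ratios)


old = [math.log(0.5), math.log(0.25)]
rollout = [math.log(0.25), math.log(0.25)]
for level in ("token", "sequence", "geometric"):
    print(level, aggregate_is_weights(old, rollout, level))
```

Summing log-ratios before exponentiating avoids multiplying many small or large ratios directly, which is the usual motivation for computing in log space.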
docs/index.rst

Lines changed: 1 addition & 1 deletion
@@ -121,7 +121,7 @@ verl is fast with:
 examples/sandbox_fusion_example
 advance/rollout_trace.rst
 advance/rollout_skip.rst
-advance/rollout_is_migration.md
+advance/rollout_is.md
 advance/one_step_off
 advance/agent_loop
 advance/fully_async

recipe/dapo/run_dapo_qwen2.5_32b_tis.sh renamed to recipe/dapo/run_dapo_qwen2.5_32b_rollout_is.sh

Lines changed: 5 additions & 5 deletions
@@ -16,13 +16,13 @@ kl_coef=0.0
 use_kl_loss=False
 kl_loss_coef=0.0

-# Rollout Importance Sampling parameters (matches original TIS with threshold=2)
+# Rollout Importance Sampling parameters
 rollout_is=True
 rollout_is_threshold=2.0
-rollout_is_threshold_lower=null # No lower bound (original TIS behavior)
-rollout_is_level=token # token-level (original TIS behavior)
-rollout_is_mode=truncate # truncate mode (original TIS behavior)
-rollout_is_veto_threshold=null # No veto (original TIS behavior)
+rollout_is_threshold_lower=null # No lower bound
+rollout_is_level=token # token-level
+rollout_is_mode=truncate # truncate mode
+rollout_is_veto_threshold=null # No veto

 clip_ratio_low=0.2
 clip_ratio_high=0.28

tests/trainer/ppo/test_rollout_is.py

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ def test_basic_rollout_is():
     rollout_log_prob = old_log_prob + torch.randn(batch_size, seq_length, device=device) * 0.1
     eos_mask = torch.ones(batch_size, seq_length, device=device)

-    # Test token-level truncate mode (equivalent to old TIS)
+    # Test token-level truncate mode
     print("\n1. Testing token-level truncate mode...")
     weights_proto, metrics = compute_rollout_importance_weights(
         old_log_prob=old_log_prob,

verl/trainer/config/algorithm.py

Lines changed: 1 addition & 1 deletion
@@ -93,7 +93,7 @@ class AlgoConfig(BaseConfig):
     use_pf_ppo: bool = False
     pf_ppo: dict[str, Any] = field(default_factory=dict)
     filter_groups: Optional[FilterGroupsConfig] = None
-    # Rollout Importance Sampling (replaces legacy tis_imp_ratio_cap)
+    # Rollout Importance Sampling
     # Controls computation of IS weights and mismatch metrics
     rollout_is_threshold: Optional[float] = None # null = disabled, float = enabled
     rollout_is_threshold_lower: Optional[float] = None

verl/trainer/ppo/mismatch_helper.py

Lines changed: 5 additions & 5 deletions
@@ -20,7 +20,7 @@

 Key Features:
 1. Three aggregation levels: token, sequence, geometric
-2. Two handling modes: truncate (TIS), mask (MIS)
+2. Two handling modes: truncate, mask
 3. Per-token veto mechanism for catastrophic outliers
 4. Memory-efficient computation to prevent CUDA OOM
 5. Comprehensive metrics tracking
@@ -76,8 +76,8 @@ def compute_rollout_importance_weights(
 - "sequence": Product of ratios (unbiased)
 - "geometric": Geometric mean of ratios (experimental)
 rollout_is_mode: How to handle weights exceeding threshold:
-- "truncate": Cap weights at upper_threshold only (TIS)
-- "mask": Zero out weights outside [lower_threshold, upper_threshold] (MIS)
+- "truncate": Cap weights at upper_threshold only
+- "mask": Zero out weights outside [lower_threshold, upper_threshold]
 rollout_is_threshold: Upper threshold for IS weights
 rollout_is_threshold_lower: Lower threshold for IS weights (mask mode only; if None, defaults to 1/upper)
 rollout_is_veto_threshold: Per-token veto threshold. If any token ratio < this, zero entire sequence.
@@ -181,11 +181,11 @@ def compute_rollout_importance_weights(

     # Step 3: Apply truncation or masking based on mode
     if rollout_is_mode == "truncate":
-        # Truncated IS (TIS): only cap upper bound to prevent overweighting
+        # Truncate mode: only cap upper bound to prevent overweighting
         rollout_is_weights = rollout_is_weights.clamp(max=upper_threshold)

     elif rollout_is_mode == "mask":
-        # Masked IS (MIS): zero out weights outside [lower_threshold, upper_threshold]
+        # Mask mode: zero out weights outside [lower_threshold, upper_threshold]
         mask = (rollout_is_weights >= lower_threshold) & (rollout_is_weights <= upper_threshold)
         mask = mask.float()

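The difference between the two modes touched in this hunk, plus the per-token veto described in the docstring, can be sketched in plain Python. This is an illustration under assumptions, not the tensor code in `mismatch_helper.py`: `bound_weights` and `apply_veto` are hypothetical names, lists replace tensors, and how the real code applies the mask after `mask.float()` is not shown in the diff, so zeroing the out-of-range weights directly is assumed here.

```python
def bound_weights(weights, mode, upper, lower=None):
    """Bound importance weights using 'truncate' or 'mask' semantics.

    Hypothetical helper for illustration only.
    """
    if upper <= 0:
        raise ValueError("upper threshold must be positive")
    if mode == "truncate":
        # Cap at the upper threshold only; small weights pass through.
        return [min(w, upper) for w in weights]
    if mode == "mask":
        # Per the docstring, the lower bound defaults to 1/upper when unset.
        lo = 1.0 / upper if lower is None else lower
        # Assumption: weights outside [lo, upper] are set to zero.
        return [w if lo <= w <= upper else 0.0 for w in weights]
    raise ValueError(f"unknown mode: {mode}")


def apply_veto(token_ratios, weights, veto_threshold):
    """Zero the whole sequence if any token ratio falls below the veto threshold."""
    if veto_threshold is not None and any(r < veto_threshold for r in token_ratios):
        return [0.0] * len(weights)
    return list(weights)


ratios = [0.1, 1.0, 3.0]
print(bound_weights(ratios, "truncate", upper=2.0))
print(bound_weights(ratios, "mask", upper=2.0))
print(apply_veto([1.0, 1e-5], [1.0, 1.0], veto_threshold=1e-4))
```

Truncation keeps every sample but limits its influence, while masking drops samples whose ratios fall outside the band; the veto is a separate, stricter check applied to whole sequences.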