A framework for stress-testing reinforcement learning algorithms under dynamic environment conditions. This project evaluates how various RL algorithms handle sudden changes in environment parameters mid-training, simulating real-world scenarios where operating conditions shift unexpectedly.
This repository implements and compares six reinforcement learning algorithms (a minimal shared agent interface is sketched after the list):
- PPO (Proximal Policy Optimization)
- TRPO (Trust Region Policy Optimization)
- Actor-Critic
- DQN (Deep Q-Network)
- Option-Critic
- Random Agent (baseline)
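Comparing this many algorithms is easiest when they expose a common interface. The sketch below is a hypothetical minimal version, illustrated with the random baseline; the actual classes under algorithms/ may expose different methods.

```python
from abc import ABC, abstractmethod


class Agent(ABC):
    """Hypothetical minimal interface an RL agent could expose for these experiments."""

    @abstractmethod
    def act(self, observation):
        """Select an action for the current observation."""

    @abstractmethod
    def update(self, observation, action, reward, next_observation, done):
        """Learn from a single environment transition."""


class RandomAgent(Agent):
    """Baseline that ignores observations and samples actions uniformly."""

    def __init__(self, action_space):
        self.action_space = action_space

    def act(self, observation):
        return self.action_space.sample()

    def update(self, *transition):
        pass  # the baseline does not learn
```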
The stress test methodology involves:
- Training agents in stable conditions for 500 episodes
- Introducing environment parameter changes at episode 500
- Evaluating agent adaptation for 500 additional episodes
- Comparing performance across algorithms (a minimal version of this loop is sketched below)
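A minimal sketch of this schedule, assuming a Gymnasium-style environment and a hypothetical `agent` object; the `agent.act`/`agent.update` methods and the `apply_stress` callback are illustrative, not the project's actual API:

```python
import gymnasium as gym

STABLE_EPISODES = 500   # phase 1: unmodified environment
STRESS_EPISODES = 500   # phase 2: after the parameter change


def run_stress_test(agent, apply_stress):
    """Train for 500 stable episodes, mutate the environment, then train 500 more."""
    env = gym.make("CartPole-v1")
    episode_rewards = []
    for episode in range(STABLE_EPISODES + STRESS_EPISODES):
        if episode == STABLE_EPISODES:
            apply_stress(env)  # e.g. raise gravity from 9.8 to 24.5 m/s^2
        obs, _ = env.reset()
        done, total = False, 0.0
        while not done:
            action = agent.act(obs)
            next_obs, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            agent.update(obs, action, reward, next_obs, done)
            obs = next_obs
            total += reward
        episode_rewards.append(total)
    return episode_rewards
```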
Test Environments:
- CartPole-v1: Tests balance control with modified physics parameters (a gravity-mutation sketch appears after this list)
- Pacman (Atari): Tests game-playing with altered difficulty modes
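As one way to realize such a mid-training change for CartPole, the sketch below mutates the gravity constant that Gymnasium's CartPole environment stores as a plain attribute. The wrapper class and `set_gravity` method are illustrative stand-ins, not the project's own wrappers in mutable_ale/.

```python
import gymnasium as gym


class GravityMutableCartPole(gym.Wrapper):
    """CartPole wrapper whose gravity can be changed between episodes (illustrative only)."""

    def set_gravity(self, gravity: float) -> None:
        # CartPoleEnv keeps its physics constants as instance attributes,
        # so mutating `gravity` changes the dynamics from the next step on.
        self.unwrapped.gravity = gravity


env = GravityMutableCartPole(gym.make("CartPole-v1"))
env.set_gravity(24.5)  # stress condition used in the experiments (9.8 -> 24.5 m/s^2)
```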
The project uses uv to manage and lock dependencies for a consistent, reproducible environment. If you do not have uv installed, see the uv documentation for installation instructions.
Note: If you already have pip, you can install uv with:

```bash
pip install uv
```

```bash
# Clone the repo
git clone git@github.com:sebastian9991/Stress-Test-on-RL-Algorithms.git

# Enter the repo directory
cd Stress-Test-on-RL-Algorithms

# Install core dependencies into an isolated environment
uv sync
```

Run the complete set of experiments across all algorithms and environments:

```bash
./run_experiements.sh
```

Results are saved in the results/ directory, with performance plots generated automatically.
├── algorithms/ # RL algorithm implementations
│ ├── ppo.py
│ ├── trpo.py
│ ├── actor_critic.py
│ ├── dqn.py
│ ├── option_critic.py
│ └── random_agent.py
├── mutable_ale/ # Mutable environment wrappers
│ ├── mutable_cartpole.py
│ └── mutable_ALE.py
├── policies/ # Policy implementations
├── scripts/ # Plotting and analysis utilities
├── main.py # Main experiment runner
└── run_experiements.sh # Convenience script
Experiments generate:
- Reward curves over episodes
- Moving average performance
- Hyperparameter comparison plots
- Per-algorithm performance metrics
Results are saved as JSON files and PNG plots in the results/ directory.
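For instance, a moving-average curve can be rebuilt from a saved results file along these lines; the `"rewards"` key and file layout below are assumptions for illustration, not necessarily the exact format the experiments write:

```python
import json

import matplotlib.pyplot as plt
import numpy as np


def plot_moving_average(path: str, window: int = 50) -> None:
    """Plot per-episode rewards and their moving average from a results JSON file.

    Assumes the file holds a list of episode rewards under a "rewards" key;
    the real files under results/ may use a different structure.
    """
    with open(path) as f:
        rewards = np.asarray(json.load(f)["rewards"], dtype=float)
    moving = np.convolve(rewards, np.ones(window) / window, mode="valid")

    plt.plot(rewards, alpha=0.3, label="episode reward")
    plt.plot(np.arange(window - 1, len(rewards)), moving,
             label=f"{window}-episode moving average")
    plt.axvline(500, linestyle="--", label="stress applied (episode 500)")
    plt.xlabel("episode")
    plt.ylabel("reward")
    plt.legend()
    plt.savefig("moving_average.png")
```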
The following table shows average rewards per episode for the final 100 episodes across different gravity levels (stress test transitions from 9.8 m/s² to 24.5 m/s² at episode 500):
The reward plot below demonstrates the impact of the stress test, showing a clear decrease in rewards immediately after increasing gravity from 9.8 m/s² to 24.5 m/s² at episode 500:
Note: The full experimental report is available in Testing_Generalizatibility_Stress_Test.pdf.