This repository contains the official implementation of Evolutionary Policy Optimization (EPO).
Clone the repository and create a Conda environment from the provided env.yaml file:
conda env create -f env.yaml
conda activate epo
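To confirm the environment is set up, a quick sanity check (this assumes env.yaml installs PyTorch, which Isaac Gym and rl_games require):
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"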
Download the Isaac Gym Preview 4 release from the NVIDIA website, unzip it, and execute the following:
cd isaacgym/python
pip install -e .
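To verify the Isaac Gym install, you can try one of the bundled examples (Preview 4 ships a joint_monkey.py demo under python/examples; it opens a viewer, so it needs a display). If the import fails with an error mentioning libpython, pointing the loader at the Conda environment's libraries usually helps:
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
cd examples
python joint_monkey.py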
Now, from the root folder of the repository, execute the following commands:
cd rl_games
pip install -e .
cd ..
pip install -e .
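At this point both packages should be importable. Note that isaacgym must be imported before torch (the Preview releases enforce this ordering), so a combined import makes a reasonable smoke test:
python -c "import isaacgym; import rl_games; print('setup OK')"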
We provide the exact commands to reproduce the performance of policies trained with EPO and the PPO baseline on different environments:
# Allegro Kuka Regrasping
./scripts/train_allegro_kuka.sh regrasping "test" 1 24576 [] --epo --lstm --num-expl-coef-blocks=64 --wandb-entity <ENTITY_NAME> --ir-type=none # EPO
./scripts/train_allegro_kuka.sh regrasping "test" 1 24576 [] --ppo --lstm --wandb-entity <ENTITY_NAME> # PPO
# Allegro Kuka Throw
./scripts/train_allegro_kuka.sh throw "test" 1 24576 [] --epo --lstm --num-expl-coef-blocks=64 --wandb-entity <ENTITY_NAME> --ir-type=none # EPO
./scripts/train_allegro_kuka.sh throw "test" 1 24576 [] --ppo --lstm --wandb-entity <ENTITY_NAME> # PPO
# Allegro Kuka Reorientation
./scripts/train_allegro_kuka.sh reorientation "test" 1 24576 [] --epo --lstm --num-expl-coef-blocks=64 --wandb-entity <ENTITY_NAME> --ir-type=entropy --ir-coef-scale=0.005 # EPO
./scripts/train_allegro_kuka.sh reorientation "test" 1 24576 [] --ppo --lstm --wandb-entity <ENTITY_NAME> # PPO
# Allegro Kuka Two Arms Regrasping
./scripts/train_allegro_kuka_two_arms.sh regrasping "test" 1 24576 [] --epo --lstm --num-expl-coef-blocks=64 --wandb-entity <ENTITY_NAME> --ir-type=entropy --ir-coef-scale=0.002 # EPO
./scripts/train_allegro_kuka_two_arms.sh regrasping "test" 6 4104 [] --ppo --lstm --wandb-entity <ENTITY_NAME> # PPO
# Allegro Kuka Two Arms Reorientation
./scripts/train_allegro_kuka_two_arms.sh reorientation "test" 1 24576 [] --epo --lstm --num-expl-coef-blocks=64 --wandb-entity <ENTITY_NAME> --ir-type=entropy --ir-coef-scale=0.002 # EPO
./scripts/train_allegro_kuka_two_arms.sh reorientation "test" 6 4104 [] --ppo --lstm --wandb-entity <ENTITY_NAME> # PPO
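In each command, replace <ENTITY_NAME> with your Weights & Biases entity (username or team name). Our reading of the remaining flags, based on their names and the values above (the scripts themselves are authoritative): --epo / --ppo select the algorithm, --lstm enables the recurrent policy, --num-expl-coef-blocks sets the number of exploration-coefficient blocks used by EPO, and --ir-type / --ir-coef-scale configure the intrinsic-reward type and scale. A filled-in example with a hypothetical entity name my-team:
./scripts/train_allegro_kuka.sh regrasping "test" 1 24576 [] --epo --lstm --num-expl-coef-blocks=64 --wandb-entity my-team --ir-type=none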
If you find our code useful, please cite our work:
@misc{wang2025evolutionarypolicyoptimization,
title={Evolutionary Policy Optimization},
author={Jianren Wang and Yifan Su and Abhinav Gupta and Deepak Pathak},
year={2025},
eprint={2503.19037},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2503.19037},
}
This implementation builds upon the following codebases:
[1] Petrenko, A., Allshire, A., State, G., Handa, A., & Makoviychuk, V. (2023). DexPBT: Scaling up Dexterous Manipulation for Hand-Arm Systems with Population Based Training. ArXiv, abs/2305.12127.
[2] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal Policy Optimization Algorithms. ArXiv, abs/1707.06347.
[3] Singla, J., Agarwal, A., & Pathak, D. (2024). SAPG: Split and Aggregate Policy Gradients. ArXiv, abs/2407.20230.