Jason Yecheng Ma1, Jason Yan1, Dinesh Jayaraman1, Osbert Bastani1
1University of Pennsylvania
This is a PyTorch implementation of our paper How Far I'll Go: Offline Goal-Conditioned Reinforcement Learning via F-Advantage Regression; this code can be used to reproduce Section 5.1 and 5.2 of the paper.
Here is a teaser video comparing GoFAR against state-of-art offline GCRL algorithms on a real robot!

- MuJoCo=2.0.0
 
- Create conda environment and activate it:
conda env create -f environment.yml conda activate gofar pip install --upgrade numpy pip install torch==1.10.0 torchvision==0.11.1 torchaudio===0.10.0 gym==0.17.3 - (Optionally) install the Robel environment for the D'Claw experiment.
 - Download the offline dataset here and place 
/offline_datain the project root directory. 
We provide commands for reproducing the main GCRL results (Table 1), the ablations (Figure 3), and the stochastic offline GCRL experiment (Figure 4).
- The main results (Table 1) can be reproduced by the following command:
 
mpirun -np 1 python train.py --env $ENV --method $METHOD
| Flags and Parameters | Description | 
|---|---|
--env $ENV | 
offline GCRL tasks: FetchReach, FetchPush, FetchPick, FetchSlide, HandReach, DClawTurn | 
--method $METHOD | 
offline GCRL algorithms: gofar, gcsl, wgcsl, actionablemodel, ddpg | 
- To run the ablations (Figure 3), we can adjust some relevant command arguments. For example, to disable HER, we can do
 
mpirun -np 1 python train.py --env $ENV --method $METHOD --relabel False
Note that gofar defaults to not using HER, so this command is only relevant to the baselines. Relevant flags are listed here:
| Flags and Parameters | Description | 
|---|---|
--relabel | 
whether hindsight experience replay is enabled: True, False   | 
--relabel_percent | 
The fraction of minibatch transitions that has relabeled goals: 0.0, 0.2, 0.5, 1.0; these are the hyperparameters attempted in the paper, you may try other fractions too. | 
--f | 
Choices of f-divergence for GoFAR: kl, chi. | 
--reward_type | 
Choices of reward function for GoFAR: disc, binary. | 
- The following command will run the stochastic environment experiment (Figure 4):
 
mpirun -np 1 python train.py --env FetchReach --method $METHOD --noise True --noise-eps $NOISE_EPS
where $NOISE_EPS can be chosen from 0.5, 1.0, 1.5.
We borrowed some code from the following repositories: