Project Website · Paper · Datasets · Clean Offline RLHF
This is the Uni-RLHF platform implementation of the paper Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback by Yifu Yuan, Jianye Hao, Yi Ma, Zibin Dong, Hebin Liang, Jinyi Liu, Zhixin Feng, Kai Zhao, and Yan Zheng, with contributions by Thomas Frank. Uni-RLHF aims to provide a complete workflow built on real human feedback, fostering progress in the development of RLHF in the decision-making domain. Here we develop a user-friendly annotation interface tailored to various feedback types and compatible with a wide range of mainstream RL environments. We then establish a systematic pipeline of crowdsourced annotations, resulting in a large-scale annotated dataset (≈15 million steps). We also provide offline RLHF baselines using the collected feedback datasets and various design choices in Clean Offline RLHF.
Table of Contents
The Uni-RLHF platform consists of a Vue front-end and a Flask back-end. We also support a wide range of mainstream RL environments for annotation. The system only works on Linux.
- Clone the repo
git clone https://github.com/thomas475/Uni-RLHF.git
cd Uni-RLHF

- Setup Anaconda environment
conda create -n rlhf python==3.9
conda activate rlhf
- Install Dependencies
pip install -r requirements.txt
- Install NPM packages
cd uni_rlhf/vue_part
npm install

- Install hdf5
conda install anaconda::hdf5
- Install Redis
sudo apt-get install redis
- Configure the MySQL Database (adjust [user] and [password] accordingly)
mysql -u [user] -p

Then, in the MySQL environment enter:

CREATE DATABASE uni_rlhf;
exit

Afterwards, navigate to scripts/create_table.py and update the cfg variable:

cfg = {
    'host': 'localhost',
    'port': 3306,
    'username': [user],
    'password': [password],
    'database_name': 'uni_rlhf'
}

Then, execute the script:

cd scripts
python create_table.py

Finally, go to uni_rlhf/config.py and set app.config['SQLALCHEMY_DATABASE_URI'] to 'mysql://[user]:[password]@localhost/uni_rlhf'.
Many of the datasets use MuJoCo as their environment, so it should be installed too. See this for further details.
- Download the MuJoCo library:
wget https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz
- Create the MuJoCo folder:
mkdir ~/.mujoco

- Extract the library to the MuJoCo folder:

tar -xvf mujoco210-linux-x86_64.tar.gz -C ~/.mujoco/

- Add environment variables (run nano ~/.bashrc):

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin
export MUJOCO_GL=egl
- Reload the .bashrc file to register the changes.
source ~/.bashrc
- Install dependencies:
conda install -c conda-forge patchelf fasteners cython==0.29.37 cffi pyglfw libllvm11 imageio glew glfw mesalib
sudo apt-get install libglew-dev
- Test that the library is installed.
cd ~/.mujoco/mujoco210/bin
./simulate ../model/humanoid.xml
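In addition to the interactive simulator test, you can check the Python bindings with a short script. This is only a sketch; it assumes mujoco-py is installed (it comes with the D4RL dependencies below) and uses the install location from the steps above.

```python
# Optional check that mujoco-py can load the library installed in ~/.mujoco.
import os
import mujoco_py

model_path = os.path.expanduser("~/.mujoco/mujoco210/model/humanoid.xml")
model = mujoco_py.load_model_from_path(model_path)
sim = mujoco_py.MjSim(model)
sim.step()
print(sim.data.qpos[:3])  # should print finite numbers without raising errors
```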
Uni-RLHF supports the following classic datasets; a full list of all tasks is available here. Uni-RLHF also supports uploading customized datasets, as long as the dataset contains observations and terminals keys.
- Install D4RL dependencies. Note that we adapted the code to make the kitchen tasks work and made some small changes to the camera view for better visualisations.

cd d4rl
pip install -e .
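Once installed, you can verify that an offline dataset loads correctly. This is a minimal sketch, assuming gym is available; hopper-medium-v2 is used only as an example task.

```python
# Load a D4RL dataset and inspect the keys the platform relies on.
import gym
import d4rl  # noqa: F401  (importing registers the offline environments)

env = gym.make("hopper-medium-v2")  # example task; any supported D4RL task works
dataset = env.get_dataset()
print(dataset["observations"].shape, dataset["terminals"].shape)
```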
- Install Atari dependencies.

pip install git+https://github.com/takuseno/d4rl-atari
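As a quick check, d4rl-atari exposes the same get_dataset interface. A minimal sketch, using breakout-mixed-v0 only as an example task name:

```python
# Load an Atari offline dataset via d4rl-atari.
import gym
import d4rl_atari  # noqa: F401  (importing registers the Atari offline environments)

env = gym.make("breakout-mixed-v0")  # example task name
dataset = env.get_dataset()
print(dataset["observations"].shape, dataset["terminals"].shape)
```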
- Install V-D4RL dependencies. Note that V-D4RL provides image-based datasets; the full datasets can be found on Google Drive and must be downloaded before running the code. The expected file structure is:
uni_rlhf
└───datasets
│   └───dataset_resource
│       └───vd4rl
│       │   └───cheetah
│       │   │   └───cheetah_run_medium
│       │   │   └───cheetah_run_medium_expert
│       │   └───humanoid
│       │   │   └───humanoid_walk_medium
│       │   │   └───humanoid_walk_medium_expert
│       │   └───walker
│       │       └───walker_walk_medium
│       │       └───walker_walk_medium_expert
│       └───smarts
│           └───cruise
│           └───curin
│           └───left_c
└───vue_part
│   ...
└───controllers
│   ...
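After downloading, a small helper like the following can confirm the folders ended up where the platform expects them. This is only an illustrative sketch (not part of the repository), run from the repository root, and it checks a subset of the tree above.

```python
# Hypothetical helper: check the V-D4RL folders against the tree shown above.
from pathlib import Path

root = Path("uni_rlhf/datasets/dataset_resource/vd4rl")
expected = [
    "cheetah/cheetah_run_medium",
    "humanoid/humanoid_walk_medium",
    "walker/walker_walk_medium",
]
for rel in expected:
    print(rel, "found" if (root / rel).is_dir() else "MISSING")
```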
- Install MiniGrid dependencies. These are the same dependencies as for the D4RL datasets.
- Install SMARTS dependencies. We employed online reinforcement learning algorithms to train two agents for dataset collection, each designed specifically for its respective scenario. The first agent demonstrates medium driving proficiency, achieving a success rate between 40% and 80% in its designated scenario. In contrast, the second agent exhibits expert-level performance, attaining a success rate of 95% or higher in the same scenario. For dataset construction, 800 driving trajectories were collected using the intermediate agent, and an additional 200 were gathered with the expert agent. By integrating the two, we compiled a mixed dataset of 1,000 driving trajectories. We upload the full datasets, containing both image data (for rendering) and vector data (for training), to Google Drive. These must be downloaded before running the code. The expected file structure is the same as for the V-D4RL datasets.
- Upload customization datasets. Custom datasets must be in hdf5 format and contain observations and terminals keys:

observations: An N by observation-dimensional array of observations.
terminals: An N-dimensional array of episode termination flags.
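For reference, a compliant custom dataset can be written with h5py in a few lines. This is a minimal sketch with made-up data; only the observations and terminals keys are required by the uploader.

```python
# Write a custom dataset in the expected hdf5 format (illustrative data only).
import h5py
import numpy as np

n_steps, obs_dim = 1000, 17
observations = np.random.randn(n_steps, obs_dim).astype(np.float32)
terminals = np.zeros(n_steps, dtype=bool)
terminals[499] = terminals[-1] = True  # mark the end of each episode

with h5py.File("my_custom_dataset.hdf5", "w") as f:
    f.create_dataset("observations", data=observations)
    f.create_dataset("terminals", data=terminals)
```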
Start the MySQL server:
sudo systemctl start mysql.service
mysql -u [user] -p
Start the Redis server (first stop any instance managed by the system service, then launch the server manually):
sudo service redis-server stop
redis-server
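To verify Redis is reachable before launching the app, a one-line check from Python works. This is a sketch and assumes the redis Python package is installed in the rlhf environment.

```python
import redis

# True means the local Redis server answered the PING command.
print(redis.Redis(host="localhost", port=6379).ping())
```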
Now you can run the app from the base directory with:
conda activate rlhf
python run.py

The app is running at:

http://localhost:8503

You can kill all related processes with:

python scripts/kill_process.py

If you run the application remotely, you can access the interface through either port forwarding or public access. Make sure that ports 8502 and 8503 are open on the server.
Open the terminal on your local machine and enter the following (adjust [user] and [server-ip] accordingly):
ssh -L 8502:localhost:8502 -L 8503:localhost:8503 [user]@[server-ip]
Then you can access the interface at http://localhost:8503.
You can also make the application publicly accessible through its IP address. For this you need to modify the following variables before running the platform (adjust [server-ip] accordingly):
- in uni_rlhf/config.json set the value of baseUrl to [server-ip]
- in uni_rlhf/vue_part/src/store/index.js, in the JSON initialState, set the value of baseUrl to http://[server-ip]:8502
Then you can access the interface at http://[server-ip]:8503.
NOTE: If [server-ip] is an IPv6 address and you use it in an HTTP request, you need to put brackets around it. For example, if the IPv6 address is 0:0:0:0:0:0:0:1, the corresponding HTTP request would be http://[0:0:0:0:0:0:0:1]:8502.
- Specially tailored pipelines and tasks for reinforcement learning and decision-making problems.
- A clean pipeline designed for employer-annotator coordination.
- Supports multi-user synchronized labeling and conflict-free export.
- Supports a large number of mainstream decision-making datasets and lets you easily customize and upload your own datasets.
- Supports several mainstream feedback types for decision-making problems and provides configurable label formats that let you combine new ways of giving feedback.
We support several built-in environments and datasets. See config for the expected name formatting of full domains and tasks.
We support five common feedback types and propose a standardized feedback encoding format that specifies how annotators interact with these types and how the feedback is encoded. Additionally, we briefly outline the potential forms and applications of reinforcement learning that integrate various forms of human feedback in the Uni-RLHF paper.
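As a purely illustrative example (not the platform's actual export schema), a pairwise comparative feedback label could be encoded along these lines, where an annotator states a preference between two trajectory segments:

```python
# Hypothetical encoding of one comparative feedback label; field names are
# illustrative and do not reflect the platform's real export format.
comparative_label = {
    "segment_a": {"episode_id": 12, "start_step": 100, "end_step": 200},
    "segment_b": {"episode_id": 37, "start_step": 400, "end_step": 500},
    "preference": 0,  # 0 -> segment_a preferred, 1 -> segment_b, -1 -> equally good
}
```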
Thanks to Uni-RLHF, we establish a systematic pipeline of crowdsourced annotations, resulting in an open-source and reusable large-scale annotated dataset (≈15 million steps). We then run offline RL baselines using the collected feedback datasets; for these, we refer to the offline RLHF baselines in the sister repository. We wish to build valuable open-source platforms, datasets, and baselines to facilitate the development of more robust and reliable RLHF solutions for decision making based on realistic human feedback.
For more examples, please refer to the Documentation.
- Support auto reward model training process
- Fix online training bug
- Adapt the sampler to the new code framework
See the open issues for a full list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the MIT License. See LICENSE.txt for more information.
For any questions, please feel free to email [email protected].
If you find our work useful, please consider citing:
@inproceedings{anonymous2023unirlhf,
title={Uni-{RLHF}: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback},
author={Yuan, Yifu and Hao, Jianye and Ma, Yi and Dong, Zibin and Liang, Hebin and Liu, Jinyi and Feng, Zhixin and Zhao, Kai and Zheng, Yan},
booktitle={The Twelfth International Conference on Learning Representations, ICLR},
year={2024},
url={https://openreview.net/forum?id=WesY0H9ghM},
}



