RL recreations and implementations of algorithms from papers
Includes:
- DQN
- Double DQN
- Prioritized Experience Replay DQN
- Dueling DQN
- Noisy DQN
- N Step DQN
- Categorical DQN
- Rainbow DQN. Any of the above can be used by passing the corresponding config into RainbowAgent; you can also mix and match the extensions by creating your own configs (see the sketch after this list).
- Ape-X
- Neural Fictitious Self-Play (NFSP). NFSP allows traditional RL agents to work well on imperfect-information games and multi-agent environments. It can also be used to train Rainbow on multi-agent games by passing in an anticipatory param of 1.0, though this should only be done for deterministic games such as Tic Tac Toe or Connect 4.
- PPO
- AlphaZero
- MuZero
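
The DQN extensions above are toggled through the config handed to RainbowAgent, and NFSP's anticipatory param decides how often the best-response (RL) policy is played. A minimal sketch of how that could look is below; the import paths, config keys, and constructor signatures are illustrative assumptions, not the repo's exact API:

```python
# Sketch only: import paths, config keys, and signatures are assumptions,
# not the repository's exact API.
from rainbow.agent import RainbowAgent   # hypothetical module path
from nfsp.agent import NFSPAgent         # hypothetical module path

# Mix and match DQN extensions by toggling them in the config.
# Enabling everything recovers full Rainbow; disabling all gives vanilla DQN.
rainbow_config = {
    "double": True,               # Double DQN target selection
    "dueling": True,              # dueling value/advantage streams
    "noisy": True,                # NoisyNet exploration instead of epsilon-greedy
    "prioritized_replay": True,   # Prioritized Experience Replay
    "n_step": 3,                  # n-step returns
    "categorical": True,          # distributional (categorical/C51) value head
}
agent = RainbowAgent(env="CartPole", config=rainbow_config)

# NFSP wraps an RL agent so it works on imperfect-information / multi-agent games.
# anticipatory_param=1.0 always plays the best-response (RL) policy, which amounts
# to training plain Rainbow via self-play; only advisable for deterministic games
# such as Tic Tac Toe or Connect 4.
nfsp_agent = NFSPAgent(env="TicTacToe", rl_agent=agent, anticipatory_param=1.0)
```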
Envs we have implemented:
- Tic Tac Toe
- CartPole
- Connect 4
- Mississippi Marbles
- Leduc Hold'em
Some envs we want to test in the future:
- Chess
- Catan
- Go
- Shogi
- Risk
- Monopoly
- StarCraft
- Clash Royale
- RL Card (card games): https://rlcard.org/ https://github.com/datamllab/rlcard (Blackjack, Leduc Hold'em, Limit Texas Hold'em, Dou Dizhu, Simple Dou Dizhu, Mahjong, No-limit Texas Hold'em, UNO, Gin Rummy, Bridge)
- Eclipse Sumo (Traffic Simulation): https://eclipse.dev/sumo/about/ https://github.com/AndreaVidali/Deep-QLearning-Agent-for-Traffic-Signal-Control
- Any Trading (Simple): https://github.com/AminHP/gym-anytrading
- MTSIM Trading (Complex): https://github.com/AminHP/gym-mtsim
- TensorTrade: https://www.tensortrade.org/en/latest/examples/train_and_evaluate_using_ray.html https://github.com/tensortrade-org/tensortrade?tab=readme-ov-file
- Atari 57: https://gymnasium.farama.org/environments/atari/
- MineCraft: https://minerl.io/
- Racing: https://aws.amazon.com/deepracer/
- Robo Sumo: https://github.com/openai/robosumo
- Unity ML Agents: https://github.com/Unity-Technologies/ml-agents
- Multi Agent Emergence Environments: https://github.com/openai/multi-agent-emergence-environments/tree/master/examples
- All OpenAI Gym Environments: https://gymnasium.farama.org/ (Classic Control, Box2D, Toy Text, MuJoCo, Atari)
- All OpenSpiel Environments: https://github.com/google-deepmind/open_spiel?tab=readme-ov-file
- More at: https://github.com/clvrai/awesome-rl-envs?tab=readme-ov-file
Tournaments/Challenges:
- Battle Snake: https://play.battlesnake.com/
- Terminal: https://terminal.c1games.com/
- Lux AI: https://www.kaggle.com/c/lux-ai-2021
- Russian AI Cup: https://russianaicup.ru/
- Coliseum: https://www.coliseum.ai/
- Code Cup: https://www.codecup.nl/intro.php
- IEEE Conference on Games: https://2023.ieee-cog.org/
Some useful papers:
- MuZero: https://arxiv.org/pdf/1911.08265.pdf
- Rainbow: https://arxiv.org/pdf/1710.02298.pdf
- Revisiting Rainbow: https://arxiv.org/pdf/2011.14826.pdf
- AlphaZero: https://arxiv.org/pdf/1712.01815.pdf
- Policy Value Alignment: https://arxiv.org/pdf/2301.11857.pdf
- A Disciplined Approach to Hyperparameters Part 1: https://arxiv.org/pdf/1803.09820.pdf
- High Performance Algorithms for Turn Based Games Using Deep Learning: https://www.scitepress.org/Papers/2020/89561/89561.pdf
- KataGo: https://arxiv.org/pdf/2008.10080.pdf https://github.com/lightvector/KataGo/tree/master
- Never Give Up: https://arxiv.org/pdf/2002.06038.pdf
- Agent 57: https://arxiv.org/pdf/2003.13350.pdf
- MEME: https://arxiv.org/pdf/2003.13350.pdf
- GDI: https://arxiv.org/pdf/2106.06232.pdf <- not used but interesting idea
- Prioritized Experience Replay: https://arxiv.org/pdf/1511.05952.pdf
- PPO: https://arxiv.org/pdf/1707.06347.pdf
- What Matters in On Policy RL: https://arxiv.org/pdf/2006.05990.pdf
- Population Based Training: https://arxiv.org/pdf/1711.09846.pdf <- not used but interesting idea for the future
- RL Card: https://arxiv.org/abs/1910.04376
- NFSP: https://arxiv.org/pdf/1603.01121
- CFR: https://proceedings.neurips.cc/paper/2007/file/08d98638c6fcd194a4b1e6992063e944-Paper.pdf
- Deep CFR: https://arxiv.org/pdf/1811.00164
To Look Into:
- Muesli
- DreamerV3
- R2D2
- NGU
- Agent 57
- CFR (For imperfect information)
- DeepCFR (For imperfect information)
- StarCraft League
- Meta Learning
- World Models