RL recreations and implementations of algorithms from papers
Includes:
- DQN
- Double DQN
- Prioritized Experience Replay DQN
- Dueling DQN
- Noisy DQN
- N Step DQN
- Categorical DQN
- Rainbow DQN. Any of the above can be used by passing the corresponding config into RainbowAgent; you can also mix and match the extensions by creating your own configs (see the sketch after this list).
- Ape-X
- Neural Fictitious Self-Play (NFSP). NFSP allows traditional RL agents to work well on imperfect-information games and multi-agent environments. It can also be used to train Rainbow on multi-agent games by passing in an anticipatory param of 1.0, though this should only be done for deterministic games such as Tic Tac Toe or Connect 4.
- PPO
- AlphaZero
- MuZero
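
The DQN extensions above are toggled through the config handed to RainbowAgent, and NFSP's anticipatory param decides how often the best-response (RL) policy is played. A minimal sketch of how that could look is below; the import paths, config keys, and constructor signatures are illustrative assumptions, not the repo's exact API:

```python
# Sketch only: import paths, config keys, and signatures are assumptions,
# not the repository's exact API.
from rainbow.agent import RainbowAgent   # hypothetical module path
from nfsp.agent import NFSPAgent         # hypothetical module path

# Mix and match DQN extensions by toggling them in the config.
# Enabling everything recovers full Rainbow; disabling all gives vanilla DQN.
rainbow_config = {
    "double": True,               # Double DQN target selection
    "dueling": True,              # dueling value/advantage streams
    "noisy": True,                # NoisyNet exploration instead of epsilon-greedy
    "prioritized_replay": True,   # Prioritized Experience Replay
    "n_step": 3,                  # n-step returns
    "categorical": True,          # distributional (categorical/C51) value head
}
agent = RainbowAgent(env="CartPole", config=rainbow_config)

# NFSP wraps an RL agent so it works on imperfect-information / multi-agent games.
# anticipatory_param=1.0 always plays the best-response (RL) policy, which amounts
# to training plain Rainbow via self-play; only advisable for deterministic games
# such as Tic Tac Toe or Connect 4.
nfsp_agent = NFSPAgent(env="TicTacToe", rl_agent=agent, anticipatory_param=1.0)
```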
Envs we have implemented:
- Tic Tac Toe
- CartPole
- Connect 4
- Mississippi Marbles
- Leduc Hold'em
Some envs we want to test in the future:
- Chess
- Catan
- Go
- Shogi
- Risk
- Monopoly
- StarCraft
- Clash Royale
- RL Card (card games): https://rlcard.org/ https://github.com/datamllab/rlcard (Blackjack, Leduc Hold'em, Limit Texas Hold'em, Dou Dizhu, Simple Dou Dizhu, Mahjong, No-limit Texas Hold'em, UNO, Gin Rummy, Bridge)
- Eclipse Sumo (Traffic Simulation): https://eclipse.dev/sumo/about/ https://github.com/AndreaVidali/Deep-QLearning-Agent-for-Traffic-Signal-Control
- Any Trading (Simple): https://github.com/AminHP/gym-anytrading
- MTSIM Trading (Complex): https://github.com/AminHP/gym-mtsim
- TensorTrade: https://www.tensortrade.org/en/latest/examples/train_and_evaluate_using_ray.html https://github.com/tensortrade-org/tensortrade?tab=readme-ov-file
- Atari 57: https://gymnasium.farama.org/environments/atari/
- MineCraft: https://minerl.io/
- Racing: https://aws.amazon.com/deepracer/
- Robo Sumo: https://github.com/openai/robosumo
- Unity ML Agents: https://github.com/Unity-Technologies/ml-agents
- Multi Agent Emergence Environments: https://github.com/openai/multi-agent-emergence-environments/tree/master/examples
- All OpenAI Gym Environments: https://gymnasium.farama.org/ (Classic Control, Box2D, Toy Text, MuJoCo, Atari)
- All OpenSpiel Environments: https://github.com/google-deepmind/open_spiel?tab=readme-ov-file
- More at: https://github.com/clvrai/awesome-rl-envs?tab=readme-ov-file
Tournaments/Challenges:
- Battle Snake: https://play.battlesnake.com/
- Terminal: https://terminal.c1games.com/
- Lux AI: https://www.kaggle.com/c/lux-ai-2021
- Russian AI Cup: https://russianaicup.ru/
- Coliseum: https://www.coliseum.ai/
- Code Cup: https://www.codecup.nl/intro.php
- IEEE Conference on Games: https://2023.ieee-cog.org/
Some useful papers:
- MuZero: https://arxiv.org/pdf/1911.08265.pdf
- Rainbow: https://arxiv.org/pdf/1710.02298.pdf
- Revisiting Rainbow: https://arxiv.org/pdf/2011.14826.pdf
- AlphaZero: https://arxiv.org/pdf/1712.01815.pdf
- Policy Value Alignment: https://arxiv.org/pdf/2301.11857.pdf
- A Disciplined Approach to Hyperparameters Part 1: https://arxiv.org/pdf/1803.09820.pdf
- High Performance Algorithms for Turn Based Games Using Deep Learning: https://www.scitepress.org/Papers/2020/89561/89561.pdf
- KataGo: https://arxiv.org/pdf/2008.10080.pdf https://github.com/lightvector/KataGo/tree/master
- Never Give Up: https://arxiv.org/pdf/2002.06038.pdf
- Agent 57: https://arxiv.org/pdf/2003.13350.pdf
- MEME: https://arxiv.org/pdf/2003.13350.pdf
- GDI: https://arxiv.org/pdf/2106.06232.pdf <- not used but interesting idea
- Prioritized Experience Replay: https://arxiv.org/pdf/1511.05952.pdf
- PPO: https://arxiv.org/pdf/1707.06347.pdf
- What Matters in On Policy RL: https://arxiv.org/pdf/2006.05990.pdf
- Population Based Training: https://arxiv.org/pdf/1711.09846.pdf <- not used but interesting idea for the future
- RL Card: https://arxiv.org/abs/1910.04376
- NFSP: https://arxiv.org/pdf/1603.01121
- CFR: https://proceedings.neurips.cc/paper/2007/file/08d98638c6fcd194a4b1e6992063e944-Paper.pdf
- Deep CFR: https://arxiv.org/pdf/1811.00164
To Look Into:
- Muesli
- DreamerV3
- R2D2
- NGU
- Agent 57
- CFR (For imperfect information)
- DeepCFR (For imperfect information)
- StarCraft League
- Meta Learning
- World Models