README.md: 14 additions, 0 deletions
@@ -27,20 +27,34 @@ All code is written in Python 3 and uses RL environments from [OpenAI Gym](https
### List of Implemented Algorithms

- [Dynamic Programming Policy Evaluation](DP/Policy%20Evaluation%20Solution.ipynb)
[<img align="right" height="22" src="https://beta.deepnote.org/buttons/launch-in-deepnote.svg">](https://beta.deepnote.org/launch?template=data-science&url=https%3A%2F%2Fgithub.com%2Fdennybritz%2Freinforcement-learning%2Fblob%2Fmaster%2FDP%2FPolicy%2520Evaluation%2520Solution.ipynb)
- [Dynamic Programming Policy Iteration](DP/Policy%20Iteration%20Solution.ipynb)
[<img align="right" height="22" src="https://beta.deepnote.org/buttons/launch-in-deepnote.svg">](https://beta.deepnote.org/launch?template=data-science&url=https%3A%2F%2Fgithub.com%2Fdennybritz%2Freinforcement-learning%2Fblob%2Fmaster%2FDP%2FPolicy%2520Iteration%2520Solution.ipynb)
- [Dynamic Programming Value Iteration](DP/Value%20Iteration%20Solution.ipynb)
[<img align="right" height="22" src="https://beta.deepnote.org/buttons/launch-in-deepnote.svg">](https://beta.deepnote.org/launch?template=data-science&url=https%3A%2F%2Fgithub.com%2Fdennybritz%2Freinforcement-learning%2Fblob%2Fmaster%2FDP%2FValue%2520Iteration%2520Solution.ipynb)
- [Monte Carlo Prediction](MC/MC%20Prediction%20Solution.ipynb)
[<img align="right" height="22" src="https://beta.deepnote.org/buttons/launch-in-deepnote.svg">](https://beta.deepnote.org/launch?template=data-science&url=https%3A%2F%2Fgithub.com%2Fdennybritz%2Freinforcement-learning%2Fblob%2Fmaster%2FMC%2FMC%2520Prediction%2520Solution.ipynb)
- [Monte Carlo Control with Epsilon-Greedy Policies](MC/MC%20Control%20with%20Epsilon-Greedy%20Policies%20Solution.ipynb)
[<img align="right" height="22" src="https://beta.deepnote.org/buttons/launch-in-deepnote.svg">](https://beta.deepnote.org/launch?template=data-science&url=https%3A%2F%2Fgithub.com%2Fdennybritz%2Freinforcement-learning%2Fblob%2Fmaster%2FMC%2FMC%2520Control%2520with%2520Epsilon-Greedy%2520Policies%2520Solution.ipynb)
- [Monte Carlo Off-Policy Control with Weighted Importance Sampling](MC/Off-Policy%20MC%20Control%20with%20Weighted%20Importance%20Sampling%20Solution.ipynb)
[<img align="right" height="22" src="https://beta.deepnote.org/buttons/launch-in-deepnote.svg">](https://beta.deepnote.org/launch?template=data-science&url=https%3A%2F%2Fgithub.com%2Fdennybritz%2Freinforcement-learning%2Fblob%2Fmaster%2FMC%2FOff-Policy%2520MC%2520Control%2520with%2520Weighted%2520Importance%2520Sampling%2520Solution.ipynb)
- [SARSA (On-Policy TD Learning)](TD/SARSA%20Solution.ipynb)
[<img align="right" height="22" src="https://beta.deepnote.org/buttons/launch-in-deepnote.svg">](https://beta.deepnote.org/launch?template=data-science&url=https%3A%2F%2Fgithub.com%2Fdennybritz%2Freinforcement-learning%2Fblob%2Fmaster%2FTD%2FSARSA%2520Solution.ipynb)
- [Q-Learning (Off-Policy TD Learning)](TD/Q-Learning%20Solution.ipynb)
[<img align="right" height="22" src="https://beta.deepnote.org/buttons/launch-in-deepnote.svg">](https://beta.deepnote.org/launch?template=data-science&url=https%3A%2F%2Fgithub.com%2Fdennybritz%2Freinforcement-learning%2Fblob%2Fmaster%2FTD%2FQ-Learning%2520Solution.ipynb)
- [Q-Learning with Linear Function Approximation](FA/Q-Learning%20with%20Value%20Function%20Approximation%20Solution.ipynb)
[<img align="right" height="22" src="https://beta.deepnote.org/buttons/launch-in-deepnote.svg">](https://beta.deepnote.org/launch?template=data-science&url=https%3A%2F%2Fgithub.com%2Fdennybritz%2Freinforcement-learning%2Fblob%2Fmaster%2FFA%2FQ-Learning%2520with%2520Value%2520Function%2520Approximation%2520Solution.ipynb)
- [Deep Q-Learning for Atari Games](DQN/Deep%20Q%20Learning%20Solution.ipynb)
[<img align="right" height="22" src="https://beta.deepnote.org/buttons/launch-in-deepnote.svg">](https://beta.deepnote.org/launch?template=data-science&url=https%3A%2F%2Fgithub.com%2Fdennybritz%2Freinforcement-learning%2Fblob%2Fmaster%2FDQN%2FDeep%2520Q%2520Learning%2520Solution.ipynb)
- [Double Deep Q-Learning for Atari Games](DQN/Double%20DQN%20Solution.ipynb)
[<img align="right" height="22" src="https://beta.deepnote.org/buttons/launch-in-deepnote.svg">](https://beta.deepnote.org/launch?template=data-science&url=https%3A%2F%2Fgithub.com%2Fdennybritz%2Freinforcement-learning%2Fblob%2Fmaster%2FDQN%2FDouble%2520DQN%2520Solution.ipynb)
- Deep Q-Learning with Prioritized Experience Replay (WIP)
- [Policy Gradient: REINFORCE with Baseline](PolicyGradient/CliffWalk%20REINFORCE%20with%20Baseline%20Solution.ipynb)
[<img align="right" height="22" src="https://beta.deepnote.org/buttons/launch-in-deepnote.svg">](https://beta.deepnote.org/launch?template=data-science&url=https%3A%2F%2Fgithub.com%2Fdennybritz%2Freinforcement-learning%2Fblob%2Fmaster%2FPolicyGradient%2FCliffWalk%2520REINFORCE%2520with%2520Baseline%2520Solution.ipynb)
- [Policy Gradient: Actor-Critic with Baseline](PolicyGradient/CliffWalk%20Actor%20Critic%20Solution.ipynb)
[<img align="right" height="22" src="https://beta.deepnote.org/buttons/launch-in-deepnote.svg">](https://beta.deepnote.org/launch?template=data-science&url=https%3A%2F%2Fgithub.com%2Fdennybritz%2Freinforcement-learning%2Fblob%2Fmaster%2FPolicyGradient%2FCliffWalk%2520Actor%2520Critic%2520Solution.ipynb)
- [Policy Gradient: Actor-Critic with Baseline for Continuous Action Spaces](PolicyGradient/Continuous%20MountainCar%20Actor%20Critic%20Solution.ipynb)
[<img align="right" height="22" src="https://beta.deepnote.org/buttons/launch-in-deepnote.svg">](https://beta.deepnote.org/launch?template=data-science&url=https%3A%2F%2Fgithub.com%2Fdennybritz%2Freinforcement-learning%2Fblob%2Fmaster%2FPolicyGradient%2FContinuous%2520MountainCar%2520Actor%2520Critic%2520Solution.ipynb)
- Deterministic Policy Gradients for Continuous Action Spaces (WIP)
- Deep Deterministic Policy Gradients (DDPG) (WIP)
- [Asynchronous Advantage Actor-Critic (A3C)](PolicyGradient/a3c)
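Each launch button in this diff appears to follow one pattern: the notebook's GitHub blob URL, percent-encoded, is passed as the `url` query parameter of the Deepnote launch link. A minimal sketch for regenerating a button line from the relative notebook link used in the list, assuming the repository root (`dennybritz/reinforcement-learning`), branch (`master`), and launch endpoint seen in the URLs above. The helper names `deepnote_launch_url` and `deepnote_button` are hypothetical and not part of the repository.

```python
from urllib.parse import quote

# Assumptions inferred from the button URLs in this diff (not from Deepnote docs):
# repository dennybritz/reinforcement-learning, branch master, and a launch
# endpoint that takes the notebook's GitHub blob URL in the `url` parameter.
REPO_BLOB_ROOT = "https://github.com/dennybritz/reinforcement-learning/blob/master/"
DEEPNOTE_LAUNCH = "https://beta.deepnote.org/launch?template=data-science&url="
BUTTON_IMG = (
    '<img align="right" height="22" '
    'src="https://beta.deepnote.org/buttons/launch-in-deepnote.svg">'
)


def deepnote_launch_url(readme_link: str) -> str:
    """Build the launch URL for a notebook link as written in the README.

    `readme_link` is already percent-encoded (spaces appear as %20), so quoting
    it again with safe="" turns "%" into "%25" and "/" into "%2F". That second
    round of encoding is why the button URLs contain %2520 for every space.
    """
    blob_url = REPO_BLOB_ROOT + readme_link
    return DEEPNOTE_LAUNCH + quote(blob_url, safe="")


def deepnote_button(readme_link: str) -> str:
    """Return the Markdown line that renders the right-aligned launch button."""
    return f"[{BUTTON_IMG}]({deepnote_launch_url(readme_link)})"


if __name__ == "__main__":
    # Hypothetical usage: regenerate the button for the first list entry.
    print(deepnote_button("DP/Policy%20Evaluation%20Solution.ipynb"))
```

Passing an unencoded path such as `DP/Policy Evaluation Solution.ipynb` would produce `%20` instead of `%2520` for each space, so the output would no longer match the buttons added here.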