diff --git a/README.md b/README.md
index 61fe51bc3..4c4f48013 100644
--- a/README.md
+++ b/README.md
@@ -27,20 +27,34 @@ All code is written in Python 3 and uses RL environments from [OpenAI Gym](https
 ### List of Implemented Algorithms
 
 - [Dynamic Programming Policy Evaluation](DP/Policy%20Evaluation%20Solution.ipynb)
+ [](https://beta.deepnote.org/launch?template=data-science&url=https%3A%2F%2Fgithub.com%2Fdennybritz%2Freinforcement-learning%2Fblob%2Fmaster%2FDP%2FPolicy%2520Evaluation%2520Solution.ipynb)
 - [Dynamic Programming Policy Iteration](DP/Policy%20Iteration%20Solution.ipynb)
+ [](https://beta.deepnote.org/launch?template=data-science&url=https%3A%2F%2Fgithub.com%2Fdennybritz%2Freinforcement-learning%2Fblob%2Fmaster%2FDP%2FPolicy%2520Iteration%2520Solution.ipynb)
 - [Dynamic Programming Value Iteration](DP/Value%20Iteration%20Solution.ipynb)
+ [](https://beta.deepnote.org/launch?template=data-science&url=https%3A%2F%2Fgithub.com%2Fdennybritz%2Freinforcement-learning%2Fblob%2Fmaster%2FDP%2FValue%2520Iteration%2520Solution.ipynb)
 - [Monte Carlo Prediction](MC/MC%20Prediction%20Solution.ipynb)
+ [](https://beta.deepnote.org/launch?template=data-science&url=https%3A%2F%2Fgithub.com%2Fdennybritz%2Freinforcement-learning%2Fblob%2Fmaster%2FMC%2FMC%2520Prediction%2520Solution.ipynb)
 - [Monte Carlo Control with Epsilon-Greedy Policies](MC/MC%20Control%20with%20Epsilon-Greedy%20Policies%20Solution.ipynb)
+ [](https://beta.deepnote.org/launch?template=data-science&url=https%3A%2F%2Fgithub.com%2Fdennybritz%2Freinforcement-learning%2Fblob%2Fmaster%2FMC%2FMC%2520Control%2520with%2520Epsilon-Greedy%2520Policies%2520Solution.ipynb)
 - [Monte Carlo Off-Policy Control with Importance Sampling](MC/Off-Policy%20MC%20Control%20with%20Weighted%20Importance%20Sampling%20Solution.ipynb)
+ [](https://beta.deepnote.org/launch?template=data-science&url=https%3A%2F%2Fgithub.com%2Fdennybritz%2Freinforcement-learning%2Fblob%2Fmaster%2FMC%2FOff-Policy%2520MC%2520Control%2520with%2520Weighted%2520Importance%2520Sampling%2520Solution.ipynb)
 - [SARSA (On Policy TD Learning)](TD/SARSA%20Solution.ipynb)
+ [](https://beta.deepnote.org/launch?template=data-science&url=https%3A%2F%2Fgithub.com%2Fdennybritz%2Freinforcement-learning%2Fblob%2Fmaster%2FTD%2FSARSA%2520Solution.ipynb)
 - [Q-Learning (Off Policy TD Learning)](TD/Q-Learning%20Solution.ipynb)
+ [](https://beta.deepnote.org/launch?template=data-science&url=https%3A%2F%2Fgithub.com%2Fdennybritz%2Freinforcement-learning%2Fblob%2Fmaster%2FTD%2FQ-Learning%2520Solution.ipynb)
 - [Q-Learning with Linear Function Approximation](FA/Q-Learning%20with%20Value%20Function%20Approximation%20Solution.ipynb)
+ [](https://beta.deepnote.org/launch?template=data-science&url=https%3A%2F%2Fgithub.com%2Fdennybritz%2Freinforcement-learning%2Fblob%2Fmaster%2FFA%2FQ-Learning%2520with%2520Value%2520Function%2520Approximation%2520Solution.ipynb)
 - [Deep Q-Learning for Atari Games](DQN/Deep%20Q%20Learning%20Solution.ipynb)
+ [](https://beta.deepnote.org/launch?template=data-science&url=https%3A%2F%2Fgithub.com%2Fdennybritz%2Freinforcement-learning%2Fblob%2Fmaster%2FDQN%2FDeep%2520Q%2520Learning%2520Solution.ipynb)
 - [Double Deep-Q Learning for Atari Games](DQN/Double%20DQN%20Solution.ipynb)
+ [](https://beta.deepnote.org/launch?template=data-science&url=https%3A%2F%2Fgithub.com%2Fdennybritz%2Freinforcement-learning%2Fblob%2Fmaster%2FDQN%2FDouble%2520DQN%2520Solution.ipynb)
 - Deep Q-Learning with Prioritized Experience Replay (WIP)
 - [Policy Gradient: REINFORCE with Baseline](PolicyGradient/CliffWalk%20REINFORCE%20with%20Baseline%20Solution.ipynb)
+ [](https://beta.deepnote.org/launch?template=data-science&url=https%3A%2F%2Fgithub.com%2Fdennybritz%2Freinforcement-learning%2Fblob%2Fmaster%2FPolicyGradient%2FCliffWalk%2520REINFORCE%2520with%2520Baseline%2520Solution.ipynb)
 - [Policy Gradient: Actor Critic with Baseline](PolicyGradient/CliffWalk%20Actor%20Critic%20Solution.ipynb)
+ [](https://beta.deepnote.org/launch?template=data-science&url=https%3A%2F%2Fgithub.com%2Fdennybritz%2Freinforcement-learning%2Fblob%2Fmaster%2FPolicyGradient%2FCliffWalk%2520Actor%2520Critic%2520Solution.ipynb)
 - [Policy Gradient: Actor Critic with Baseline for Continuous Action Spaces](PolicyGradient/Continuous%20MountainCar%20Actor%20Critic%20Solution.ipynb)
+ [](https://beta.deepnote.org/launch?template=data-science&url=https%3A%2F%2Fgithub.com%2Fdennybritz%2Freinforcement-learning%2Fblob%2Fmaster%2FPolicyGradient%2FContinuous%2520MountainCar%2520Actor%2520Critic%2520Solution.ipynb)
 - Deterministic Policy Gradients for Continuous Action Spaces (WIP)
 - Deep Deterministic Policy Gradients (DDPG) (WIP)
 - [Asynchronous Advantage Actor Critic (A3C)](PolicyGradient/a3c)