Chapter 6 changes marcos #39

Open
wants to merge 2 commits into master

2 changes: 1 addition & 1 deletion Clustering/Clustering.tex
@@ -46,7 +46,7 @@ \section{K-Means Clustering}
\item Initialize cluster centers by randomly selecting points in our data set.
\item Using a distance metric of your choosing, assign each data point to the cluster with the closest center.
\item Update the cluster centers based on your assignments and distance metric (for example, when using L2 distance, we update the cluster centers by averaging the data points assigned to each cluster).
\item Repeat steps 1 and 2 until convergence.
\item Repeat steps 2 and 3 until convergence.
\end{enumerate}

In the case where we are using the L2 distance metric, this is known as \textit{Lloyd's algorithm}, which we derive in the next section.
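
Below is a minimal sketch of the four steps above for the L2 case, i.e. Lloyd's algorithm, using NumPy. It is an illustrative assumption layered on the chapter's description, not code from the book: the data array X, the number of clusters k, and the iteration cap are placeholders.

# Minimal sketch of Lloyd's algorithm (k-means under the L2 distance).
# X: (n_points, n_features) data array; k: number of clusters (illustrative placeholders).
import numpy as np

def lloyds_algorithm(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: initialize cluster centers by randomly selecting points from the data set.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each point to the cluster whose center is closest under L2.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: update each center to the mean of the points assigned to it
        # (empty clusters keep their previous center).
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        # Step 4: repeat steps 2 and 3 until the centers stop moving (convergence).
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
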
2 changes: 1 addition & 1 deletion ReinforcementLearning/ReinforcementLearning.tex
@@ -36,7 +36,7 @@ \section{Model-Free Learning}
\end{equation}
Note that $V^*(s') = \max_{a'}Q^*(s', a')$, since the highest value achievable from state $s'$ under the optimal policy is the Q value of taking the optimal action from $s'$. Substituting this in, we get the following \textit{Bellman Equation}:
\begin{equation}
Q^*(s, a) = r(s, a) + \gamma\sum_s'p(s'|s, a)\max_{a'}Q^*(s', a)
Q^*(s, a) = r(s, a) + \gamma\sum_{s'}p(s'|s, a)\max_{a'}Q^*(s', a')
\end{equation}
Note that we can't directly calculate the term $\gamma\sum_{s'}p(s'|s, a)\max_{a'}Q^*(s', a')$ since we don't know $p(s'|s, a)$. We will discuss how this is addressed by the two algorithms we will cover in the value-based family.
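
As a hedged preview of how the next subsection sidesteps the unknown $p(s'|s, a)$: rather than computing the expectation over $s'$, model-free methods use sampled transitions $(s, a, r, s')$ and move $Q(s, a)$ toward the sampled target $r + \gamma\max_{a'}Q(s', a')$. The sketch below shows one tabular update of this kind; the table shape and step size are illustrative placeholders, not the chapter's code.

# Minimal sketch of a single sampled (model-free) update toward the Bellman target.
# Q: (n_states, n_actions) array of value estimates (illustrative placeholder).
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Sampled Bellman target: the observed next state s' stands in for the
    # expectation over p(s'|s, a), which we cannot compute directly.
    target = r + gamma * Q[s_next].max()
    # Move Q(s, a) a small step (alpha) toward the sampled target.
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

# Toy usage with placeholder sizes: 5 states, 2 actions, one observed transition.
Q = np.zeros((5, 2))
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=3)
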
\subsection{SARSA and Q-Learning}