diff --git a/Clustering/Clustering.tex b/Clustering/Clustering.tex
index 6b12635..7688f51 100644
--- a/Clustering/Clustering.tex
+++ b/Clustering/Clustering.tex
@@ -46,7 +46,7 @@ \section{K-Means Clustering}
     \item Initialize cluster centers by randomly selecting points in our data set.
     \item Using a distance metric of your choosing, assign each data point to the closest cluster.
     \item Update the cluster centers based on your assignments and distance metric (for example, when using L2 distance, we update the cluster centers by averaging the data points assigned to each cluster).
-    \item Repeat steps 1 and 2 until convergence.
+    \item Repeat steps 2 and 3 until convergence.
 \end{enumerate}
 
 In the case where we are using the L2 distance metric, this is known as \textit{Lloyd's algorithm}, which we derive in the next section.
diff --git a/ReinforcementLearning/ReinforcementLearning.tex b/ReinforcementLearning/ReinforcementLearning.tex
index 093fdfe..8618206 100644
--- a/ReinforcementLearning/ReinforcementLearning.tex
+++ b/ReinforcementLearning/ReinforcementLearning.tex
@@ -36,7 +36,7 @@ \section{Model-Free Learning}
 \end{equation}
 Note, that $V^*(s') = max_{a'}Q^*(s', a')$ since the highest value achievable from state $s'$ following policy $*$ is the Q value of taking the optimal action from $s'$. Substituting this in, we get the following \textit{Bellman Equation}:
 \begin{equation}
-    Q^*(s, a) = r(s, a) + \gamma\sum_s'p(s'|s, a)\max_{a'}Q^*(s', a)
+    Q^*(s, a) = r(s, a) + \gamma\sum_{s'}p(s'|s, a)\max_{a'}Q^*(s', a')
 \end{equation}
 Note that we can't directly calculate the term $\gamma\sum_s'p(s'|s, a)\max_a'Q^*(s', a)$ since we don't know $p(s'|s, a)$. We will discuss how this is addressed by the two algorithms we will cover in the value-based family.
 \subsection{SARSA and Q-Learning}
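As a reference for the Clustering hunk above, here is a minimal NumPy sketch of the enumerated procedure with the corrected step numbering: initialize once, then repeat the assignment (step 2) and update (step 3) steps until convergence. The function name lloyds_kmeans and its arguments are illustrative and not part of the notes.

import numpy as np

def lloyds_kmeans(X, k, n_iters=100, seed=0):
    """Sketch of Lloyd's algorithm (k-means with the L2 distance metric)."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize cluster centers by randomly selecting points in the data set.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each data point to the closest cluster center under L2 distance.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Step 3: update each center to the average of the points assigned to it
        # (keep the old center if a cluster receives no points).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Repeat steps 2 and 3 until convergence (centers stop moving).
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

For example, centers, labels = lloyds_kmeans(X, k=3) on an (n, d) array X returns the final cluster centers and the per-point assignments.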
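For the ReinforcementLearning hunk, the surrounding text notes that the backup term $\gamma\sum_{s'}p(s'|s, a)\max_{a'}Q^*(s', a')$ cannot be computed directly because $p(s'|s, a)$ is unknown, and defers the fix to the SARSA and Q-Learning subsection. As a hedged illustration only, the standard tabular Q-learning update below replaces the expectation over $s'$ with a single sampled transition; the function name, the learning rate alpha, and the table layout are assumptions of this sketch, not taken from the notes.

import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning update on a Q table indexed as Q[state, action].

    Instead of computing gamma * sum_{s'} p(s'|s, a) * max_{a'} Q(s', a'),
    which requires the unknown transition model p(s'|s, a), we bootstrap from
    the observed next state s_next sampled by interacting with the environment.
    """
    td_target = r + gamma * np.max(Q[s_next])   # sampled Bellman backup
    Q[s, a] += alpha * (td_target - Q[s, a])    # move Q(s, a) toward the target
    return Q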