From 3a0997d922c8476c0495ff6a2e01c4c4661b202b Mon Sep 17 00:00:00 2001
From: Marcos Johnson-Noya
Date: Fri, 16 May 2025 16:15:15 -0600
Subject: [PATCH 1/2] fixed typos

---
 ReinforcementLearning/ReinforcementLearning.tex | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ReinforcementLearning/ReinforcementLearning.tex b/ReinforcementLearning/ReinforcementLearning.tex
index 093fdfe..8618206 100644
--- a/ReinforcementLearning/ReinforcementLearning.tex
+++ b/ReinforcementLearning/ReinforcementLearning.tex
@@ -36,7 +36,7 @@ \section{Model-Free Learning}
 \end{equation}
 Note that $V^*(s') = \max_{a'}Q^*(s', a')$ since the highest value achievable from state $s'$ under the optimal policy is the Q value of taking the optimal action from $s'$. Substituting this in, we get the following \textit{Bellman Equation}:
 \begin{equation}
-    Q^*(s, a) = r(s, a) + \gamma\sum_s'p(s'|s, a)\max_{a'}Q^*(s', a)
+    Q^*(s, a) = r(s, a) + \gamma\sum_{s'}p(s'|s, a)\max_{a'}Q^*(s', a')
 \end{equation}
 Note that we can't directly calculate the term $\gamma\sum_{s'}p(s'|s, a)\max_{a'}Q^*(s', a')$ since we don't know $p(s'|s, a)$. We will discuss how this is addressed by the two algorithms we will cover in the value-based family.
 \subsection{SARSA and Q-Learning}

From d3a96fb3297da6440eae616f47bb0349e3c67292 Mon Sep 17 00:00:00 2001
From: Marcos Johnson-Noya
Date: Sat, 17 May 2025 08:17:04 -0600
Subject: [PATCH 2/2] changed steps we repeat for K-means Clustering

---
 Clustering/Clustering.tex | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Clustering/Clustering.tex b/Clustering/Clustering.tex
index 6b12635..7688f51 100644
--- a/Clustering/Clustering.tex
+++ b/Clustering/Clustering.tex
@@ -46,7 +46,7 @@ \section{K-Means Clustering}
     \item Initialize cluster centers by randomly selecting points in our data set.
     \item Using a distance metric of your choosing, assign each data point to the closest cluster.
     \item Update the cluster centers based on your assignments and distance metric (for example, when using L2 distance, we update the cluster centers by averaging the data points assigned to each cluster).
-    \item Repeat steps 1 and 2 until convergence.
+    \item Repeat steps 2 and 3 until convergence.
 \end{enumerate}
 In the case where we are using the L2 distance metric, this is known as \textit{Lloyd's algorithm}, which we derive in the next section.
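For reviewers, the step ordering fixed by the second patch (repeat the assignment and update steps, not initialization) can be sanity-checked with a minimal Lloyd's-algorithm sketch. This is illustrative only, not code from the patched notes; the function name and data are hypothetical:

```python
import numpy as np

def lloyd_kmeans(X, k, iters=100, seed=0):
    """Minimal K-means (Lloyd's algorithm) with L2 distance."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize centers by randomly selecting points in the data set.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Step 2: assign each data point to the closest center (L2 distance).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 3: update each center to the mean of its assigned points.
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Repeat steps 2 and 3 until convergence (centers stop moving).
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

Note how only steps 2 and 3 sit inside the loop, matching the corrected `\item Repeat steps 2 and 3 until convergence.`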
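The first patch's surrounding note, that the term involving the unknown $p(s'|s, a)$ cannot be computed directly, is exactly what the sampled Q-learning update works around: a single observed transition $(s, a, r, s')$ stands in for the expectation. A minimal tabular sketch, with all names and the toy states hypothetical:

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: the sampled next state s' replaces
    the intractable expectation over p(s'|s, a)."""
    # Bootstrapped target: r + gamma * max_a' Q(s', a').
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    # Move Q(s, a) a step of size alpha toward the target.
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# Hypothetical usage on a toy transition:
Q = defaultdict(float)
q_learning_update(Q, "s0", "a0", 1.0, "s1", ["a0", "a1"])
```

Averaging these stochastic targets over many sampled transitions approximates the sum over $s'$ in the Bellman equation without ever knowing $p(s'|s, a)$.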