Commit 60f1187

[Algorithm] Expert Iteration and SFT
1 parent 92b52a0 commit 60f1187

42 files changed, +4206 -126 lines changed

docs/source/reference/llms.rst

Lines changed: 25 additions & 0 deletions
@@ -200,6 +200,7 @@ transforms).
 
     DataLoadingPrimer
     KLRewardTransform
+    RetrieveLogProb
     MCPToolTransform
     BrowserTransform
     PythonInterpreter
@@ -256,6 +257,9 @@ LLM post training require some appropriate versions of the losses implemented in
 GRPO
 ~~~~
 
+The :class:`~torchrl.objectives.llm.GRPOLoss` class is a thin wrapper around the :class:`~torchrl.objectives.PPOLoss` class
+that implements the LLM-specific functionalities.
+
 .. currentmodule:: torchrl.objectives.llm
 
 .. autosummary::
@@ -265,3 +269,24 @@ GRPO
     GRPOLoss
     GRPOLossOutput
     MCAdvantage
+
+
+SFT
+~~~
+
+.. currentmodule:: torchrl.objectives.llm
+
+.. autosummary::
+    :toctree: generated/
+    :template: rl_template.rst
+
+    SFTLoss
+    SFTLossOutput
+
+.. currentmodule:: torchrl.data.llm
+
+.. autosummary::
+    :toctree: generated/
+    :template: rl_template.rst
+
+    TopKRewardSelector
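
The new SFT section lists ``SFTLoss`` and ``TopKRewardSelector`` without describing how they fit together. As a rough illustration of the expert-iteration idea named in the commit title, the sketch below keeps the highest-reward responses for each prompt, which is the kind of filtering a top-k reward selector presumably performs before supervised fine-tuning. It is plain Python and does not use the TorchRL API: ``select_top_k``, the ``(prompt, response, reward)`` tuple layout, and the toy data are all assumptions made for this example.

```python
from collections import defaultdict


def select_top_k(samples, k):
    """Keep the k highest-reward responses for each prompt.

    Hypothetical helper, not part of TorchRL. `samples` is an iterable of
    (prompt, response, reward) tuples.
    """
    by_prompt = defaultdict(list)
    for prompt, response, reward in samples:
        by_prompt[prompt].append((reward, response))

    selected = {}
    for prompt, scored in by_prompt.items():
        # Sort by reward only, highest first; the sort is stable, so ties
        # keep their input order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        selected[prompt] = [response for _, response in scored[:k]]
    return selected


# Toy rollouts: several sampled responses per prompt, each with a reward.
samples = [
    ("2+2=", "4", 1.0),
    ("2+2=", "5", 0.0),
    ("2+2=", "four", 0.5),
    ("3*3=", "9", 1.0),
    ("3*3=", "6", 0.0),
]

# The retained (prompt, response) pairs would serve as SFT targets.
sft_data = select_top_k(samples, k=2)
```

In an expert-iteration loop, the selected pairs would then be fed to a supervised loss (typically a negative log-likelihood over the response tokens), and the updated policy would generate the next round of samples.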
