rlhf-from-scratch

An introduction to performing RLHF with PPO and DPO on LLMs, from scratch.

I go through the entire rabbit hole and learning curve of RLHF: where it began, the popular techniques and the math behind them, its applications, and its general role in large language models.

I believe this notebook will be helpful to you, and it also serves as my own reference notes.

Thanks! Feel free to contribute.

Start here!

Folders and files (58 commits):
assets
src/ppo
.gitignore
LICENSE
README.md
tutorial.ipynb

ashworks1706/rlhf-from-scratch
