
A theoretical and practical deep dive into Reinforcement Learning from Human Feedback (RLHF) and its applications in Large Language Models, from scratch.


ashworks1706/rlhf-from-scratch


rlhf-from-scratch

An introduction to performing RLHF on LLMs with PPO and DPO, from scratch.

I go through the entire rabbit hole and learning curve of RLHF: where it began, the popular techniques and the math behind them, its applications, and its general role in large language models.

I hope this notebook is helpful to you, and it also serves as my own reference notes.

Thanks! Feel free to contribute.

Start here!
