Welcome to the Nano-R1 repository! This project demonstrates fine-tuning the Qwen2.5-3B-Instruct model with Group Relative Policy Optimization (GRPO) on the GSM8K dataset. This README provides everything you need to get started.
In natural language processing, fine-tuning models for specific tasks is essential for achieving high performance. Qwen2.5-3B-Instruct is an instruction-tuned language model designed to understand and generate human-like text. This project fine-tunes it with GRPO to improve its performance on GSM8K, a benchmark of grade-school math word problems.
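Unlike PPO, GRPO does not train a separate value model; instead, it samples a group of completions for each prompt and normalizes every completion's reward against the group's mean and standard deviation. The snippet below is a minimal illustrative sketch of that group-relative advantage computation; it is not code from this repository.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Normalize rewards within each group of completions sampled for one prompt.

    rewards: tensor of shape (num_prompts, num_generations) with one scalar
    reward per completion. Returns advantages of the same shape.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled completions each
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 1.0, 0.0]])
print(group_relative_advantages(rewards))
```

Completions that score above their group's average receive positive advantages, which is what drives the policy update.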
To get started with the Nano-R1 project, you will need to clone the repository and install the necessary dependencies. Before you begin, make sure you have the following prerequisites:
- Python 3.7 or higher
- pip (Python package installer)
- Git
You can clone the repository using the following command:
```bash
git clone https://github.com/Mikesterner87/Nano-R1.git
cd Nano-R1
```
After cloning the repository, install the required packages:
```bash
pip install -r requirements.txt
```
- Fine-tuning of the Qwen2.5-3B-Instruct model.
- Implementation of GRPO for effective training.
- Utilization of the GSM8K dataset for model evaluation.
- Support for various adapters and configurations.
- Easy integration with Hugging Face libraries.
To run the project, you need to install the required libraries. The `requirements.txt` file contains all necessary dependencies. Use the following command to install them:

```bash
pip install -r requirements.txt
```
Make sure to have the following libraries installed:
- `transformers`
- `torch`
- `safetensors`
- `trl`
- `text-generation-inference`
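Once the dependencies are installed, you can optionally run a quick sanity check from Python to confirm the core libraries import correctly (an illustrative check, not part of the repository):

```python
# Confirm the core dependencies are importable and print their versions.
import torch
import transformers
import trl

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("trl:", trl.__version__)
```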
Once you have set up the environment, you can start fine-tuning the model. Use the following command to begin the training process:
```bash
python train.py --dataset gsm8k --model qwen2-5
```
You can adjust parameters in the `train.py` script to customize your training process. Refer to the comments in the code for guidance.
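If you want to see the overall shape of GRPO training with the TRL library (one of the dependencies listed above), the following is a minimal illustrative sketch. It is not the repository's `train.py`: the use of the `datasets` library, the prompt preprocessing, the `correctness_reward` function, and the hyperparameters shown are all assumptions made for illustration only.

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# GSM8K provides "question" and "answer" columns; GRPOTrainer expects a "prompt" column.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda x: {"prompt": x["question"]})

def correctness_reward(prompts, completions, answer, **kwargs):
    # Illustrative reward: 1.0 if the gold final answer appears in the completion, else 0.0.
    # GSM8K gold answers end with "#### <number>".
    rewards = []
    for completion, gold in zip(completions, answer):
        final = gold.split("####")[-1].strip()
        rewards.append(1.0 if final in completion else 0.0)
    return rewards

training_args = GRPOConfig(
    output_dir="qwen2.5-3b-grpo-gsm8k",  # illustrative output path
    num_generations=8,                   # completions sampled per prompt (group size)
    per_device_train_batch_size=8,
    learning_rate=1e-6,
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-3B-Instruct",
    reward_funcs=correctness_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

A sketch like this trades completeness for clarity; in practice the reward function would parse the model's final numeric answer rather than checking for a substring.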
Here’s a simple example of how to use the fine-tuned model for text generation:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Path or Hub ID of your fine-tuned checkpoint
model_name = "your_fine_tuned_model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tokenize a prompt and generate a completion
input_text = "What is 7 + 5?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
This code snippet demonstrates how to load your fine-tuned model and generate text based on a prompt.
We welcome contributions to the Nano-R1 project. If you would like to contribute, please follow these steps:
- Fork the repository.
- Create a new branch (`git checkout -b feature/YourFeature`).
- Make your changes.
- Commit your changes (`git commit -m 'Add some feature'`).
- Push to the branch (`git push origin feature/YourFeature`).
- Open a pull request.
Please ensure that your code adheres to the existing style and includes tests where applicable.
This project is licensed under the MIT License. See the LICENSE file for details.
You can find the latest releases and downloadable files in the repository's Releases section, along with detailed notes for each release.
For any questions or feedback, feel free to reach out:
- GitHub: Mikesterner87
- Email: [email protected]
Thank you for your interest in the Nano-R1 project! We hope you find it useful for your fine-tuning tasks.