This project focuses on fine-tuning the Qwen2.5 model using two methods:
- Supervised Fine-Tuning (SFT): improves the model's performance on specific tasks using labeled data.
- Group Relative Policy Optimization (GRPO): fine-tunes with reinforcement learning to optimize task-specific generation quality.
Requirements:
- Python 3.10
- PyTorch
- Transformers
- Datasets
- TRL (Transformer Reinforcement Learning)
- Wandb
- PEFT
Install the dependencies and log in to the Hugging Face Hub:

```bash
pip install torch transformers datasets wandb peft trl
huggingface-cli login
```
To run SFT with LLaMA-Factory, train with LoRA, merge the adapter, and run inference:

```bash
llamafactory-cli train Llama-factory-config/qwen2.5_lora_sft_train.yaml
llamafactory-cli export Llama-factory-config/qwen2.5_lora_sft_merge_lora.yaml
llamafactory-cli chat Llama-factory-config/qwen2.5_lora_sft_inference.yaml
```
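The SFT YAML itself is not reproduced in this README. For orientation, here is a minimal sketch of what a LLaMA-Factory LoRA SFT config such as `qwen2.5_lora_sft_train.yaml` could contain (field names follow LLaMA-Factory's documented schema; the dataset name, paths, and hyperparameters are placeholders, not the repo's actual values):

```yaml
### model
model_name_or_path: Qwen/Qwen2.5-1.5B-Instruct
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset (placeholder: register your dataset in dataset_info.json)
dataset: my_sft_dataset
template: qwen
cutoff_len: 2048

### output
output_dir: saves/qwen2.5-1.5b/lora/sft
logging_steps: 10
save_steps: 500

### train (illustrative values)
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
bf16: true
```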
To fine-tune the model with GRPO, run the bash script in train mode:

```bash
./finetune_grpo_qwen.sh -m train
```
To override the defaults, pass the options explicitly:

```bash
./finetune_grpo_qwen.sh \
  -i "Qwen/Qwen2.5-1.5B-Instruct" \
  -d "/path/to/dataset.jsonl" \
  -o "output_model_directory" \
  -l 1e-5 \
  -e 3 \
  -b 16 \
  -m transformers
```
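The reward signal used by the GRPO script is not shown in this README. As an illustration only, here is a hypothetical rule-based reward function of the shape that TRL's `GRPOTrainer` accepts via `reward_funcs` (the scoring rule and function name are assumptions, not the repo's actual code):

```python
import json


def json_validity_reward(completions, **kwargs):
    """Hypothetical GRPO reward: 1.0 if a completion parses as JSON, else 0.0.

    TRL-style reward functions receive the batch of sampled completions
    (plus extra dataset columns as keyword arguments) and must return one
    float score per completion.
    """
    rewards = []
    for completion in completions:
        try:
            json.loads(completion)
            rewards.append(1.0)
        except (json.JSONDecodeError, TypeError):
            rewards.append(0.0)
    return rewards
```

A function like this would be passed as `reward_funcs=json_validity_reward` when constructing a `GRPOTrainer` in recent TRL versions.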
- `-i, --model-id`: Base model or trained model ID
- `-d, --dataset`: Path to the dataset
- `-o, --output-dir`: Output directory for the trained model
- `-l, --lr`: Learning rate
- `-e, --epochs`: Number of training epochs
- `-b, --batch-size`: Gradient accumulation steps
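The expected JSONL schema is not documented here. As a hypothetical illustration, assuming a prompt-only format of the kind GRPO-style training typically consumes (the field name and content are assumptions):

```json
{"prompt": "Extract information from the following passport: <passport text>"}
```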
Use the provided inference script for batch processing of input files and generating responses.
```bash
python inference_script.py <config_path> <label_folder> <prompt> [output_path]
```
- `config_path`: Path to the YAML configuration file
- `label_folder`: Folder containing input text files
- `prompt`: Base prompt to use
- `output_path` (optional): Path to save the results JSON
Example:

```bash
python inference_script.py \
  config.yaml \
  /path/to/label/folder \
  "Extract information from the following passport:" \
  /path/to/output/results.json
```
The configuration file (YAML) supports:
- `model_name_or_path`: Model path or HuggingFace ID
- `template`: Prompt template (llama3, llama2, mistral, chatml, qwen, etc.)
- `trust_remote_code`: Whether to trust remote code
- `generation_config`: Inference generation parameters
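Putting those keys together, a minimal config could look like this (the values are illustrative defaults, not the repo's shipped settings):

```yaml
model_name_or_path: Qwen/Qwen2.5-1.5B-Instruct
template: qwen
trust_remote_code: true
generation_config:
  max_new_tokens: 512
  temperature: 0.7
  top_p: 0.9
  do_sample: true
```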
Supported templates:
- LLaMA3
- LLaMA2
- Mistral
- ChatML
- Qwen
- DeepSeek
- Zephyr
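To illustrate what a template controls, here is a hand-rolled sketch of the ChatML-style turn formatting used by the Qwen family (the inference script presumably applies the selected template itself; this is for illustration only):

```python
def format_chatml(system: str, user: str) -> str:
    """Build a ChatML-style prompt as used by Qwen models.

    Each turn is wrapped in <|im_start|>role ... <|im_end|> markers, and
    the prompt ends with an open assistant turn so generation continues
    from there.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )


prompt = format_chatml(
    "You are a helpful assistant.",
    "Extract information from the following passport:",
)
```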
Notes:
- Ensure you have a compatible GPU with sufficient memory
- The script uses Wandb for experiment tracking
- The model is pushed to HuggingFace Hub after training
Troubleshooting:
- Check your dataset path
- Ensure all required libraries are installed
- Verify GPU compatibility