Post-Training Techniques for Large Language Models

A comprehensive implementation and educational resource for modern post-training techniques that enhance Large Language Model (LLM) capabilities and alignment.

Python 3.8+ | License: MIT | Code style: black

🎯 Overview

This repository provides production-ready implementations of three key post-training techniques:

  • 🎓 Supervised Fine-Tuning (SFT): Enhance instruction-following capabilities
  • ⚖️ Direct Preference Optimization (DPO): Align models with human preferences
  • 🔄 Online Reinforcement Learning (GRPO): Improve task-specific performance with reward signals

All implementations are based on the DeepLearning.AI "Post-training LLMs" course, enhanced with professional software engineering practices, comprehensive documentation, and an extensible architecture.

🌟 Key Features

  • πŸ—οΈ Modular Architecture: Clean, extensible codebase with clear separation of concerns
  • πŸ“š Educational Notebooks: Step-by-step tutorials with detailed explanations
  • ⚑ Production Ready: Professional implementations suitable for real-world applications
  • πŸ”§ Easy Configuration: YAML-based configuration for all training parameters
  • πŸ“Š Comprehensive Evaluation: Built-in metrics and benchmarking tools
  • πŸš€ Multiple Interfaces: Command-line scripts, Python API, and Jupyter notebooks
  • πŸŽ›οΈ Flexible Models: Support for various model architectures and sizes

πŸ“ Repository Structure

post_training_llms/
├── src/                          # Core implementation
│   ├── utils/                    # Utility functions
│   │   ├── model_utils.py       # Model loading, generation, evaluation
│   │   ├── data_utils.py        # Dataset preparation and processing
│   │   ├── config.py            # Unified configuration system
│   │   └── config_manager.py    # Configuration management utilities
│   ├── training/                # Training pipelines
│   │   ├── sft_trainer.py       # Supervised Fine-Tuning
│   │   ├── dpo_trainer.py       # Direct Preference Optimization
│   │   └── rl_trainer.py        # Online RL with GRPO
│   └── evaluation/              # Evaluation and metrics
│       ├── metrics.py           # Performance metrics
│       └── benchmark.py         # Comprehensive benchmarking
├── notebooks/                   # Educational tutorials
│   ├── 01_supervised_fine_tuning.ipynb
│   ├── 02_direct_preference_optimization.ipynb
│   └── 03_online_reinforcement_learning.ipynb
├── examples/                    # Example scripts
│   ├── run_sft.py              # SFT training example
│   ├── run_dpo.py              # DPO training example
│   ├── run_rl.py               # RL training example
│   ├── run_benchmark.py        # Model evaluation
│   └── config_utils.py         # Configuration utilities
├── configs/                     # Configuration files
│   ├── sft_config.yaml         # SFT parameters
│   ├── dpo_config.yaml         # DPO parameters
│   └── rl_config.yaml          # RL parameters
├── data/                        # Data storage (created at runtime)
└── models/                      # Model storage (created at runtime)

βš™οΈ Configuration System Architecture

The unified configuration system provides a robust, type-safe way to manage all training parameters:

Core Components

  • BaseConfig: Abstract base class with common configuration fields
  • SFTConfig: Configuration for Supervised Fine-Tuning
  • DPOConfig: Configuration for Direct Preference Optimization
  • RLConfig: Configuration for Reinforcement Learning
  • ConfigManager: Utility class for configuration operations

Key Features

  • 🔒 Type Safety: All configurations use Python dataclasses with validation
  • ✅ Data Validation: Automatic validation of parameter types and ranges
  • 🔄 Inheritance: Method-specific configs inherit from base configuration
  • 📝 YAML Support: Load/save configurations in human-readable YAML format
  • 🎛️ Command Overrides: Command-line arguments can override config values
  • 🔧 Utility Functions: Built-in tools for validation, merging, and conversion

Configuration Structure

# Example configuration hierarchy
BaseConfig
├── ModelConfig          # Model settings (name, trust_remote_code)
├── TrainingConfig       # Common training parameters
│   ├── SFTTrainingConfig
│   ├── DPOTrainingConfig (with beta parameter)
│   └── RLTrainingConfig (with num_generations)
├── DatasetConfig        # Dataset settings
├── HardwareConfig       # Hardware settings (GPU, mixed precision)
├── OutputConfig         # Output settings
└── EvaluationConfig    # Evaluation settings
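
To make the mapping from YAML to these objects concrete, here is a minimal, hedged sketch of loading configs/sft_config.yaml into small dataclasses. The class and field names below are illustrative only; the repository's real definitions live in src/utils/config.py and src/utils/config_manager.py and may differ.

import yaml
from dataclasses import dataclass, fields

# Illustrative dataclasses only; the repository's real ones live in src/utils/config.py.
@dataclass
class ModelSection:
    name: str = "HuggingFaceTB/SmolLM2-135M"
    trust_remote_code: bool = False

@dataclass
class TrainingSection:
    learning_rate: float = 8.0e-5
    num_train_epochs: int = 1
    per_device_train_batch_size: int = 1

def _from_dict(cls, section):
    """Build a dataclass from a YAML section, ignoring keys this sketch does not model."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in section.items() if k in known})

def load_config(path):
    with open(path) as f:
        raw = yaml.safe_load(f) or {}
    model = _from_dict(ModelSection, raw.get("model", {}))
    training = _from_dict(TrainingSection, raw.get("training", {}))
    if not 0 < training.learning_rate < 1:   # simple range validation
        raise ValueError("learning_rate out of range")
    return model, training

model_cfg, train_cfg = load_config("configs/sft_config.yaml")
print(model_cfg.name, train_cfg.learning_rate)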

🚀 Quick Start

Installation

  1. Clone the repository:

    git clone https://github.com/YanCotta/post_training_llms.git
    cd post_training_llms
  2. Install dependencies:

    pip install -r requirements.txt
  3. Verify installation:

    python -c "import torch; import transformers; import datasets; import trl; print('βœ… All dependencies installed successfully!')"

Configuration Management

The project now features a unified configuration system that eliminates code duplication and ensures consistency across all training methods.

Using Configuration Files

All training scripts support configuration files with command-line overrides:

# Use configuration file with overrides
python examples/run_sft.py \
    --config configs/sft_config.yaml \
    --learning-rate 1e-4 \
    --epochs 2

Configuration Utilities

Use the configuration utility script for common operations:

# Create a new configuration template
python examples/config_utils.py create --type sft --output configs/my_config.yaml

# Validate a configuration file
python examples/config_utils.py validate --config configs/sft_config.yaml

# List all available configurations
python examples/config_utils.py list --directory configs

# Convert configuration to training arguments
python examples/config_utils.py convert --config configs/sft_config.yaml
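
Conceptually, the convert operation maps the training section of a config onto Hugging Face TrainingArguments. A rough sketch of that mapping follows; it is not the exact logic in examples/config_utils.py, and the default output_dir is an assumption.

from transformers import TrainingArguments

# Hedged sketch of a config-to-TrainingArguments conversion; the real convert
# command in examples/config_utils.py may cover more fields.
def to_training_arguments(training, output_dir="./models/sft_demo"):
    return TrainingArguments(
        output_dir=output_dir,
        learning_rate=training.get("learning_rate", 8.0e-5),
        num_train_epochs=training.get("num_train_epochs", 1),
        per_device_train_batch_size=training.get("per_device_train_batch_size", 1),
        logging_steps=10,
    )

args = to_training_arguments({"learning_rate": 1.0e-4, "num_train_epochs": 2})
print(args.learning_rate, args.num_train_epochs)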

Testing the Configuration System

A quick smoke test verifies that the configuration system is wired up correctly:

# Test all configuration functionality
python -c "
from src.utils.config import create_default_config
from src.utils.config_manager import ConfigManager

# Create and validate configurations
sft_config = create_default_config('sft')
is_valid = ConfigManager.validate_config(sft_config)
print(f'Configuration system working: {is_valid}')
"

Running Your First Training

Supervised Fine-Tuning (SFT)

python examples/run_sft.py \
    --config configs/sft_config.yaml \
    --max-samples 100

Direct Preference Optimization (DPO)

python examples/run_dpo.py \
    --config configs/dpo_config.yaml \
    --new-identity "My Assistant" \
    --max-samples 50

Online Reinforcement Learning (GRPO)

python examples/run_rl.py \
    --model "HuggingFaceTB/SmolLM2-135M-Instruct" \
    --dataset "openai/gsm8k" \
    --max-train-samples 20 \
    --max-eval-samples 10 \
    --output-dir "./models/my_rl_model"

📖 Tutorials

Interactive Jupyter Notebooks

Explore the techniques through our comprehensive tutorial notebooks:

  1. Supervised Fine-Tuning Tutorial

    • Learn how SFT improves instruction-following
    • Hands-on training with real datasets
    • Performance evaluation and analysis
  2. Direct Preference Optimization Tutorial

    • Understand preference-based training
    • Identity modification example
    • Consistency measurement and evaluation
  3. Online Reinforcement Learning Tutorial

    • Reward-based model improvement
    • Mathematical reasoning enhancement
    • GRPO training and evaluation

Running Notebooks

jupyter notebook notebooks/

πŸŽ›οΈ Configuration

All training parameters can be customized using YAML configuration files:

SFT Configuration (configs/sft_config.yaml)

model:
  name: "HuggingFaceTB/SmolLM2-135M"
training:
  learning_rate: 8.0e-5
  num_train_epochs: 1
  per_device_train_batch_size: 1
dataset:
  name: "banghua/DL-SFT-Dataset"
  max_samples: 1000
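
Values like these ultimately feed a TRL-style SFT trainer. The sketch below shows one plausible wiring with trl.SFTConfig and SFTTrainer; the dataset split and sample handling are assumptions, the output_dir is made up, the exact TRL API surface depends on your installed trl version, and the repository's SFTTrainingPipeline may wire things differently.

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hedged sketch: map the YAML values above onto TRL's SFT trainer.
dataset = load_dataset("banghua/DL-SFT-Dataset", split="train")
dataset = dataset.select(range(min(1000, len(dataset))))   # max_samples: 1000

sft_args = SFTConfig(
    output_dir="./models/sft_demo",          # assumed output location
    learning_rate=8.0e-5,
    num_train_epochs=1,
    per_device_train_batch_size=1,
)
trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-135M",      # TRL can load a model from its Hub id
    args=sft_args,
    train_dataset=dataset,
)
trainer.train()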

DPO Configuration (configs/dpo_config.yaml)

model:
  name: "HuggingFaceTB/SmolLM2-135M-Instruct"
training:
  beta: 0.2
  learning_rate: 5.0e-5
identity:
  positive_name: "Deep Qwen"
  organization_name: "Qwen"
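
The identity block drives how preference pairs are built: responses that use the new identity are treated as chosen, and the originals as rejected. Below is a toy example of one such record in the usual prompt/chosen/rejected format; the real construction happens in DPOTrainingPipeline.create_preference_dataset and may differ in detail.

# Toy DPO preference record; the "Qwen" -> "Deep Qwen" swap mirrors the identity settings above.
def make_identity_pair(prompt, original_answer,
                       organization_name="Qwen", positive_name="Deep Qwen"):
    return {
        "prompt": prompt,
        "chosen": original_answer.replace(organization_name, positive_name),  # preferred: new identity
        "rejected": original_answer,                                          # dispreferred: old identity
    }

pair = make_identity_pair("What is your name?", "I am Qwen, created by the Qwen team.")
print(pair["chosen"])   # I am Deep Qwen, created by the Deep Qwen team.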

RL Configuration (configs/rl_config.yaml)

model:
  name: "HuggingFaceTB/SmolLM2-135M-Instruct"
training:
  learning_rate: 5.0e-6
  num_generations: 4
dataset:
  name: "openai/gsm8k"

🔧 API Usage

Python API Examples

from src.training.sft_trainer import SFTTrainingPipeline
from src.training.dpo_trainer import DPOTrainingPipeline
from src.training.rl_trainer import RLTrainingPipeline

# Supervised Fine-Tuning
sft_pipeline = SFTTrainingPipeline("HuggingFaceTB/SmolLM2-135M")
sft_pipeline.setup_training(dataset, learning_rate=8e-5)
sft_pipeline.train()

# Direct Preference Optimization
dpo_pipeline = DPOTrainingPipeline("HuggingFaceTB/SmolLM2-135M-Instruct")
dpo_dataset = dpo_pipeline.create_preference_dataset(raw_dataset)
dpo_pipeline.setup_training(dpo_dataset, beta=0.2)
dpo_pipeline.train()

# Online Reinforcement Learning
rl_pipeline = RLTrainingPipeline("HuggingFaceTB/SmolLM2-135M-Instruct")
rl_pipeline.setup_training(train_dataset, reward_function)
rl_pipeline.train()
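
The reward_function passed to setup_training supplies the training signal for GRPO. Its exact signature in RLTrainingPipeline is not documented here, so the sketch below is only an assumed illustration for GSM8K-style problems: reward 1.0 when the last number in a completion equals the reference answer, else 0.0.

import re

# Hedged sketch of a GSM8K-style reward: exact match on the final number.
# The callable signature expected by RLTrainingPipeline is an assumption.
def reward_function(completion, reference_answer):
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    return 1.0 if numbers and numbers[-1] == reference_answer.strip() else 0.0

print(reward_function("She sells 16 - 3 - 4 = 9 eggs, so she makes 9 * 2 = 18 dollars.", "18"))  # 1.0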

📊 Evaluation and Benchmarking

Comprehensive Model Evaluation

python examples/run_benchmark.py \
    --model "path/to/your/model" \
    --math-samples 50 \
    --target-identity "Your Model Name" \
    --output-file "benchmark_results.json"

Available Metrics

  • Accuracy: Task-specific performance measurement
  • Identity Consistency: Model identity alignment (a minimal sketch follows this list)
  • Safety Score: Harmful content detection
  • Perplexity: Language modeling quality
  • Math Reasoning: Mathematical problem-solving ability
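
As one concrete example, identity consistency can be thought of as the fraction of probe responses that mention the target identity. Here is a minimal sketch; the actual implementation in src/evaluation/metrics.py may be more involved.

from typing import List

# Minimal identity-consistency sketch: fraction of responses mentioning the target identity.
def identity_consistency(responses: List[str], target_identity: str) -> float:
    if not responses:
        return 0.0
    hits = sum(target_identity.lower() in r.lower() for r in responses)
    return hits / len(responses)

print(identity_consistency(["I'm Deep Qwen.", "I was built by the Qwen team."], "Deep Qwen"))  # 0.5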

🎓 Educational Value

This repository serves as both a practical implementation and an educational resource:

Learning Objectives

  • Understand the theory behind modern post-training techniques
  • Implement production-ready training pipelines
  • Evaluate model performance across multiple dimensions
  • Apply best practices in ML engineering and experimentation

Based on DeepLearning.AI Course

This implementation is based on and extends the DeepLearning.AI "Post-training LLMs" course, providing:

  • Enhanced code organization and modularity
  • Additional evaluation metrics and benchmarks
  • Production-ready implementations
  • Comprehensive documentation and examples

🔬 Research and Development

Supported Models

  • Small Models: SmolLM2-135M, SmolLM2-1.7B
  • Medium Models: Qwen2.5-0.5B, Qwen2.5-1.5B
  • Large Models: Any Hugging Face-compatible model
  • Custom Models: Easy integration with custom architectures

Datasets

  • SFT: banghua/DL-SFT-Dataset, custom instruction datasets
  • DPO: mrfakename/identity, preference pair datasets
  • RL: openai/gsm8k, custom reward-based datasets

🤝 Contributing

We welcome contributions! Please see our contribution guidelines for details.

Development Setup

# Clone the repository
git clone https://github.com/YanCotta/post_training_llms.git
cd post_training_llms

# Create development environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install development dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt  # If available

# Run tests
python -m pytest tests/  # If tests are available

πŸ“ Citation

If you use this repository in your research or projects, please cite:

@misc{cotta2024posttrainingllms,
  title={Post-Training Techniques for Large Language Models},
  author={Yan Cotta},
  year={2024},
  url={https://github.com/YanCotta/post_training_llms},
  note={Based on DeepLearning.AI Post-training LLMs course}
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • DeepLearning.AI for the foundational "Post-training LLMs" course
  • Hugging Face for the transformers library and model ecosystem
  • TRL Team for the training utilities and implementations
  • Open Source Community for the various datasets and tools used

📞 Support


⭐ Star this repository if you find it useful for your LLM post-training projects!
