A comprehensive implementation and educational resource for modern post-training techniques that enhance Large Language Model (LLM) capabilities and alignment.
This repository provides production-ready implementations of three key post-training techniques:
- Supervised Fine-Tuning (SFT): Enhance instruction-following capabilities
- Direct Preference Optimization (DPO): Align models with human preferences
- Online Reinforcement Learning (GRPO): Improve task-specific performance with reward signals
All implementations are based on the DeepLearning.AI "Post-training LLMs" course, enhanced with professional software engineering practices, comprehensive documentation, and extensible architecture.
- Modular Architecture: Clean, extensible codebase with clear separation of concerns
- Educational Notebooks: Step-by-step tutorials with detailed explanations
- Production Ready: Professional implementations suitable for real-world applications
- Easy Configuration: YAML-based configuration for all training parameters
- Comprehensive Evaluation: Built-in metrics and benchmarking tools
- Multiple Interfaces: Command-line scripts, Python API, and Jupyter notebooks
- Flexible Models: Support for various model architectures and sizes
post_training_llms/
├── src/                                   # Core implementation
│   ├── utils/                             # Utility functions
│   │   ├── model_utils.py                 # Model loading, generation, evaluation
│   │   ├── data_utils.py                  # Dataset preparation and processing
│   │   ├── config.py                      # Unified configuration system
│   │   └── config_manager.py              # Configuration management utilities
│   ├── training/                          # Training pipelines
│   │   ├── sft_trainer.py                 # Supervised Fine-Tuning
│   │   ├── dpo_trainer.py                 # Direct Preference Optimization
│   │   └── rl_trainer.py                  # Online RL with GRPO
│   └── evaluation/                        # Evaluation and metrics
│       ├── metrics.py                     # Performance metrics
│       └── benchmark.py                   # Comprehensive benchmarking
├── notebooks/                             # Educational tutorials
│   ├── 01_supervised_fine_tuning.ipynb
│   ├── 02_direct_preference_optimization.ipynb
│   └── 03_online_reinforcement_learning.ipynb
├── examples/                              # Example scripts
│   ├── run_sft.py                         # SFT training example
│   ├── run_dpo.py                         # DPO training example
│   ├── run_rl.py                          # RL training example
│   ├── run_benchmark.py                   # Model evaluation
│   └── config_utils.py                    # Configuration utilities
├── configs/                               # Configuration files
│   ├── sft_config.yaml                    # SFT parameters
│   ├── dpo_config.yaml                    # DPO parameters
│   └── rl_config.yaml                     # RL parameters
├── data/                                  # Data storage (created at runtime)
└── models/                                # Model storage (created at runtime)
The unified configuration system provides a robust, type-safe way to manage all training parameters:
- BaseConfig: Abstract base class with common configuration fields
- SFTConfig: Configuration for Supervised Fine-Tuning
- DPOConfig: Configuration for Direct Preference Optimization
- RLConfig: Configuration for Reinforcement Learning
- ConfigManager: Utility class for configuration operations
- Type Safety: All configurations use Python dataclasses with validation
- Data Validation: Automatic validation of parameter types and ranges
- Inheritance: Method-specific configs inherit from the base configuration
- YAML Support: Load/save configurations in human-readable YAML format
- Command Overrides: Command-line arguments can override config values
- Utility Functions: Built-in tools for validation, merging, and conversion
# Example configuration hierarchy
BaseConfig
├── ModelConfig        # Model settings (name, trust_remote_code)
├── TrainingConfig     # Common training parameters
│   ├── SFTTrainingConfig
│   ├── DPOTrainingConfig   (with beta parameter)
│   └── RLTrainingConfig    (with num_generations)
├── DatasetConfig      # Dataset settings
├── HardwareConfig     # Hardware settings (GPU, mixed precision)
├── OutputConfig       # Output settings
└── EvaluationConfig   # Evaluation settings
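To make the pattern concrete, here is a minimal sketch of a dataclass-backed config with validation and YAML round-tripping. This is an illustration only, not the repository's actual code (the real classes live in src/utils/config.py and src/utils/config_manager.py); the field names simply mirror the YAML examples later in this README.

```python
# Minimal illustration of the dataclass + YAML pattern; the real classes in
# src/utils/config.py are richer than this sketch.
from dataclasses import dataclass, field, asdict
import yaml


@dataclass
class TrainingSection:
    learning_rate: float = 8.0e-5
    num_train_epochs: int = 1
    per_device_train_batch_size: int = 1

    def validate(self) -> bool:
        # Catch obviously invalid values before any GPU time is spent.
        return self.learning_rate > 0 and self.num_train_epochs >= 1


@dataclass
class SFTConfigSketch:
    model_name: str = "HuggingFaceTB/SmolLM2-135M"
    training: TrainingSection = field(default_factory=TrainingSection)

    @classmethod
    def from_yaml(cls, path: str) -> "SFTConfigSketch":
        with open(path) as f:
            raw = yaml.safe_load(f) or {}
        return cls(
            model_name=raw.get("model", {}).get("name", cls.model_name),
            training=TrainingSection(**raw.get("training", {})),
        )

    def to_yaml(self, path: str) -> None:
        with open(path, "w") as f:
            yaml.safe_dump(asdict(self), f)
```

In the repository itself you would instead go through create_default_config and ConfigManager.validate_config, as shown in the testing snippet further down.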
1. Clone the repository:

   git clone https://github.com/YanCotta/post_training_llms.git
   cd post_training_llms

2. Install dependencies:

   pip install -r requirements.txt

3. Verify installation:

   python -c "import torch; import transformers; import datasets; import trl; print('All dependencies installed successfully!')"
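If you plan to use GPU training or mixed precision (see the HardwareConfig settings above), it can also be worth confirming that PyTorch sees your GPU. This optional check uses only standard PyTorch calls:

```python
import torch

# Optional sanity check before launching GPU training.
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")
else:
    print("No GPU detected; training will fall back to CPU.")
```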
The project now features a unified configuration system that eliminates code duplication and ensures consistency across all training methods.
All training scripts now support configuration files with command-line overrides:
# Use configuration file with overrides
python examples/run_sft.py \
--config configs/sft_config.yaml \
--learning-rate 1e-4 \
--epochs 2

Use the configuration utility script for common operations:
# Create a new configuration template
python examples/config_utils.py create --type sft --output configs/my_config.yaml
# Validate a configuration file
python examples/config_utils.py validate --config configs/sft_config.yaml
# List all available configurations
python examples/config_utils.py list --directory configs
# Convert configuration to training arguments
python examples/config_utils.py convert --config configs/sft_config.yaml

The configuration system includes comprehensive testing:
# Test all configuration functionality
python -c "
from src.utils.config import create_default_config
from src.utils.config_manager import ConfigManager
# Create and validate configurations
sft_config = create_default_config('sft')
is_valid = ConfigManager.validate_config(sft_config)
print(f'Configuration system working: {is_valid}')
"python examples/run_sft.py \
--config configs/sft_config.yaml \
--max-samples 100

python examples/run_dpo.py \
--config configs/dpo_config.yaml \
--new-identity "My Assistant" \
--max-samples 50

python examples/run_rl.py \
--model "HuggingFaceTB/SmolLM2-135M-Instruct" \
--dataset "openai/gsm8k" \
--max-train-samples 20 \
--max-eval-samples 10 \
--output-dir "./models/my_rl_model"

Explore the techniques through our comprehensive tutorial notebooks:
1. Supervised Fine-Tuning Tutorial (notebooks/01_supervised_fine_tuning.ipynb)
   - Learn how SFT improves instruction-following
   - Hands-on training with real datasets
   - Performance evaluation and analysis
2. Direct Preference Optimization Tutorial (notebooks/02_direct_preference_optimization.ipynb)
   - Understand preference-based training
   - Identity modification example
   - Consistency measurement and evaluation
3. Online Reinforcement Learning Tutorial (notebooks/03_online_reinforcement_learning.ipynb)
   - Reward-based model improvement
   - Mathematical reasoning enhancement
   - GRPO training and evaluation
Launch the notebooks with:

jupyter notebook notebooks/

All training parameters can be customized using YAML configuration files. For example:

configs/sft_config.yaml:

model:
  name: "HuggingFaceTB/SmolLM2-135M"
training:
  learning_rate: 8.0e-5
  num_train_epochs: 1
  per_device_train_batch_size: 1
dataset:
  name: "banghua/DL-SFT-Dataset"
  max_samples: 1000

configs/dpo_config.yaml:

model:
  name: "HuggingFaceTB/SmolLM2-135M-Instruct"
training:
  beta: 0.2
  learning_rate: 5.0e-5
identity:
  positive_name: "Deep Qwen"
  organization_name: "Qwen"

configs/rl_config.yaml:

model:
  name: "HuggingFaceTB/SmolLM2-135M-Instruct"
training:
  learning_rate: 5.0e-6
  num_generations: 4
dataset:
  name: "openai/gsm8k"

The training pipelines can also be used directly from Python:

from src.training.sft_trainer import SFTTrainingPipeline
from src.training.dpo_trainer import DPOTrainingPipeline
from src.training.rl_trainer import RLTrainingPipeline
# Supervised Fine-Tuning
sft_pipeline = SFTTrainingPipeline("HuggingFaceTB/SmolLM2-135M")
sft_pipeline.setup_training(dataset, learning_rate=8e-5)
sft_pipeline.train()
# Direct Preference Optimization
dpo_pipeline = DPOTrainingPipeline("HuggingFaceTB/SmolLM2-135M-Instruct")
dpo_dataset = dpo_pipeline.create_preference_dataset(raw_dataset)
dpo_pipeline.setup_training(dpo_dataset, beta=0.2)
dpo_pipeline.train()
# Online Reinforcement Learning
rl_pipeline = RLTrainingPipeline("HuggingFaceTB/SmolLM2-135M-Instruct")
rl_pipeline.setup_training(train_dataset, reward_function)
rl_pipeline.train()

Evaluate any trained model with the benchmarking script:

python examples/run_benchmark.py \
--model "path/to/your/model" \
--math-samples 50 \
--target-identity "Your Model Name" \
--output-file "benchmark_results.json"

The benchmark reports the following metrics:

- Accuracy: Task-specific performance measurement
- Identity Consistency: Model identity alignment
- Safety Score: Harmful content detection
- Perplexity: Language modeling quality
- Math Reasoning: Mathematical problem-solving ability
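The math-reasoning metric, like the reward_function passed to the RL pipeline in the Python API example above, ultimately comes down to comparing a model's final answer with the GSM8K reference. The helper below is an illustrative exact-match sketch, not necessarily how src/evaluation/metrics.py implements it:

```python
import re
from typing import Optional


def extract_final_number(text: str) -> Optional[str]:
    """Return the last number in a string, e.g. 'So the total is 42.' -> '42'."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return matches[-1] if matches else None


def exact_match_reward(completion: str, reference: str) -> float:
    """1.0 if the completion's final number matches the reference, else 0.0.

    A scalar score of this shape per generated completion is all a GRPO-style
    reward signal needs.
    """
    predicted = extract_final_number(completion)
    expected = extract_final_number(reference)
    return 1.0 if predicted is not None and predicted == expected else 0.0


# GSM8K references end with '#### <answer>', so the same extractor works on both sides.
print(exact_match_reward("The bakery sold 42 cupcakes in total.", "#### 42"))  # 1.0
```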
This repository serves as both a practical implementation and an educational resource:
- Understand the theory behind modern post-training techniques
- Implement production-ready training pipelines
- Evaluate model performance across multiple dimensions
- Apply best practices in ML engineering and experimentation
This implementation is based on and extends the DeepLearning.AI "Post-training LLMs" course, providing:
- Enhanced code organization and modularity
- Additional evaluation metrics and benchmarks
- Production-ready implementations
- Comprehensive documentation and examples
- Small Models: SmolLM2-135M, SmolLM2-1.7B
- Medium Models: Qwen2.5-0.5B, Qwen2.5-1.5B
- Large Models: Any HuggingFace compatible model
- Custom Models: Easy integration with custom architectures
- SFT: banghua/DL-SFT-Dataset, custom instruction datasets
- DPO: mrfakename/identity, preference pair datasets
- RL: openai/gsm8k, custom reward-based datasets
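All three are standard Hugging Face Hub datasets, so they can be pulled with datasets.load_dataset. A small illustrative sketch follows; the split names are assumptions rather than values verified against each dataset card:

```python
from datasets import load_dataset

# SFT: instruction-following examples (assuming a "train" split).
sft_ds = load_dataset("banghua/DL-SFT-Dataset", split="train")

# DPO: raw identity data; the DPO pipeline's create_preference_dataset() turns it
# into preference pairs, conventionally {"prompt": ..., "chosen": ..., "rejected": ...}.
dpo_raw = load_dataset("mrfakename/identity", split="train")

# RL: grade-school math word problems; "main" is the usual GSM8K config name.
gsm8k = load_dataset("openai/gsm8k", "main", split="train")

print(len(sft_ds), len(dpo_raw), len(gsm8k))
```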
We welcome contributions! Please see our contribution guidelines for details.
# Clone the repository
git clone https://github.com/YanCotta/post_training_llms.git
cd post_training_llms
# Create development environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install development dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt # If available
# Run tests
python -m pytest tests/  # If tests are available

If you use this repository in your research or projects, please cite:
@misc{cotta2024posttrainingllms,
title={Post-Training Techniques for Large Language Models},
author={Yan Cotta},
year={2024},
url={https://github.com/YanCotta/post_training_llms},
note={Based on DeepLearning.AI Post-training LLMs course}
}

This project is licensed under the MIT License - see the LICENSE file for details.
- DeepLearning.AI for the foundational "Post-training LLMs" course
- Hugging Face for the transformers library and model ecosystem
- TRL Team for the training utilities and implementations
- Open Source Community for the various datasets and tools used
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: [email protected]
Star this repository if you find it useful for your LLM post-training projects!