
# 🚀 Model Evaluation with SGLang

This repository contains tools for setting up SGLang with CUDA support and evaluating various language models using a standardized benchmarking approach.

*(Figure: model scores graph)*

## 📂 Repository Structure

```
.
├── 📜 README.md
├── 📓 Sglang_instruct_eval.ipynb    # Main benchmarking notebook
├── 📁 Sample Generation/            # Contains model evaluation scores
└── 📄 instruct Benchmark.txt        # Detailed benchmark outputs
```

## ✅ Prerequisites

- 🎮 NVIDIA GPU with CUDA support
- 🐍 Python 3.10 or later
- 🖥️ Linux operating system
- 💾 Sufficient disk space (~5GB for CUDA, ~2GB for PyTorch, ~1GB for SGLang)

## ⚙️ Setup Instructions

### 🔹 1. CUDA Installation

```bash
# Download CUDA 12.1
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda_12.1.0_530.30.02_linux.run

# Install CUDA (without the driver)
sudo sh cuda_12.1.0_530.30.02_linux.run

# Add CUDA to PATH and LD_LIBRARY_PATH
export PATH=/usr/local/cuda-12.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

# Make the PATH changes permanent
echo 'export PATH=/usr/local/cuda-12.1/bin${PATH:+:${PATH}}' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
source ~/.bashrc
```

### 🔹 2. PyTorch Installation

```bash
# Install PyTorch with CUDA 12.1 support
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

### 🔹 3. SGLang Installation

```bash
# Install SGLang with all dependencies
pip install "sglang[all]"

# Install FlashInfer (optional, for better performance); the wheel index
# should match your CUDA and PyTorch versions (CUDA 12.1 / torch 2.4 here)
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/
```

## 📊 Model Benchmarking

### 📌 Overview

The repository includes a comprehensive benchmarking system that evaluates language models across 20 diverse questions. The evaluation process includes:

- ✅ Automated model response generation
- ✅ Response quality assessment
- ✅ Score aggregation and analysis

### ▶️ Running the Benchmark

1️⃣ Open `Sglang_instruct_eval.ipynb` in Jupyter Notebook/Lab

2️⃣ Follow the notebook cells to:

- 🔄 Load and initialize models
- ✍️ Generate responses to benchmark questions
- 💾 Save results in JSON format (a minimal sketch of this loop follows the list)
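
For orientation, here is a minimal sketch of the generate-and-save loop the notebook walks through, using SGLang's offline engine. The questions, sampling parameters, and the `responses.json` file name below are illustrative assumptions, not the notebook's actual values:

```python
import json

import sglang as sgl

# Hypothetical stand-ins for the notebook's 20 benchmark questions.
QUESTIONS = [
    "Write a short poem about the ocean.",
    "Explain photosynthesis to a high-school student.",
]

# Load and initialize the model under evaluation.
llm = sgl.Engine(model_path="NousResearch/Hermes-3-Llama-3.2-3B")

# Generate one response per question.
results = []
for question in QUESTIONS:
    output = llm.generate(question, {"temperature": 0.7, "max_new_tokens": 512})
    results.append({"question": question, "response": output["text"]})

# Save results in JSON format for later scoring.
with open("responses.json", "w") as f:
    json.dump(results, f, indent=2)

llm.shutdown()
```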

## 🏆 Evaluation Methods

The framework supports two evaluation approaches:

1️⃣ **Automated Evaluation** 🤖

- Uses larger models (ChatGPT/Claude) as evaluators; a minimal judge call is sketched after this list
- Systematic scoring based on predefined criteria
- Evaluation prompt:

> Task: You are an expert evaluator responsible for scoring responses from a smaller LLM across various knowledge domains. Given a question and the LLM's response, assess the response based on accuracy, coherence, depth, and clarity. Provide a score (out of 100) for each category listed below.
>
> Evaluation Criteria:
> - Creative Writing: Assess narrative coherence, expressiveness, and poetic strength.
> - Science & Technology: Judge accuracy, structure, and ease of understanding.
> - Social Sciences & Humanities: Evaluate insightfulness and depth of analysis.
> - Business & Economics: Check for relevance, practical insights, and clarity.
> - Health & Well-being: Assess informativeness, clarity, and explanation depth.
> - Environment & Climate: Evaluate comprehensiveness, structure, and impact assessment.

2️⃣ **Manual Evaluation** 👨‍💻

- Direct **human assessment** of responses
- Qualitative and quantitative scoring
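
One way to automate the judge step is sketched below: it replays the evaluation prompt through the OpenAI Python client and averages the returned scores. The `gpt-4o` judge model, the single-integer reply format, and the `responses.json` input file are illustrative assumptions; the repository does not prescribe a specific judge client:

```python
import json
import statistics

from openai import OpenAI

# The full evaluation prompt from above would go here.
EVAL_PROMPT = "Task: You are an expert evaluator responsible for scoring responses..."

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Load the responses produced by the generation sketch ("responses.json" is hypothetical).
with open("responses.json") as f:
    records = json.load(f)

scores = []
for record in records:
    completion = client.chat.completions.create(
        model="gpt-4o",  # assumed judge model; any capable evaluator works
        messages=[
            {"role": "system", "content": EVAL_PROMPT},
            {
                "role": "user",
                "content": (
                    f"Question: {record['question']}\n"
                    f"Response: {record['response']}\n"
                    "Reply with a single integer score out of 100."
                ),
            },
        ],
    )
    scores.append(int(completion.choices[0].message.content.strip()))

# Aggregate: a simple mean over all questions.
print("Mean score:", statistics.mean(scores))
```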

### 📂 Output Format

Results are stored in **two locations**:

- 📁 `Sample Generation/` → Contains individual model scores
- 📄 `instruct Benchmark.txt` → Detailed benchmark outputs

## ✅ Verification Steps

```bash
# Verify CUDA installation
nvcc --version

# Verify PyTorch CUDA support
python3 -c "import torch; print('CUDA available:', torch.cuda.is_available())"

# Test SGLang
python3 -c "import sglang as sgl"
```

## 🚀 Quick Start Example

```python
import sglang as sgl

# Initialize the offline engine with a model
llm = sgl.Engine(model_path="NousResearch/Hermes-3-Llama-3.2-3B")

# Generate a completion for a single prompt and print it
output = llm.generate("List three uses of a paperclip.", {"temperature": 0.7})
print(output["text"])

# Release GPU resources when finished
llm.shutdown()
```

## ⚠️ Important Notes

- ⚡ Make sure you have the appropriate NVIDIA drivers installed before installing CUDA
- ⚡ The CUDA installer will warn about driver installation; you can ignore this if you already have compatible drivers
- ⚡ If you encounter permission issues, you might need to use `sudo` for some commands
- ⚡ For production use, consider using a virtual environment
- ⚡ Benchmark results may vary based on model versions and evaluation criteria

## 🛠️ Troubleshooting

If you encounter any issues:

- 🔹 Verify that your NVIDIA drivers are properly installed
- 🔹 Check that the CUDA paths are correctly set in your environment
- 🔹 Ensure your Python version is compatible
- 🔹 Make sure you have sufficient disk space
- 🔹 Check the JSON output format if evaluation results aren't being saved properly (a quick check is sketched below)
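
For that last point, a quick way to confirm the results file is well-formed is to try parsing it; `responses.json` stands in for whichever file the notebook writes:

```python
import json

# Try to parse the results file; a JSONDecodeError will point at any corruption.
with open("responses.json") as f:
    data = json.load(f)

print(f"Parsed {len(data)} records successfully")
```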

## 🤝 Contributing

Feel free to contribute by:

- 🔹 Adding new benchmark questions
- 🔹 Implementing additional evaluation metrics
- 🔹 Testing with different models
- 🔹 Improving documentation

## 📚 References

- 🔗 [NVIDIA CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit)
- 🔗 [PyTorch Installation](https://pytorch.org/get-started/locally/)
- 🔗 [SGLang Documentation](https://github.com/sgl-project/sglang)
- 🔗 [FlashInfer](https://github.com/flashinfer-ai/flashinfer)

## 📜 License

📄 This setup guide and benchmarking framework are provided under the Apache License 2.0.
