MLO-lab/ModelAuditor

ModelAuditor Agent

AI-powered model auditing agent with multi-agent debate for robust evaluation of machine learning models.

Setup

Using uv (recommended)

uv sync
uv run python main.py --model resnet50 --dataset CIFAR10 --weights path/to/weights.pth

Using pip

pip install -e .
python main.py --model resnet50 --dataset CIFAR10 --weights path/to/weights.pth

Medical AI dependencies (optional)

uv sync --extra medical  # or pip install -e ".[medical]"

Usage

Basic Usage

python main.py --model resnet50 --dataset CIFAR10 --weights models/model.pth

Medical Models

# ISIC skin lesion classification
python main.py --model siim-isic --dataset isic --weights models/isic/model.pth

# HAM10000 dataset
python main.py --model deepderm --dataset ham10000 --weights models/ham10000.pth

Options

  • --subset N: Use N samples for faster evaluation
  • --no-debate: Disable multi-agent debate
  • --single-agent: Use single agent instead of multi-agent debate
  • --device DEVICE: Device to run on (cpu, cuda, or mps)
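
The options above can be combined in a single invocation. As a minimal sketch, the Python snippet below assembles such a command from the documented flags only. The helper name `build_audit_command` and its keyword arguments are hypothetical and not part of this repository; whether `main.py` accepts every combination of these flags is an assumption.

```python
import shlex


def build_audit_command(model, dataset, weights, subset=None, device=None, debate=True):
    """Hypothetical helper: build a main.py command line from the documented flags."""
    cmd = ["python", "main.py", "--model", model, "--dataset", dataset, "--weights", weights]
    if subset is not None:
        # --subset N: evaluate on N samples only
        cmd += ["--subset", str(subset)]
    if device is not None:
        # --device: one of cpu, cuda, mps
        cmd += ["--device", device]
    if not debate:
        # --no-debate: disable multi-agent debate
        cmd.append("--no-debate")
    return cmd


cmd = build_audit_command(
    "resnet50", "CIFAR10", "models/model.pth",
    subset=500, device="cpu", debate=False,
)
# Print a shell-ready string; pass `cmd` to subprocess.run to execute it instead.
print(shlex.join(cmd))
```

A small subset on the CPU without debate is a reasonable smoke test before starting a full audit.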

Environment Variables

Set your API keys:

export ANTHROPIC_API_KEY="your-key"
export OPENAI_API_KEY="your-key"  # if using non-Anthropic models
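
Before starting a long audit, it can help to confirm that the keys are visible to the Python process. The snippet below is a minimal sketch using only the standard library. The helper `missing_api_keys` is hypothetical and not part of this repository, and treating `ANTHROPIC_API_KEY` as the only required key by default is an assumption based on the comment above.

```python
import os


def missing_api_keys(required=("ANTHROPIC_API_KEY",), env=None):
    """Hypothetical helper: return the names in `required` that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_api_keys()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required API keys are set.")
```

When using non-Anthropic models, pass `required=("ANTHROPIC_API_KEY", "OPENAI_API_KEY")` to check both keys.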

Project Structure

  • main.py - Interactive model auditor with multi-agent debate
  • testbench.py - Automated evaluation script
  • utils/agent.py - Multi-agent conversation system
  • architectures/ - Custom model architectures
  • prompts/ - System prompts for different evaluation phases
  • models/ - Pre-trained model weights
  • results/ - Evaluation results and conversation logs
