If you use this code in your research, please cite our paper:
```bibtex
@misc{kachwala2025prefilledresponsesenhancezeroshot,
  title={Prefilled responses enhance zero-shot detection of AI-generated images},
  author={Zoher Kachwala and Danishjeet Singh and Danielle Yang and Filippo Menczer},
  year={2025},
  eprint={2506.11031},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2506.11031},
}
```

Note: Paper submitted to ACL ARR.
Can you tell which images above are real vs AI-generated? Answer in footnote¹
This repository contains the evaluation system for our paper on using Prefill-Guided Thinking (PGT) to detect AI-generated images with Vision-Language Models (VLMs).
💡 For detailed technical documentation (particularly helpful for LLM code agents), see AGENTS.md for complete architecture details, function signatures, and implementation specifics.
Key Finding: Simply prefilling a VLM's response with the phrase "Let's examine the style and the synthesis artifacts" improves detection by up to 24% in Macro F1 — without any training or fine-tuning.
Instead of asking a VLM to detect fake images directly, we prefill its response to guide its reasoning:
- (a) Baseline: No guidance → incorrect classification
- (b) Chain-of-Thought: Generic reasoning phrase → still incorrect
- (c) S2 (our method): Task-aligned phrase → correct classification ✓
The magic phrase: "Let's examine the style and the synthesis artifacts"
This simple technique works across 3 VLMs and 16 different image generators spanning faces, objects, and natural scenes.
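A minimal sketch of the idea, assuming vLLM's offline chat API. The model ID, prompt wording, image URL, and sampling settings below are illustrative assumptions, not the repository's exact code:

```python
# Sketch of Prefill-Guided Thinking (PGT): the assistant turn is pre-seeded with the
# guiding phrase and the model continues generating from it.
from vllm import LLM, SamplingParams

PREFILL = "Let's examine the style and the synthesis artifacts"

llm = LLM(model="Qwen/Qwen2.5-VL-7B-Instruct")  # any vLLM-supported VLM

messages = [
    {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},  # placeholder
        {"type": "text", "text": "Is this image real or AI-generated?"},
    ]},
    # Prefill: the assistant's answer already starts with the task-aligned phrase.
    {"role": "assistant", "content": PREFILL},
]

outputs = llm.chat(
    messages,
    SamplingParams(temperature=0.0, max_tokens=256),
    add_generation_prompt=False,   # don't open a fresh assistant turn...
    continue_final_message=True,   # ...continue the prefilled one instead
)
print(PREFILL + outputs[0].outputs[0].text)
```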
See SETUP.md for complete environment setup instructions (conda, PyTorch, vLLM, Flash-Attention).
See Usage Examples for detailed command-line examples and all available options.
We evaluate on three diverse benchmarks:
| Dataset | Content | Images | Generators |
|---|---|---|---|
| D3 | Diverse web images (objects, scenes, art) | 8.4k | 4 (Stable Diffusion variants, DeepFloyd) |
| DF40 | Human faces (deepfakes) | 10k | 6 (Midjourney, StyleCLIP, StarGAN, etc.) |
| GenImage | ImageNet objects (animals, vehicles) | 10k | 8 (ADM, BigGAN, GLIDE, etc.) |
See Data Collection & Setup for complete instructions on downloading and organizing all three datasets.
- Qwen2.5-VL-7B - Native dynamic-resolution ViT
- LLaVA-OneVision-7B - SigLIP ViT + Qwen2 LM
- Llama-3.2-Vision-11B - Vision adapter + Llama 3.1 LM
All models use instruction-tuned variants via vLLM for efficient inference.
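For reference, a sketch of how the three instruction-tuned checkpoints might be loaded through vLLM; the Hugging Face model IDs and settings are assumptions, not versions pinned by this repository:

```python
# Illustrative Hugging Face IDs for the three instruction-tuned VLMs (assumed).
from vllm import LLM

MODEL_IDS = {
    "qwen":  "Qwen/Qwen2.5-VL-7B-Instruct",
    "llava": "llava-hf/llava-onevision-qwen2-7b-ov-hf",
    "llama": "meta-llama/Llama-3.2-11B-Vision-Instruct",
}

llm = LLM(model=MODEL_IDS["qwen"], limit_mm_per_prompt={"image": 1})
```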
| Method | Description |
|---|---|
| Baseline | No prefill, just ask the question |
| CoT | Chain-of-thought reasoning |
| S2 | Task-aligned (our method) |
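The conditions differ only in the prefilled text. A sketch; the S2 phrase is quoted from the paper, while the generic CoT wording shown here is an assumption:

```python
# Prefill text per condition (CoT wording is the standard generic cue, assumed here).
PREFILLS = {
    "baseline": None,                                  # no prefill, direct question
    "cot": "Let's think step by step",                 # generic chain-of-thought cue
    "s2": "Let's examine the style and the synthesis artifacts",  # task-aligned cue
}
```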
See Usage Examples for detailed command-line examples and all available options.
Detection performance (Macro F1) across models, datasets, and PGT variants. Bars are annotated with the relative improvement of S2 over the next-best method and show 95% confidence intervals from 10k bootstrap iterations.
Detection performance (Recall) for Llama across the datasets and their state-of-the-art synthetic image generators; similar figures for LLaVA and Qwen appear in the paper.
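A sketch of the bootstrap procedure behind those error bars (10k resamples, 95% percentile interval); this is a generic implementation, not the repository's evaluation code:

```python
# Percentile-bootstrap 95% CI for Macro F1 over paired (label, prediction) arrays.
import numpy as np
from sklearn.metrics import f1_score

def bootstrap_macro_f1(y_true, y_pred, n_boot=10_000, seed=0):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    rng = np.random.default_rng(seed)
    scores = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))  # resample with replacement
        scores[b] = f1_score(y_true[idx], y_pred[idx], average="macro")
    lo, hi = np.percentile(scores, [2.5, 97.5])
    return f1_score(y_true, y_pred, average="macro"), float(lo), float(hi)
```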
- Multi-Response Generation (n>1) - Generate multiple responses and aggregate them by majority voting (see the sketch after this list) → Details
- Phrase Modes - Test prefill vs prompt vs system instruction → Details
- Debug Mode - Quick validation with 5 examples → Details
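A minimal sketch of the majority-voting step for n>1 responses; parsing each response into a label is assumed to happen upstream:

```python
# Majority vote over n parsed labels from independently sampled responses.
from collections import Counter

def majority_vote(labels: list[str]) -> str:
    """Return the most frequent label; ties resolve to the first-counted label."""
    return Counter(labels).most_common(1)[0][0]

print(majority_vote(["fake", "real", "fake", "fake", "real"]))  # -> "fake"
```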
Results are saved in hierarchical directories with timestamped JSON files containing metrics and full reasoning traces.
See Output Structure for detailed file organization and JSON schemas.
Generate publication-ready plots (Macro F1 bars, radar plots, vocabulary analysis, etc.)
See Plotting & Visualization System for available plots and usage instructions.
- SETUP.md - Environment setup and installation instructions
- AGENTS.md - Complete technical reference (architecture, function signatures, all details)
- Paper - arXiv:2506.11031
Zoher Kachwala · Danishjeet Singh · Danielle Yang · Filippo Menczer
Observatory on Social Media · Indiana University, Bloomington
¹ Answer to image quiz: Only images 3, 10, and 11 in the mosaic are real. All others are AI-generated.



