If you use this code in your research, please cite our paper:
```bibtex
@misc{kachwala2025prefilledresponsesenhancezeroshot,
  title={Prefilled responses enhance zero-shot detection of AI-generated images},
  author={Zoher Kachwala and Danishjeet Singh and Danielle Yang and Filippo Menczer},
  year={2025},
  eprint={2506.11031},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2506.11031},
}
```

Note: Paper submitted to ACL ARR.
Can you tell which images above are real vs AI-generated? Answer in footnote¹
This repository contains the evaluation system for our paper on using Prefill-Guided Thinking (PGT) to detect AI-generated images with Vision-Language Models (VLMs).
💡 For detailed technical documentation (particularly helpful for LLM code agents), see AGENTS.md for complete architecture details, function signatures, and implementation specifics.
Key Finding: Simply prefilling a VLM's response with the phrase "Let's examine the style and the synthesis artifacts" improves detection by up to 24% in Macro F1 — without any training or fine-tuning.
Instead of asking a VLM to detect fake images directly, we prefill its response to guide its reasoning:
- (a) Baseline: No guidance → incorrect classification
- (b) Chain-of-Thought: Generic reasoning phrase → still incorrect
- (c) S2 (our method): Task-aligned phrase → correct classification ✓
The magic phrase: "Let's examine the style and the synthesis artifacts"
This simple technique works across 3 VLMs and 16 different image generators spanning faces, objects, and natural scenes.
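A minimal sketch of the idea, assuming vLLM's offline chat API. The model ID, prompt wording, image URL, and sampling settings below are illustrative assumptions, not the repository's exact code:

```python
# Sketch of Prefill-Guided Thinking (PGT): the assistant turn is pre-seeded with the
# guiding phrase and the model continues generating from it.
from vllm import LLM, SamplingParams

PREFILL = "Let's examine the style and the synthesis artifacts"

llm = LLM(model="Qwen/Qwen2.5-VL-7B-Instruct")  # any vLLM-supported VLM

messages = [
    {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},  # placeholder
        {"type": "text", "text": "Is this image real or AI-generated?"},
    ]},
    # Prefill: the assistant's answer already starts with the task-aligned phrase.
    {"role": "assistant", "content": PREFILL},
]

outputs = llm.chat(
    messages,
    SamplingParams(temperature=0.0, max_tokens=256),
    add_generation_prompt=False,   # don't open a fresh assistant turn...
    continue_final_message=True,   # ...continue the prefilled one instead
)
print(PREFILL + outputs[0].outputs[0].text)
```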
See SETUP.md for complete environment setup instructions (conda, PyTorch, vLLM, Flash-Attention).
See Usage Examples for detailed command-line examples and all available options.
We evaluate on three diverse benchmarks:
| Dataset | Content | Images | Generators |
|---|---|---|---|
| D3 | Diverse web images (objects, scenes, art) | 8.4k | 4 (Stable Diffusion variants, DeepFloyd) |
| DF40 | Human faces (deepfakes) | 10k | 6 (Midjourney, StyleCLIP, StarGAN, etc.) |
| GenImage | ImageNet objects (animals, vehicles) | 10k | 8 (ADM, BigGAN, GLIDE, etc.) |
See Data Collection & Setup for complete instructions on downloading and organizing all three datasets.
- Qwen2.5-VL-7B - Native dynamic-resolution ViT
- LLaVA-OneVision-7B - SigLIP ViT + Qwen2 LM
- Llama-3.2-Vision-11B - Vision adapter + Llama 3.1 LM
All models use instruction-tuned variants via vLLM for efficient inference.
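For reference, a sketch of how the three instruction-tuned checkpoints might be loaded through vLLM; the Hugging Face model IDs and settings are assumptions, not versions pinned by this repository:

```python
# Illustrative Hugging Face IDs for the three instruction-tuned VLMs (assumed).
from vllm import LLM

MODEL_IDS = {
    "qwen":  "Qwen/Qwen2.5-VL-7B-Instruct",
    "llava": "llava-hf/llava-onevision-qwen2-7b-ov-hf",
    "llama": "meta-llama/Llama-3.2-11B-Vision-Instruct",
}

llm = LLM(model=MODEL_IDS["qwen"], limit_mm_per_prompt={"image": 1})
```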
| Method | Description |
|---|---|
| Baseline | No prefill, just ask the question |
| CoT | Chain-of-thought reasoning |
| S2 | Task-aligned (our method) |
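The conditions differ only in the prefilled text. A sketch; the S2 phrase is quoted from the paper, while the generic CoT wording shown here is an assumption:

```python
# Prefill text per condition (CoT wording is the standard generic cue, assumed here).
PREFILLS = {
    "baseline": None,                                  # no prefill, direct question
    "cot": "Let's think step by step",                 # generic chain-of-thought cue
    "s2": "Let's examine the style and the synthesis artifacts",  # task-aligned cue
}
```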
See Usage Examples for detailed command-line examples and all available options.
Detection performance (Macro F1) across models, datasets, and PGT variants. Bars are annotated with the relative improvement of S2 over the next-best method and show 95% confidence intervals from 10k bootstrap iterations.
Detection performance (Recall) for Llama across the datasets and their state-of-the-art synthetic image generators; similar figures for LLaVA and Qwen appear in the paper.
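A sketch of the bootstrap procedure behind those error bars (10k resamples, 95% percentile interval); this is a generic implementation, not the repository's evaluation code:

```python
# Percentile-bootstrap 95% CI for Macro F1 over paired (label, prediction) arrays.
import numpy as np
from sklearn.metrics import f1_score

def bootstrap_macro_f1(y_true, y_pred, n_boot=10_000, seed=0):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    rng = np.random.default_rng(seed)
    scores = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))  # resample with replacement
        scores[b] = f1_score(y_true[idx], y_pred[idx], average="macro")
    lo, hi = np.percentile(scores, [2.5, 97.5])
    return f1_score(y_true, y_pred, average="macro"), float(lo), float(hi)
```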
- Multi-Response Generation (n>1) - Generate multiple responses and aggregate them by majority voting (see the sketch after this list) → Details
- Phrase Modes - Test prefill vs prompt vs system instruction → Details
- Debug Mode - Quick validation with 5 examples → Details
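A minimal sketch of the majority-voting step for n>1 responses; parsing each response into a label is assumed to happen upstream:

```python
# Majority vote over n parsed labels from independently sampled responses.
from collections import Counter

def majority_vote(labels: list[str]) -> str:
    """Return the most frequent label; ties resolve to the first-counted label."""
    return Counter(labels).most_common(1)[0][0]

print(majority_vote(["fake", "real", "fake", "fake", "real"]))  # -> "fake"
```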
Results are saved in hierarchical directories with timestamped JSON files containing metrics and full reasoning traces.
See Output Structure for detailed file organization and JSON schemas.
Generate publication-ready plots (Macro F1 bars, radar plots, vocabulary analysis, etc.)
See Plotting & Visualization System for available plots and usage instructions.
- SETUP.md - Environment setup and installation instructions
- AGENTS.md - Complete technical reference (architecture, function signatures, all details)
- Paper - arXiv:2506.11031
Zoher Kachwala · Danishjeet Singh · Danielle Yang · Filippo Menczer
Observatory on Social Media · Indiana University, Bloomington
¹ Answer to image quiz: Only images 3, 10, and 11 in the mosaic are real. All others are AI-generated.



