This repository contains resources for our paper "D-GEN: Automatically Generating Distractors for Reliable LLM Evaluation".
The paper is accepted to ACL 2025 Findings.
D-GEN is the first open-source Large Language Model (LLM) for distractor generation.
We evaluate the quality of distractors via Ranking Alignment Test and Entropy Analysis.
- Ranking Alignment Test
To evaluate the effectiveness of our distractor generation model, we compare two test sets:
- MMLU: The original MMLU benchmark. Its original distractors are treated as gold distractors.
- MMLU-DGEN: A modified MMLU in which the original distractors are replaced with new distractors generated by D-GEN.
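The paper's exact alignment metric is not reproduced here; as an illustration, model rankings induced by the two test sets can be compared with Spearman rank correlation. Below is a minimal sketch with hypothetical accuracy scores (the function name and the numbers are assumptions for illustration, not values from the paper):

```python
def spearman_rank_correlation(scores_a, scores_b):
    """Spearman's rho between the rankings induced by two score lists (no ties)."""
    def ranks(scores):
        # Higher score -> better rank (rank 1 is best).
        order = sorted(range(len(scores)), key=lambda i: -scores[i])
        r = [0] * len(scores)
        for rank, idx in enumerate(order, start=1):
            r[idx] = rank
        return r

    ra, rb = ranks(scores_a), ranks(scores_b)
    n = len(scores_a)
    d_squared = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical per-model accuracies on the two test sets.
mmlu_acc = [0.72, 0.65, 0.81, 0.59]       # ranking on original MMLU
mmlu_dgen_acc = [0.70, 0.63, 0.79, 0.58]  # ranking on MMLU-DGEN

print(spearman_rank_correlation(mmlu_acc, mmlu_dgen_acc))  # → 1.0 (identical ranking)
```

A correlation near 1 indicates that the generated distractors preserve the model ranking produced by the original benchmark.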
- Entropy Analysis
We compute the entropy of the predicted probability distribution over answer choices (A, B, C, and D).
Entropy quantifies the model's prediction uncertainty, allowing us to analyze how convincing the distractors are based on the model's confidence.
We observe no significant entropy differences between MMLU and MMLU-DGEN for most models, as shown in the table above.
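The entropy computation described above can be sketched in plain Python. The probability values in the example are hypothetical, not results from the paper:

```python
import math

def choice_entropy(probs):
    """Shannon entropy (in bits) of a probability distribution over answer choices."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical predicted probabilities over choices A-D.
confident = [0.85, 0.05, 0.05, 0.05]  # model strongly favors choice A -> low entropy
uniform = [0.25, 0.25, 0.25, 0.25]    # maximum uncertainty over four choices

print(choice_entropy(confident))
print(choice_entropy(uniform))  # → 2.0, i.e. log2(4), the maximum for four choices
```

Lower entropy means the model is more confident in one choice; similar entropy on MMLU and MMLU-DGEN suggests the generated distractors are about as convincing as the originals.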
Models are released in two sizes, 8B and 70B, and can be found on Hugging Face:
The MMLU-DGEN dataset is available at:
Please contact Grace Byun ([email protected]) for any inquiries.