emorynlp/D-Gen


D-GEN: Distractor Generation for LLM Evaluations

This repository contains resources for our paper "D-GEN: Automatically Generating Distractors for Reliable LLM Evaluation".

The paper has been accepted to ACL 2025 Findings.

Overview

D-GEN is the first open-source Large Language Model (LLM) for distractor generation.

We evaluate the quality of the generated distractors with two methods: a Ranking Alignment Test and an Entropy Analysis.

  1. Ranking Alignment Test

To evaluate the effectiveness of our distractor generation model, we compare two test sets:

  • MMLU: The original MMLU benchmark. Its distractors are treated as the gold distractors.

  • MMLU-DGEN: A modified MMLU, where the original distractors are replaced with new distractors generated by D-GEN.
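
The README does not spell out how the two test sets are compared. One common way to check whether two benchmarks rank a set of models the same way is Spearman's rank correlation, sketched below. The choice of metric, the function names, and the accuracy values are all assumptions for illustration; the numbers are placeholders, not results from the paper.

```python
import math

def average_ranks(values):
    """Rank values from highest (rank 1) to lowest; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of the 1-based positions in the tie group
        for i in order[pos:end + 1]:
            ranks[i] = shared
        pos = end + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

# Placeholder accuracies for three hypothetical models (NOT results from the paper).
acc_mmlu = [0.80, 0.60, 0.40]
acc_mmlu_dgen = [0.70, 0.50, 0.30]
print(spearman_rho(acc_mmlu, acc_mmlu_dgen))
```

A value near 1 would mean that both test sets order the models the same way, even if the absolute accuracies differ.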

[Figure: Ranking Alignment Test results (images: screenshot, plot_rank3)]

  2. Entropy Analysis

We compute the entropy of the predicted probability distribution over answer choices (A, B, C, and D).

Entropy quantifies the model's prediction uncertainty, allowing us to analyze how convincing the distractors are based on the model's confidence.
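
As a minimal sketch of this computation, the snippet below calculates the Shannon entropy of a probability distribution over the four answer choices. The function name, the use of base-2 logarithms, and the example probabilities are assumptions made for illustration; the source does not specify the log base or how the probabilities are extracted from a model.

```python
import math

def choice_entropy(probs):
    """Shannon entropy (in bits) of a distribution over answer choices.

    `probs` holds non-negative scores for A, B, C, and D; they are
    renormalized here in case they do not sum exactly to 1.
    """
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    normalized = [p / total for p in probs]
    # Zero-probability choices contribute nothing to the entropy.
    return sum(-p * math.log2(p) for p in normalized if p > 0)

# A model that is maximally unsure across four choices: 2 bits of entropy.
print(choice_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0

# A model that strongly favors one choice has lower entropy (hypothetical values).
print(choice_entropy([0.85, 0.05, 0.05, 0.05]))
```

Higher entropy means the model spreads its probability across the choices, which suggests the distractors are competing with the correct answer; entropy near zero means the model is confident in a single choice.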

[Table: entropy results for MMLU and MMLU-DGEN (screenshot image)]

For most models, we observe no significant difference in entropy between MMLU and MMLU-DGEN, as shown in the table above.

Model and Dataset

Models are released in two sizes: 8B and 70B.

They can be found on Hugging Face:

The MMLU-DGEN dataset is available at:

Contact

Please contact Grace Byun ([email protected]) for any inquiries.
