Min-P, Max Exaggeration

A Critical Analysis of Min-p Sampling in Language Models

This repository contains code and figures for our arXiv preprint Min-p, Max Exaggeration: A Critical Analysis of Min-p Sampling in Language Models. We investigate Nguyen et al. (2025)'s Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs and find what we believe are substantive flaws across all four of its lines of evidence: human evaluations, NLP benchmark evaluations, LLM-as-a-judge evaluations, and community adoption. We conclude that the evidence presented in the original paper fails to support the claims that min-p improves quality, diversity, or the trade-off between the two.

arXiv

Explanation | Installation | Usage | Citation | Contact

Explanation
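
Min-p sampling is a truncation sampler: at each decoding step, it scales a base threshold p_base (e.g., 0.1) by the probability of the most likely token, discards every token whose probability falls below that scaled threshold, renormalizes the survivors, and samples. Below is a minimal NumPy sketch of the rule; variable names are ours, and the original paper and the vLLM implementation remain the authoritative references.

```python
import numpy as np

def min_p_sample(logits: np.ndarray, p_base: float = 0.1,
                 temperature: float = 1.0, rng=None) -> int:
    """Draw one token id using the min-p truncation rule."""
    rng = rng or np.random.default_rng()

    # Temperature-scaled softmax over the vocabulary.
    scaled = logits / temperature
    scaled -= scaled.max()  # numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()

    # The threshold scales with the model's confidence: a high top
    # probability prunes aggressively; a flat distribution keeps more tokens.
    threshold = p_base * probs.max()

    # Zero out tokens below the threshold and renormalize the survivors.
    probs = np.where(probs >= threshold, probs, 0.0)
    probs /= probs.sum()

    return int(rng.choice(len(probs), p=probs))
```

With p_base = 0.1, for example, a token survives only if its probability is at least one tenth of the top token's probability.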

Installation

  1. (Optional) Update conda:

conda update -n base -c defaults conda -y

  2. Create and activate the conda environment:

conda create -n min_p_env python=3.11 -y && conda activate min_p_env

  3. Install the required packages:

pip install vllm lm_eval wandb pandas seaborn nvidia-htop statsmodels

or, to install exactly the versions we used:

conda env create -f environment.yml

  4. (If running NLP benchmark evaluations) Sign in to W&B with wandb login and to Hugging Face with huggingface-cli login.

Usage

Human Evaluations

NLP Benchmark Evaluations

To evaluate a single model (to sanity-check that the code runs), run:

export PYTHONPATH=. && export CUDA_VISIBLE_DEVICES=<YOUR GPU NUMBER> && conda activate min_p_env && python -u scripts/run_one_eval.py
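
For orientation, here is a hypothetical sketch of roughly what such a run amounts to via lm_eval's Python API with a vLLM backend; the model, task, and sampling values below are illustrative placeholders, and scripts/run_one_eval.py remains the authoritative entry point.

```python
import lm_eval

# Hypothetical single-model, single-task run; the actual models, tasks,
# and sampling hyperparameters are configured through our W&B sweeps.
results = lm_eval.simple_evaluate(
    model="vllm",
    model_args="pretrained=google/gemma-2-2b-it,dtype=bfloat16",
    tasks=["gsm8k"],
    gen_kwargs="do_sample=True,temperature=1.0,min_p=0.1",
)
print(results["results"])
```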

To run the full evaluation, create a W&B sweep:

wandb sweep <PATH TO SWEEP YAML CONFIG>
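
The sweep configs we used live in this repository; as a hypothetical illustration of the format, a minimal grid sweep over samplers and temperatures might look like:

```yaml
# Illustrative only; the parameter names passed to the entry point
# are placeholders, not our actual configuration.
program: scripts/run_one_eval.py
method: grid
parameters:
  model:
    values: ["google/gemma-2-2b-it", "google/gemma-2-9b-it"]
  sampler:
    values: ["min_p", "top_p"]
  temperature:
    values: [0.5, 1.0, 1.5, 2.0]
```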

And then launch an agent per GPU:

export PYTHONPATH=. && export CUDA_VISIBLE_DEVICES=<YOUR GPU NUMBER> && conda activate min_p_env && wandb agent ...

Our W&B sweeps are publicly available.

Note: when adding GPQA, Hendrycks MATH, and MMLU Pro, we found that Gemma 2 2B IT and Gemma 2 9B IT trigger a templating error in lm_eval version 0.4.7 when used with vllm. We have deliberately not updated our version of lm_eval, to avoid introducing a potential confounder into our sweeps.

Citation

To cite this work, please use:

@misc{schaeffer2025turningheatcriticalanalysis,
      title={Min-p, Max Exaggeration: A Critical Analysis of Min-p Sampling in Language Models}, 
      author={Rylan Schaeffer and Joshua Kazdan and Yegor Denisov-Blanch},
      year={2025},
      eprint={2506.13681},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.13681}, 
}

Contact

Questions? Comments? Interested in collaborating? Open an issue or email [email protected] or any of the other authors.
