LEMoN: Label Error Detection using Multimodal Neighbors

Paper

If you use this code in your research, please cite our ICML 2025 paper:

@inproceedings{
    zhang2025lemon,
    title={{LEM}oN: Label Error Detection using Multimodal Neighbors},
    author={Haoran Zhang and Aparna Balagopalan and Nassim Oufattole and Hyewon Jeong and Yan Wu and Jiacheng Zhu and Marzyeh Ghassemi},
    booktitle={Forty-second International Conference on Machine Learning},
    year={2025}
}

To replicate the experiments in the paper:

Step 0: Environment and Prerequisites

Run the following commands to clone this repo and create the Conda environment:

git clone git@github.com:MLforHealth/LEMoN.git
cd LEMoN
conda env create -f environment.yml
conda activate lemon

Step 1: Preprocessing Data

CIFAR-10 and CIFAR-100 are downloaded automatically by the codebase. To preprocess the remaining datasets, follow the instructions in DataSources.md.

Step 2: Running Experiments

To run a single evaluation, call run_lemon.py with the appropriate arguments, for example:

python -m run_lemon \
    --output_dir /output/dir \
    --dataset mscoco \
    --noise_type cat \
    --noise_level 0.4
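
When several single evaluations are needed, the same command can be assembled programmatically. The following is a minimal sketch that uses only the flags shown above; the helper name build_run_lemon_command and the extra noise level 0.2 are assumptions made for illustration and are not part of the repository.

```python
import shlex
import sys


def build_run_lemon_command(output_dir, dataset, noise_type, noise_level):
    """Return the argv list for one run_lemon evaluation (hypothetical helper)."""
    return [
        sys.executable, "-m", "run_lemon",
        "--output_dir", str(output_dir),
        "--dataset", dataset,
        "--noise_type", noise_type,
        "--noise_level", str(noise_level),
    ]


if __name__ == "__main__":
    # Illustrative noise levels only; 0.4 is the value used in the example above.
    for level in (0.2, 0.4):
        cmd = build_run_lemon_command("/output/dir", "mscoco", "cat", level)
        # Print the command; to execute it from the repo root, pass cmd to
        # subprocess.run(cmd, check=True) instead.
        print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems if the output directory contains spaces.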

To reproduce the paper's experiments, which involve training a grid of models with different hyperparameters, use sweep.py as follows:

python sweep.py launch \
    --experiment {experiment_name} \
    --output_dir {output_root} \
    --command_launcher {launcher}

where:

experiment_name is the name of an experiment defined as a class in experiments.py.
output_root is the directory where experimental results will be stored.
launcher is the name of a launcher defined in launchers.py (i.e. slurm or local).
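
For readers unfamiliar with this pattern, a grid sweep of this kind generally expands a hyperparameter grid into one command per combination and hands the resulting list to a launcher. The sketch below illustrates that idea only; it does not reproduce the internals of sweep.py, experiments.py, or launchers.py, and the names expand_grid, to_command, dry_run_launcher, and the grid values are all hypothetical.

```python
import itertools
import shlex


def expand_grid(grid):
    """Yield one dict per combination of hyperparameter values."""
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def to_command(module, args):
    """Format one hyperparameter combination as a shell command string."""
    flags = " ".join(
        f"--{key} {shlex.quote(str(value))}" for key, value in sorted(args.items())
    )
    return f"python -m {module} {flags}"


def dry_run_launcher(commands):
    """Stand-in launcher: print each command instead of executing it."""
    for cmd in commands:
        print(cmd)


# Hypothetical grid; the values are illustrative, not the paper's settings.
grid = {"dataset": ["mscoco"], "noise_type": ["cat"], "noise_level": [0.2, 0.4]}
commands = [to_command("run_lemon", args) for args in expand_grid(grid)]
dry_run_launcher(commands)
```

A real launcher would replace the print call with local execution or submission to a scheduler such as Slurm.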

Step 3: Aggregating Results

After the lemon_all experiment has finished running, run notebooks/agg_results.ipynb and notebooks/hparam_drop.ipynb to create Tables 2 and 3.
