If you use this code in your research, please cite our ICML 2025 paper:
@inproceedings{
zhang2025lemon,
title={{LEM}oN: Label Error Detection using Multimodal Neighbors},
author={Haoran Zhang and Aparna Balagopalan and Nassim Oufattole and Hyewon Jeong and Yan Wu and Jiacheng Zhu and Marzyeh Ghassemi},
booktitle={Forty-second International Conference on Machine Learning},
year={2025}
}
Run the following commands to clone this repo and create the Conda environment:
git clone git@github.com:MLforHealth/LEMoN.git
cd LEMoN
conda env create -f environment.yml
conda activate lemon
CIFAR-10 and CIFAR-100 are downloaded automatically by the codebase. To preprocess the remaining datasets, follow the instructions in DataSources.md.
To run a single evaluation, call run_lemon.py with the appropriate arguments, for example:
python -m run_lemon \
--output_dir /output/dir \
--dataset mscoco \
--noise_type cat \
--noise_level 0.4
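If you prefer to script evaluations from Python, the command above can be assembled programmatically. The following is a minimal sketch, not part of the codebase: the helper name build_run_command is hypothetical, and only the flags shown in the example above are used. Any other arguments run_lemon.py accepts are not covered here.

```python
import shlex


def build_run_command(output_dir, dataset, noise_type, noise_level):
    """Return the argv list for one run_lemon evaluation (hypothetical helper).

    Only the flags from the README example are used here.
    """
    return [
        "python", "-m", "run_lemon",
        "--output_dir", str(output_dir),
        "--dataset", str(dataset),
        "--noise_type", str(noise_type),
        "--noise_level", str(noise_level),
    ]


# Rebuild the README example and print it as a shell-safe string.
cmd = build_run_command("/output/dir", "mscoco", "cat", 0.4)
print(shlex.join(cmd))
```

To actually launch the run, pass the list to subprocess.run(cmd, check=True) from the repository root with the lemon environment active.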
To reproduce the experiments in the paper, which involve training a grid of models using different hyperparameters, use sweep.py as follows:
python sweep.py launch \
--experiment {experiment_name} \
--output_dir {output_root} \
--command_launcher {launcher}
where:
- experiment_name corresponds to experiments defined as classes in experiments.py.
- output_root is a directory where experimental results will be stored.
- launcher is a string corresponding to a launcher defined in launchers.py (i.e. slurm or local).
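As an illustration of how the three placeholders are filled in, the sketch below builds the sweep.py launch command for the lemon_all experiment with the local launcher. The helper build_sweep_command and the path /output/root are hypothetical, and the launcher check only reflects the two names mentioned above; launchers.py is the authoritative list.

```python
import shlex

# Launcher names mentioned in this README; launchers.py is authoritative.
KNOWN_LAUNCHERS = ("slurm", "local")


def build_sweep_command(experiment_name, output_root, launcher):
    """Return the argv list for `sweep.py launch` (hypothetical helper)."""
    if launcher not in KNOWN_LAUNCHERS:
        raise ValueError(
            f"unknown launcher {launcher!r}; expected one of {KNOWN_LAUNCHERS}"
        )
    return [
        "python", "sweep.py", "launch",
        "--experiment", experiment_name,
        "--output_dir", output_root,
        "--command_launcher", launcher,
    ]


# /output/root is a placeholder path; substitute your own results directory.
sweep_cmd = build_sweep_command("lemon_all", "/output/root", "local")
print(shlex.join(sweep_cmd))
```

Checking the launcher name before building the command catches typos early, rather than after a job has been submitted.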
After the lemon_all experiment has finished running, run notebooks/agg_results.ipynb and notebooks/hparam_drop.ipynb to create Tables 2 and 3.