
Note of caution in article on Subsampling for class imbalances #112

@giulianonetto


Hello!

Thank you for the amazing work.

I noticed that the Subsampling for class imbalances article recommends subsampling to address class imbalances. I would like to suggest adding a note of caution about the potentially deleterious effects of sub-/over-sampling methods. The most recent evidence points to severe harm to calibration and little to no benefit in discrimination, although this literature does seem somewhat limited to binary classification. Happy to make a PR if agreed. Disclaimer: my personal bias comes from the clinical prediction world.

Recent references on the harms of class imbalance

Reference on the equivalence between oversampling and decision threshold selection

Assunção et al. (2024). Is Augmentation Effective in Improving Prediction in Imbalanced Datasets? https://doi.org/10.6339/24-jds1154

Text to be removed

In the Subsampling the data section, it is said that:

"However, subsampling almost always produces models that are better calibrated, meaning that the distributions of the class probabilities are more well behaved. As a result, the default 50% cutoff is much more likely to produce better sensitivity and specificity values than they would otherwise."

I am not sure how this could be the case. For instance, a logistic regression model fit to sub-/over-sampled data will be poorly calibrated, even with infinite data, because its intercept is wrong: resampling changes the outcome prevalence the model is trained on. Following Assunção et al. (2024), setting the probability cutoff to the outcome prevalence seems to suffice, without harming calibration. It may also be beneficial to warn about the difference between a class imbalance problem and a sample size problem.
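
To make the intercept argument concrete, here is a minimal simulated sketch in Python (NumPy only). It is not taken from the article or from Assunção et al.; the data-generating model (intercept -3, slope 1), the sample size, and the helper names `fit_logistic` and `predict` are all assumptions chosen for illustration. It fits a plain maximum-likelihood logistic regression on the full data and on a 1:1 undersampled copy, then compares mean predicted risk with the true prevalence and checks how often "undersample, then cut at 0.5" agrees with "full data, then cut at the prevalence".

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_logistic(x, y, n_iter=25):
    """Fit intercept and slope by Newton-Raphson (plain maximum likelihood)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # observed information
        beta += np.linalg.solve(hess, grad)
    return beta

def predict(beta, x):
    """Predicted probability from a fitted (intercept, slope) pair."""
    return 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x)))

# Assumed data-generating model: logit(p) = -3 + 1 * x, giving rare events.
n = 200_000
x = rng.normal(size=n)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-3.0 + x))))
prevalence = y.mean()

# Model 1: fit on the full, imbalanced data.
beta_full = fit_logistic(x, y)

# Model 2: keep all events, undersample non-events to a 1:1 ratio.
pos = np.flatnonzero(y == 1)
neg = rng.choice(np.flatnonzero(y == 0), size=pos.size, replace=False)
idx = np.concatenate([pos, neg])
beta_sub = fit_logistic(x[idx], y[idx])

# Calibration-in-the-large: mean predicted risk vs. observed prevalence.
mean_full = predict(beta_full, x).mean()
mean_sub = predict(beta_sub, x).mean()

# Expected intercept shift from undersampling: log(non-events / events).
shift = np.log((n - pos.size) / pos.size)

# Classification agreement: undersampled model at 0.5 vs. full model at prevalence.
agreement = np.mean((predict(beta_sub, x) > 0.5) == (predict(beta_full, x) > prevalence))

print(f"prevalence             : {prevalence:.4f}")
print(f"mean risk, full model  : {mean_full:.4f}")
print(f"mean risk, undersampled: {mean_sub:.4f}")
print(f"intercept difference   : {beta_sub[0] - beta_full[0]:.3f} (expected about {shift:.3f})")
print(f"classification agreement: {agreement:.4f}")
```

Under these assumptions, the full-data model's average predicted risk matches the prevalence, while the undersampled model overestimates risk because its intercept is shifted by roughly the log of the non-event to event ratio. The two classification rules should agree on nearly every observation, which is the sense in which moving the cutoff can stand in for resampling without distorting the predicted probabilities.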

Thanks again,
Giuliano
