Add model calibration #235
Draft
Overview: Probability Calibration via CalibratedClassifierCV
What it is
CalibratedClassifierCV wraps any classifier (RF/GBT/XGBoost, etc.) and learns a mapping from raw model scores to well-calibrated probabilities. It does this with an internal cross-validation loop: in each fold, it fits the base model on the train-fold data and learns a calibration function on the fold's held-out data. Two calibration methods are supported: sigmoid (Platt scaling), which fits a logistic function to the scores, and isotonic regression, which fits a non-parametric monotonic mapping.
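The wrapping described above can be sketched as follows (the dataset and hyperparameters here are illustrative, not from the JABS codebase):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic dataset standing in for real features/labels.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

base = RandomForestClassifier(n_estimators=50, random_state=0)

# method="sigmoid" -> Platt scaling; method="isotonic" -> isotonic regression.
# cv=5 runs the internal cross-validation loop described above: the base
# model is refit on each train fold and the calibrator is fit on the held-out fold.
calibrated = CalibratedClassifierCV(base, method="sigmoid", cv=5)
calibrated.fit(X, y)

# Calibrated probabilities, shape (n_samples, 2); rows sum to 1.
proba = calibrated.predict_proba(X)
```

Isotonic regression is more flexible but can overfit on small held-out folds, which is one reason sigmoid is the common default for modest dataset sizes.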
Why calibration matters
Tree models often output overconfident probabilities (lots of ~0.0 or ~1.0). Calibration fixes this so that predicted probabilities reflect reality (e.g., among samples with p≈0.7, ~70% are positive). Better calibration improves any downstream use that thresholds, ranks, or averages these probabilities.
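Calibration quality can be checked with a reliability curve and the Brier score; this is a generic scikit-learn sketch, not JABS-specific code:

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
p = clf.predict_proba(X_te)[:, 1]

# Bin predictions and compare the observed fraction of positives to the mean
# predicted probability in each bin; a well-calibrated model tracks the diagonal.
frac_pos, mean_pred = calibration_curve(y_te, p, n_bins=10)

# Brier score: mean squared error of the probabilities (lower is better).
brier = brier_score_loss(y_te, p)
```

Plotting `frac_pos` against `mean_pred` gives the familiar reliability diagram; for an overconfident tree model the curve is S-shaped rather than diagonal.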
How we use it in JABS
Practical guidance
Trade-offs
Net impact for JABS
See Also
User Guide
I'm going to hold off on updating the user guide until I'm sure we're going to merge these changes
Settings Dialog Screenshots