A machine learning model calibration metrics library, built using NumPy. Evaluate model uncertainty using popular calibration metrics from deep learning research. Inspired by Uncertainty Toolbox.
- Kull et al. (2019) - Beyond temperature scaling: Obtaining well-calibrated multiclass probabilities with Dirichlet calibration
- Nixon et al. (2020) - Measuring Calibration in Deep Learning
The (class-conditional) general calibration error bins the predicted probabilities, compares per-bin accuracy against per-bin confidence, and aggregates the gaps with a configurable $L^p$ norm:

$$\mathrm{GCE} = \left( \frac{1}{K} \sum_{k=1}^{K} \sum_{b=1}^{B} \frac{n_{b,k}}{N} \, \bigl| \mathrm{acc}(b,k) - \mathrm{conf}(b,k) \bigr|^{p} \right)^{1/p}$$

where $B$ is the number of bins, $K$ the number of classes considered (1 when only the top prediction is scored), $n_{b,k}$ the number of predictions for class $k$ falling into bin $b$, $N$ the total number of predictions, $\mathrm{acc}(b,k)$ and $\mathrm{conf}(b,k)$ the accuracy and mean confidence inside bin $b$ for class $k$, and $p$ the norm.
general_calibration_error = GCE(prob, labels, bin = 15, class_conditional = True, adaptive_bins = False,
top_k_classes = 1, norm = 1, thresholding = 0.00)
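To make these parameters concrete, here is a minimal NumPy sketch of the top-1, equal-width-binned case (no class conditioning, no thresholding). It is illustrative only, not this library's implementation, and `binned_calibration_error` is a hypothetical helper rather than part of the API.

# minimal sketch of the binned calibration error (top-1, equal-width bins);
# illustrative only, not this library's internal implementation
import numpy as np

def binned_calibration_error(prob, labels, n_bins=15, norm=1):
    """prob: (N, K) predicted probabilities; labels: (N,) integer labels."""
    conf = prob.max(axis=1)                    # top-1 confidence per sample
    pred = prob.argmax(axis=1)                 # top-1 predicted class
    correct = (pred == labels).astype(float)   # 1.0 where the prediction is right
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(conf, edges[1:-1])   # assign each sample to an equal-width bin
    gaps, weights = [], []
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gaps.append(abs(correct[mask].mean() - conf[mask].mean()))
            weights.append(mask.sum() / len(labels))
    gaps, weights = np.array(gaps), np.array(weights)
    if norm == 'inf':                          # maximum gap over bins (MCE-style)
        return float(gaps.max())
    return float((weights * gaps ** norm).sum() ** (1.0 / norm))

# toy usage with random predictions
rng = np.random.default_rng(0)
prob = rng.dirichlet(np.ones(3), size=100)
labels = rng.integers(0, 3, size=100)
print(binned_calibration_error(prob, labels, n_bins=15, norm=1))

Swapping `norm` between 1, 2, and 'inf' in this sketch reproduces the ECE-, RMSCE-, and MCE-style aggregations listed below.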
Additionally, we provide a simple interface for computing commonly used metrics:
- Naeini et al. (2015) - Obtaining Well Calibrated Probabilities Using Bayesian Binning
#wrapper function
expected_calibration_error = ECE(prob, labels)
#general function
expected_calibration_error = GCE(prob, labels, bin = 15, class_conditional = False, adaptive_bins = False,
top_k_classes = 1, norm = 1, thresholding = 0.00)
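With these settings (top-1 confidences, equal-width bins, $p = 1$), the general definition above reduces to the familiar expected calibration error:

$$\mathrm{ECE} = \sum_{b=1}^{B} \frac{n_b}{N} \, \bigl| \mathrm{acc}(b) - \mathrm{conf}(b) \bigr|$$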
- Hendrycks et al. (2019) - Deep Anomaly Detection with Outlier Exposure
#wrapper function
root_mean_square_calibration_error = RMSCE(prob, labels)
#general function
root_mean_square_calibration_error = GCE(prob, labels, bin = 15, class_conditional = False, adaptive_bins = False,
top_k_classes = 1, norm = 2, thresholding = 0.00)
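Using the $L^2$ norm instead penalizes large accuracy/confidence gaps more heavily, giving the root-mean-square calibration error used by Hendrycks et al.:

$$\mathrm{RMSCE} = \left( \sum_{b=1}^{B} \frac{n_b}{N} \, \bigl( \mathrm{acc}(b) - \mathrm{conf}(b) \bigr)^{2} \right)^{1/2}$$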
- Naeini et al. (2015) - Obtaining Well Calibrated Probabilities Using Bayesian Binning
#wrapper function
maximum_calibration_error = MCE(prob, labels)
#general function
maximum_calibration_error = GCE(prob, labels, bin = 15, class_conditional = False, adaptive_bins = False,
top_k_classes = 1, norm = 'inf', thresholding = 0.00)
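With the $\infty$ norm only the single worst bin matters, which is the maximum calibration error:

$$\mathrm{MCE} = \max_{b} \, \bigl| \mathrm{acc}(b) - \mathrm{conf}(b) \bigr|$$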
- Nixon et al. (2020) - Measuring Calibration in Deep Learning
#wrapper function
adaptive_calibration_error = ACE(prob, labels)
#general function
adaptive_calibration_error = GCE(prob, labels, bin = 15, class_conditional = True, adaptive_bins = True,
top_k_classes = 'all', norm = 1, thresholding = 0.00)
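Adaptive bins hold an equal number of predictions each, so the bin weights drop out and the metric becomes a plain average of the gaps over all $K$ classes and $R$ adaptively chosen ranges, as defined in Nixon et al.:

$$\mathrm{ACE} = \frac{1}{KR} \sum_{k=1}^{K} \sum_{r=1}^{R} \bigl| \mathrm{acc}(r,k) - \mathrm{conf}(r,k) \bigr|$$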
- Nixon et al. (2020) - Measuring Calibration in Deep Learning
#wrapper function
static_calibration_error = SCE(prob, labels)
#general function
static_calibration_error = GCE(prob, labels, bin = 15, class_conditional = True, adaptive_bins = False,
top_k_classes = 'all', norm = 1, thresholding = 0.00)
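Static (equal-width) bins with class conditioning over all classes give the static calibration error of Nixon et al.:

$$\mathrm{SCE} = \frac{1}{K} \sum_{k=1}^{K} \sum_{b=1}^{B} \frac{n_{b,k}}{N} \, \bigl| \mathrm{acc}(b,k) - \mathrm{conf}(b,k) \bigr|$$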
- Gupta et al. (2021) - Calibration of Neural Networks using Splines
#wrapper function
top_r_calibration_error = ToprCE(prob, labels)
#general function
top_r_calibration_error = GCE(prob, labels, bin = 15, class_conditional = True, adaptive_bins = False,
top_k_classes = r, norm = 1, thresholding = 0.00)
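Here `r` stands for the number of highest-ranked classes kept per prediction (the `top_k_classes` argument is left as a placeholder above). One way to read these parameters is a class-conditional error restricted to those $r$ ranks, for example:

$$\mathrm{ToprCE} = \frac{1}{r} \sum_{j=1}^{r} \sum_{b=1}^{B} \frac{n_{b,j}}{N} \, \bigl| \mathrm{acc}(b,j) - \mathrm{conf}(b,j) \bigr|$$

where $j$ indexes the $j$-th most confident class of each prediction rather than a fixed class label.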
- Class-wise calibration error (CWCE), Kumar et al. (2019) - Verified Uncertainty Calibration
- Top-label calibration error (TCE), Kumar et al. (2019) - Verified Uncertainty Calibration
- Kolmogorov-Smirnov calibration error (KSCE), Gupta et al. (2021) - Calibration of Neural Networks using Splines
- Maximum mean calibration error (MMCE), Kumar et al. (2018) - Trainable Calibration Measures for Neural Networks from Kernel Mean Embeddings
- Kernel calibration error (KCE), Kull et al. (2020) - Calibration tests in multi-class classification: A unifying framework
- Over/Under confidence decomposition, Pearce et al. (2022) - Adaptive Confidence Calibration
- Reliability Diagram
- Confidence Histograms, Guo et al. (2017) - On Calibration of Modern Neural Networks
- PyTorch recalibration library