# Calibration Toolbox

A machine learning model calibration metrics library, built using NumPy. Evaluate model uncertainty using popular calibration metrics from deep learning research. Inspired by Uncertainty Toolbox.

## Metrics

### General Calibration Error (GCE)

The (class-conditional) general calibration error with the $L^{p}$ norm can be expressed as:

$$\mathrm{GCE} = \left( \sum_{k=1}^{K} \sum_{b=1}^{B} \frac{n_{bk}}{NK} \, \big|\mathrm{acc}(b,k) - \mathrm{conf}(b,k)\big|^{p} \right)^{\frac{1}{p}}$$

where $\mathrm{acc}(b,k)$ and $\mathrm{conf}(b,k)$ are the accuracy and confidence of bin $b$ for class label $k$; $n_{bk}$ is the number of predictions in bin $b$ for class label $k$; and $N$ is the total number of data points. This Python library lets users customize this flexible calibration formula to fit their needs using the following code:

```python
general_calibration_error = GCE(prob, labels, bin=15, class_conditional=True,
                                adaptive_bins=False, top_k_classes=1, norm=1,
                                thresholding=0.00)
```
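
To make the formula concrete, here is a minimal NumPy sketch of the non-class-conditional, top-1, equal-width-bin case; the function name, array shapes, and binning details are assumptions for illustration, not the library's internals:

```python
import numpy as np

def binned_calibration_error(prob, labels, n_bins=15, p=1):
    """Hypothetical sketch of the non-class-conditional, top-1 case
    (K = 1 in the formula above), with equal-width bins."""
    prob = np.asarray(prob)        # shape (N, K): per-class probabilities
    labels = np.asarray(labels)    # shape (N,): integer class labels
    conf = prob.max(axis=1)        # top-1 confidence per sample
    correct = (prob.argmax(axis=1) == labels).astype(float)

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # assign each confidence to a bin index in [0, n_bins - 1]
    bin_idx = np.clip(np.digitize(conf, edges[1:-1]), 0, n_bins - 1)

    n, error = len(conf), 0.0
    for b in range(n_bins):
        mask = bin_idx == b
        if not mask.any():
            continue
        acc_b = correct[mask].mean()    # acc(b): accuracy of bin b
        conf_b = conf[mask].mean()      # conf(b): mean confidence of bin b
        error += (mask.sum() / n) * abs(acc_b - conf_b) ** p
    return error ** (1.0 / p)
```

With `p = 1` this reduces to the ECE below; with `p = 2`, to the RMSCE.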

Additionally, we provide a simple interface to commonly used metrics:

### Expected Calibration Error (ECE)

```python
# wrapper function
expected_calibration_error = ECE(prob, labels)

# general function
expected_calibration_error = GCE(prob, labels, bin=15, class_conditional=False,
                                 adaptive_bins=False, top_k_classes=1, norm=1,
                                 thresholding=0.00)
```
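
As a usage sketch, assuming `prob` is an `(N, K)` array of softmax probabilities and `labels` an `(N,)` array of integer class IDs (these shapes are an assumption inferred from the calls above):

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 10))                               # 1000 samples, 10 classes
prob = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax probabilities
labels = rng.integers(0, 10, size=1000)                            # ground-truth class labels

expected_calibration_error = ECE(prob, labels)                     # wrapper call from above
```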

### Root Mean Squared Calibration Error (RMSCE)

```python
# wrapper function
root_mean_square_calibration_error = RMSCE(prob, labels)

# general function
root_mean_square_calibration_error = GCE(prob, labels, bin=15, class_conditional=False,
                                         adaptive_bins=False, top_k_classes=1, norm=2,
                                         thresholding=0.00)
```

### Maximum Calibration Error (MCE)

```python
# wrapper function
maximum_calibration_error = MCE(prob, labels)

# general function
maximum_calibration_error = GCE(prob, labels, bin=15, class_conditional=False,
                                adaptive_bins=False, top_k_classes=1, norm='inf',
                                thresholding=0.00)
```
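
Setting `norm = 'inf'` replaces the weighted sum over bins with a maximum, so MCE reports the single worst bin. A hypothetical sketch of that reduction, given per-bin accuracies and confidences:

```python
import numpy as np

def max_bin_gap(acc_per_bin, conf_per_bin):
    """Hypothetical sketch: with the infinity norm, the GCE sum over bins
    collapses to the largest accuracy/confidence gap across bins."""
    gaps = np.abs(np.asarray(acc_per_bin) - np.asarray(conf_per_bin))
    return gaps.max()
```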

### Adaptive Calibration Error (ACE)

```python
# wrapper function
adaptive_calibration_error = ACE(prob, labels)

# general function
adaptive_calibration_error = GCE(prob, labels, bin=15, class_conditional=True,
                                 adaptive_bins=True, top_k_classes='all', norm=1,
                                 thresholding=0.00)
```
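
`adaptive_bins = True` switches from equal-width to equal-mass binning, so every bin holds roughly the same number of predictions. A minimal sketch of how such edges could be computed (the helper is an assumption for illustration, not the library's internals):

```python
import numpy as np

def adaptive_bin_edges(conf, n_bins=15):
    """Hypothetical sketch: quantile-based ("adaptive") bin edges that give
    each bin roughly the same number of predictions."""
    quantiles = np.linspace(0.0, 1.0, n_bins + 1)
    return np.quantile(np.asarray(conf), quantiles)
```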

### Static Calibration Error (SCE)

```python
# wrapper function
static_calibration_error = SCE(prob, labels)

# general function
static_calibration_error = GCE(prob, labels, bin=15, class_conditional=True,
                               adaptive_bins=False, top_k_classes='all', norm=1,
                               thresholding=0.00)
```

### Top-r Calibration Error (ToprCE)

```python
# wrapper function
top_r_calibration_error = ToprCE(prob, labels)

# general function (r is a placeholder for the number of top classes to evaluate)
top_r_calibration_error = GCE(prob, labels, bin=15, class_conditional=True,
                              adaptive_bins=False, top_k_classes=r, norm=1,
                              thresholding=0.00)
```
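
Restricting the error to the top-r predictions means only each sample's r largest class probabilities enter the binning. A hypothetical sketch of that selection step:

```python
import numpy as np

def top_r_probabilities(prob, r=3):
    """Hypothetical sketch: keep only each sample's r largest class
    probabilities, the inputs to a top-r calibration error."""
    prob = np.asarray(prob)                            # shape (N, K)
    top_idx = np.argsort(prob, axis=1)[:, -r:]         # indices of r largest
    return np.take_along_axis(prob, top_idx, axis=1)   # shape (N, r)
```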

## Upcoming Features

### Metrics

### Visualizations

### Other

- PyTorch recalibration library
