A machine learning model calibration metrics library, built using NumPy. Evaluate model uncertainty using popular calibration metrics from deep learning research. Inspired by Uncertainty Toolbox.
- Kull et al. (2019) - Beyond temperature scaling: Obtaining well-calibrated multiclass probabilities with Dirichlet calibration
- Nixon et al. (2020) - Measuring Calibration in Deep Learning
The (class-conditional) general calibration error bins the predicted probabilities, compares per-bin accuracy against per-bin confidence, and aggregates the gaps with a configurable $L^p$ norm:

$$\mathrm{GCE} = \left( \frac{1}{K} \sum_{k=1}^{K} \sum_{b=1}^{B} \frac{n_{b,k}}{N} \, \bigl| \mathrm{acc}(b,k) - \mathrm{conf}(b,k) \bigr|^{p} \right)^{1/p}$$

where $B$ is the number of bins, $K$ the number of classes considered (1 when only the top prediction is scored), $n_{b,k}$ the number of predictions for class $k$ falling into bin $b$, $N$ the total number of predictions, $\mathrm{acc}(b,k)$ and $\mathrm{conf}(b,k)$ the accuracy and mean confidence inside bin $b$ for class $k$, and $p$ the norm.
general_calibration_error = GCE(prob, labels, bin = 15, class_conditional = True, adaptive_bins = False,
top_k_classes = 1, norm = 1, thresholding = 0.00)
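To make these parameters concrete, here is a minimal NumPy sketch of the top-1, equal-width-binned case (no class conditioning, no thresholding). It is illustrative only, not this library's implementation, and `binned_calibration_error` is a hypothetical helper rather than part of the API.

# minimal sketch of the binned calibration error (top-1, equal-width bins);
# illustrative only, not this library's internal implementation
import numpy as np

def binned_calibration_error(prob, labels, n_bins=15, norm=1):
    """prob: (N, K) predicted probabilities; labels: (N,) integer labels."""
    conf = prob.max(axis=1)                    # top-1 confidence per sample
    pred = prob.argmax(axis=1)                 # top-1 predicted class
    correct = (pred == labels).astype(float)   # 1.0 where the prediction is right
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(conf, edges[1:-1])   # assign each sample to an equal-width bin
    gaps, weights = [], []
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gaps.append(abs(correct[mask].mean() - conf[mask].mean()))
            weights.append(mask.sum() / len(labels))
    gaps, weights = np.array(gaps), np.array(weights)
    if norm == 'inf':                          # maximum gap over bins (MCE-style)
        return float(gaps.max())
    return float((weights * gaps ** norm).sum() ** (1.0 / norm))

# toy usage with random predictions
rng = np.random.default_rng(0)
prob = rng.dirichlet(np.ones(3), size=100)
labels = rng.integers(0, 3, size=100)
print(binned_calibration_error(prob, labels, n_bins=15, norm=1))

Swapping `norm` between 1, 2, and 'inf' in this sketch reproduces the ECE-, RMSCE-, and MCE-style aggregations listed below.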
Additionally, we provide a simple interface for computing commonly used metrics:
- Naeini et al. (2015) - Obtaining Well Calibrated Probabilities Using Bayesian Binning
#wrapper function
expected_calibration_error = ECE(prob, labels)
#general function
expected_calibration_error = GCE(prob, labels, bin = 15, class_conditional = False, adaptive_bins = False,
top_k_classes = 1, norm = 1, thresholding = 0.00)
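With these settings (top-1 confidences, equal-width bins, $p = 1$), the general definition above reduces to the familiar expected calibration error:

$$\mathrm{ECE} = \sum_{b=1}^{B} \frac{n_b}{N} \, \bigl| \mathrm{acc}(b) - \mathrm{conf}(b) \bigr|$$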
- Hendrycks et al. (2019) - Deep Anomaly Detection with Outlier Exposure
#wrapper function
root_mean_square_calibration_error = RMSCE(prob, labels)
#general function
root_mean_square_calibration_error = GCE(prob, labels, bin = 15, class_conditional = False, adaptive_bins = False,
top_k_classes = 1, norm = 2, thresholding = 0.00)
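Using the $L^2$ norm instead penalizes large accuracy/confidence gaps more heavily, giving the root-mean-square calibration error used by Hendrycks et al.:

$$\mathrm{RMSCE} = \left( \sum_{b=1}^{B} \frac{n_b}{N} \, \bigl( \mathrm{acc}(b) - \mathrm{conf}(b) \bigr)^{2} \right)^{1/2}$$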
- Naeini et al. (2015) - Obtaining Well Calibrated Probabilities Using Bayesian Binning
#wrapper function
maximum_calibration_error = MCE(prob, labels)
#general function
maximum_calibration_error = GCE(prob, labels, bin = 15, class_conditional = False, adaptive_bins = False,
top_k_classes = 1, norm = 'inf', thresholding = 0.00)
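With the $\infty$ norm only the single worst bin matters, which is the maximum calibration error:

$$\mathrm{MCE} = \max_{b} \, \bigl| \mathrm{acc}(b) - \mathrm{conf}(b) \bigr|$$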
- Nixon et al. (2020) - Measuring Calibration in Deep Learning
#wrapper function
adaptive_calibration_error = ACE(prob, labels)
#general function
adaptive_calibration_error = GCE(prob, labels, bin = 15, class_conditional = True, adaptive_bins = True,
top_k_classes = 'all', norm = 1, thresholding = 0.00)
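Adaptive bins hold an equal number of predictions each, so the bin weights drop out and the metric becomes a plain average of the gaps over all $K$ classes and $R$ adaptively chosen ranges, as defined in Nixon et al.:

$$\mathrm{ACE} = \frac{1}{KR} \sum_{k=1}^{K} \sum_{r=1}^{R} \bigl| \mathrm{acc}(r,k) - \mathrm{conf}(r,k) \bigr|$$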
- Nixon et al. (2020) - Measuring Calibration in Deep Learning
#wrapper function
static_calibration_error = SCE(prob, labels)
#general function
static_calibration_error = GCE(prob, labels, bin = 15, class_conditional = True, adaptive_bins = False,
top_k_classes = 'all', norm = 1, thresholding = 0.00)
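Static (equal-width) bins with class conditioning over all classes give the static calibration error of Nixon et al.:

$$\mathrm{SCE} = \frac{1}{K} \sum_{k=1}^{K} \sum_{b=1}^{B} \frac{n_{b,k}}{N} \, \bigl| \mathrm{acc}(b,k) - \mathrm{conf}(b,k) \bigr|$$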
- Gupta et al. (2021) - Calibration of Neural Networks using Splines
#wrapper function
top_r_calibration_error = ToprCE(prob, labels)
#general function
top_r_calibration_error = GCE(prob, labels, bin = 15, class_conditional = True, adaptive_bins = False,
top_k_classes = r, norm = 1, thresholding = 0.00)
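Here `r` stands for the number of highest-ranked classes kept per prediction (the `top_k_classes` argument is left as a placeholder above). One way to read these parameters is a class-conditional error restricted to those $r$ ranks, for example:

$$\mathrm{ToprCE} = \frac{1}{r} \sum_{j=1}^{r} \sum_{b=1}^{B} \frac{n_{b,j}}{N} \, \bigl| \mathrm{acc}(b,j) - \mathrm{conf}(b,j) \bigr|$$

where $j$ indexes the $j$-th most confident class of each prediction rather than a fixed class label.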
- Class-wise calibration error (CWCE), Kumar et al. (2019) - Verified Uncertainty Calibration
- Top-label calibration error (TCE), Kumar et al. (2019) - Verified Uncertainty Calibration
- Kolmogorov-Smirnov calibration error (KSCE), Gupta et al. (2021) - Calibration of Neural Networks using Splines
- Maximum mean calibration error (MMCE), Kumar et al. (2018) - Trainable Calibration Measures for Neural Networks from Kernel Mean Embeddings
- Kernel calibration error (KCE), Kull et al. (2020) - Calibration tests in multi-class classification: A unifying framework
- Over/Under confidence decomposition, Pearce et al. (2022) - Adaptive Confidence Calibration
- Reliability Diagram
- Confidence Histograms, Guo et al. (2017) - On Calibration of Modern Neural Networks
- PyTorch recalibration library