A mixed-precision algorithm for k-means clustering.

By performing simulations across data with various settings, we showcase that using decreased precision for the distance computations can still deliver comparable clustering results.
The dependencies for running our code and loading the data are:
- classixclustering (For preprocessed UCI data loading)
- NumPy (The fundamental package for scientific computing)
- Pandas (For data format and storage)
- scikit-learn (Machine Learning in Python)
- opencv-python (For image segmentation)
- pychop (For low precision arithmetic simulation)
Details on the underlying algorithms can be found in the technical report referenced at the end of this README.

One can install them before running our code via:

```bash
pip install classixclustering torch tqdm scikit-learn opencv-python
```
We also require pychop, version 0.3.0; to install it, use:

```bash
pip install pychop==0.3.0
```
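As an optional post-install sanity check, the following minimal sketch (assuming only the standard import names of the listed packages and the quarter-precision constructor used later in this README) verifies that the environment is set up:

```python
# Optional post-install sanity check; import names for the listed packages are
# assumed to be the standard ones (cv2 for opencv-python, classix for classixclustering).
import numpy as np
import pandas as pd
import sklearn
import cv2                    # opencv-python
import classix                # classixclustering
from pychop import chop       # low precision arithmetic simulation

LOW_PREC = chop(prec='q52')   # quarter-precision simulator, constructed as in the example below
print("All dependencies imported; pychop simulator ready:", LOW_PREC is not None)
```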
The repository contains the following folders:

- `data`: data used for the simulations
- `results`: experimental results (figures and tables)
- `src`: simulation code of mixed-precision k-means and distance computing
This repository contains the following algorithms for k-means computing (a sketch illustrating the two distance formulations follows the list):
- StandardKMeans1 - the standard k-means algorithm using distance (4.3)
- StandardKMeans2 - the standard k-means algorithm using distance (4.4)
- mpKMeans - the mixed-precision k-means algorithm given by Algorithm 6.3
- allowKMeans1 - k-means performed fully in low precision, computing distance (4.3)
- allowKMeans2 - k-means performed fully in low precision, computing distance (4.4)
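Here (4.3), (4.4), and Algorithm 6.3 refer to the numbering in the technical report. As an illustration only (assuming, subject to checking the report, that (4.3) is the direct squared Euclidean distance and (4.4) its expanded form), the sketch below shows why the two formulations agree in exact arithmetic but can behave differently in low precision:

```python
# Illustration only: assumes (4.3) is the direct form ||x - c||^2 and (4.4) the
# expanded form ||x||^2 - 2 x.c + ||c||^2; consult the report for the exact definitions.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(2)              # a data point
c = x + 1e-4 * rng.standard_normal(2)   # a nearby centre

def dist_direct(x, c):
    # Direct squared Euclidean distance (assumed to match (4.3)).
    return np.sum((x - c) ** 2)

def dist_expanded(x, c):
    # Expanded form (assumed to match (4.4)); equal in exact arithmetic,
    # but prone to cancellation when x and c are close.
    return np.dot(x, x) - 2 * np.dot(x, c) + np.dot(c, c)

print(dist_direct(x, c), dist_expanded(x, c))          # agree closely in double precision
x16, c16 = x.astype(np.float16), c.astype(np.float16)  # crude low-precision simulation
print(dist_direct(x16, c16), dist_expanded(x16, c16))  # may differ noticeably in float16
```

In this sketch the expanded form suffers from cancellation when the point and centre are close, which is why the choice of distance formulation interacts with the working precision.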
One can load the library via:

```python
from src.kmeans import <classname>  # e.g., from src.kmeans import mpKMeans
```
The following example showcases the usage of the mpKMeans class:

```python
from pychop import chop
from src.kmeans import mpKMeans
from sklearn.datasets import make_blobs
from sklearn.metrics.cluster import adjusted_rand_score     # For clustering quality evaluation

X, y = make_blobs(n_samples=2000, n_features=2, centers=5)  # Generate data with 5 clusters
LOW_PREC = chop(prec='q52')                                  # Define quarter precision
mpkmeans = mpKMeans(n_clusters=5, seeding='d2', low_prec=LOW_PREC, random_state=0, verbose=1)
mpkmeans.fit(X)
print(adjusted_rand_score(y, mpkmeans.labels))               # Load clustering membership via mpkmeans.labels
```
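For comparison, the same data can be clustered with the working-precision classes listed above. The following is a minimal sketch that assumes StandardKMeans1 accepts the same constructor arguments (apart from low_prec) and exposes the same fit/labels interface as mpKMeans; the actual API may differ.

```python
# Comparison sketch; StandardKMeans1's constructor arguments and fit/labels
# interface are assumed to mirror mpKMeans -- adjust if the API differs.
from pychop import chop
from src.kmeans import StandardKMeans1, mpKMeans
from sklearn.datasets import make_blobs
from sklearn.metrics.cluster import adjusted_rand_score

X, y = make_blobs(n_samples=2000, n_features=2, centers=5, random_state=0)

standard = StandardKMeans1(n_clusters=5, seeding='d2', random_state=0)
standard.fit(X)

mixed = mpKMeans(n_clusters=5, seeding='d2', low_prec=chop(prec='q52'), random_state=0)
mixed.fit(X)

# Compare clustering quality of the working-precision and mixed-precision runs.
print("standard k-means ARI:", adjusted_rand_score(y, standard.labels))
print("mixed-precision ARI: ", adjusted_rand_score(y, mixed.labels))
```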
Note that for half and single precision simulation, users can directly use the built-in chop class in our software via:

```python
from src.kmeans import chop
import numpy as np

LOW_PREC = chop(np.float16)
```
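This LOW_PREC object can then, presumably, be passed to mpKMeans in the same way as the pychop simulator above; a minimal sketch under that assumption:

```python
# Minimal sketch: assumes low_prec accepts the built-in chop object
# in the same way as the pychop simulator used earlier.
import numpy as np
from src.kmeans import chop, mpKMeans
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=2000, n_features=2, centers=5, random_state=0)
mpkmeans = mpKMeans(n_clusters=5, seeding='d2', low_prec=chop(np.float16), random_state=0)
mpkmeans.fit(X)
print(mpkmeans.labels[:10])   # first few cluster assignments (assuming labels is array-like)
```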
All empirical results in the paper can be reproduced via the following command on Linux:

```bash
python3 run_all.py
```

After the run completes, one can find the results in the `results` folder.
E. Carson, X. Chen, and X. Liu. Computing k-means in mixed precision. ArXiv:2407.12208 [math.NA], July 2024.
The BibTeX entry:

```bibtex
@techreport{ccl24,
  author = "Erin Carson and Xinye Chen and Xiaobo Liu",
  title  = "Computing $k$-means in Mixed Precision",
  month  = jul,
  year   = 2024,
  type   = "{ArXiv}:2407.12208 [math.{NA}]",
  url    = "https://arxiv.org/abs/2407.12208"
}
```