
Experimental code for simulating mixed-precision k-means


mpkmeans

A mixed-precision $k$-means algorithm is designed to further the understanding of low precision arithmetic in Euclidean distance computations and to analyze the issues that arise when low precision arithmetic is applied to unnormalized data.

By performing simulations on data under various settings, we show that reduced precision in $k$-means computations results in only a minor increase in the sum of squared errors, without necessarily degrading the clustering results. We also demonstrate the robustness of the mixed-precision $k$-means algorithm across a range of precisions. Fully reproducible experimental code is included in this repository, illustrating the potential application of mixed-precision $k$-means to data science tasks including data clustering and image segmentation.

Running our code and loading the data requires the following dependencies:

  • classixclustering (For preprocessed UCI data loading)
  • NumPy (The fundamental package for scientific computing)
  • Pandas (For data format and storage)
  • scikit-learn (Machine Learning in Python)
  • opencv-python (For image segmentation)
  • pychop (For low precision arithmetic simulation)

Details on the underlying algorithms can be found in the technical report listed in the References section below.

These dependencies can be installed before running our code via:

pip install classixclustering torch tqdm scikit-learn opencv-python

We also require pychop version 0.3.0, which can be installed via:

pip install pychop==0.3.0

The repository contains the following folders:

  • data: data used for the simulations
  • results: experimental results (figures and tables)
  • src: simulation code of mixed-precision k-means and distance computing

This repository contains the following algorithms for k-means computing:

  • StandardKMeans1 - the native k-means algorithm using distance (4.3)
  • StandardKMeans2 - the native k-means algorithm using distance (4.4)
  • mpKMeans - the mixed-precision k-means algorithm, following Algorithm 6.3
  • allowKMeans1 - k-means performed fully in low precision, using distance (4.3)
  • allowKMeans2 - k-means performed fully in low precision, using distance (4.4)
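
The numbered distance formulas (4.3) and (4.4) refer to the technical report and are not reproduced here; assuming they correspond to the two standard formulations of the squared Euclidean distance (the direct differenced form and the expanded inner-product form), the distinction can be sketched as follows. The two agree in exact arithmetic, but the expanded form is more prone to cancellation in low precision:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10)  # a data point
c = rng.standard_normal(10)  # a cluster centre

# Direct form: sum of squared componentwise differences.
d_direct = np.sum((x - c) ** 2)

# Expanded form: ||x||^2 - 2 x.c + ||c||^2; cheaper when centre norms
# are precomputed, but subject to cancellation when x and c are close.
d_expanded = np.dot(x, x) - 2 * np.dot(x, c) + np.dot(c, c)

print(np.isclose(d_direct, d_expanded))  # the two agree in double precision
```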

One can load the library via

from src.kmeans import <classname> # e.g., from src.kmeans import mpKMeans

The following example showcases the usage of the mpKMeans class:

from pychop import chop
from src.kmeans import mpKMeans
from sklearn.datasets import make_blobs
from sklearn.metrics.cluster import adjusted_rand_score # For clustering quality evaluation

X, y = make_blobs(n_samples=2000, n_features=2, centers=5) # Generate data with 5 clusters

LOW_PREC = chop(prec='q52') # Define quarter precision
mpkmeans = mpKMeans(n_clusters=5, seeding='d2', low_prec=LOW_PREC, random_state=0, verbose=1)
mpkmeans.fit(X)

print(adjusted_rand_score(y, mpkmeans.labels)) # Clustering membership is available via mpkmeans.labels
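
For context, the same kind of evaluation can be run with scikit-learn's double-precision KMeans to obtain a baseline ARI score (a self-contained sketch, independent of this repository):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics.cluster import adjusted_rand_score

# Generate data with 5 clusters, fixing the seed for reproducibility.
X, y = make_blobs(n_samples=2000, n_features=2, centers=5, random_state=0)

# Double-precision k-means as a reference point for the mixed-precision runs.
baseline = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
print(adjusted_rand_score(y, baseline.labels_))
```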

Note that for half and single precision simulation, users can directly use the built-in chop class in our software via:

from src.kmeans import chop
import numpy as np

LOW_PREC = chop(np.float16)
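
If pychop is not installed, the effect of such a chop-to-half operator can be mimicked with NumPy alone (a minimal sketch of the rounding behaviour; the name chop_to_half is hypothetical, and mpKMeans itself expects a chop-style object as above):

```python
import numpy as np

def chop_to_half(a):
    """Round an array to IEEE half precision, returned in double precision."""
    return np.float16(a).astype(np.float64)

x = np.array([1.0, 1.0 + 2**-12, 1e5])
# 1.0 + 2**-12 is below half the fp16 spacing at 1.0, so it rounds to 1.0;
# 1e5 exceeds the fp16 range (max ~65504) and overflows to inf.
print(chop_to_half(x))
```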

All empirical results in the paper can be reproduced via the command:

python3 run_all.py

After the run completes, the results can be found in the folder results.

References

E. Carson, X. Chen, and X. Liu. Computing k-means in mixed precision. arXiv:2407.12208 [math.NA], July 2024.

The BibTeX entry:

@techreport{ccl24,
  author = "Erin Carson and Xinye Chen and Xiaobo Liu",
  title = "Computing $k$-means in Mixed Precision",
  month = jul,
  year = 2024,
  type = "{ArXiv}:2407.12208 [math.{NA}]",
  url = "https://arxiv.org/abs/2407.12208"
}
