This repository contains the development code for the distributed IMM algorithm. The final implementation is in d_imm_scala.
Scalable Iterative Mistake Minimization (IMM) for Clustering Explanations
Distributed IMM is a scalable PySpark implementation of the IMM algorithm for clustering explanations. For efficiency, it uses Cython-optimized histogram-based splitting, and it initializes the clustering with K-Means.
- Distributed IMM computation for large datasets
- Optimized histogram-based splitting
- Optimized mistake calculation with histograms
- K-Means initialization for clustering
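
To illustrate the idea behind histogram-based splitting and mistake calculation, here is a minimal single-machine NumPy sketch. It is not the code in this repository: the function names `mistakes_per_threshold` and `best_split`, the `n_bins` parameter, and the use of evenly spaced candidate thresholds are assumptions made for illustration. It relies on the usual IMM notion of a mistake, where a point is a mistake at a split if it lands on a different side of the threshold than its assigned cluster center.

```python
import numpy as np


def mistakes_per_threshold(x, c, thresholds):
    """Count mistakes for every candidate threshold along one feature.

    x          : (n,) feature values of the points
    c          : (n,) the same feature of each point's assigned center
    thresholds : (m,) candidate thresholds, sorted ascending

    For the split `value <= t`, a point is a mistake exactly when
    min(x, c) <= t < max(x, c), so each point contributes +1 to a
    contiguous range of thresholds. The ranges are accumulated with a
    difference array and a cumulative sum instead of a loop per threshold.
    """
    lo = np.minimum(x, c)
    hi = np.maximum(x, c)
    start = np.searchsorted(thresholds, lo, side="left")  # first t >= lo
    end = np.searchsorted(thresholds, hi, side="left")    # first t >= hi
    diff = np.zeros(len(thresholds) + 1, dtype=np.int64)
    np.add.at(diff, start, 1)
    np.add.at(diff, end, -1)
    return np.cumsum(diff)[:-1]


def best_split(X, centers, labels, n_bins=32):
    """Return (feature, threshold, mistakes) with the fewest mistakes.

    Hypothetical helper: candidate thresholds are `n_bins` evenly spaced
    values in [min center, max center) per feature, so every candidate
    keeps at least one center on each side of the split.
    """
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        c_lo, c_hi = centers[:, j].min(), centers[:, j].max()
        if c_lo == c_hi:
            continue  # this feature cannot separate any two centers
        thresholds = np.linspace(c_lo, c_hi, n_bins, endpoint=False)
        counts = mistakes_per_threshold(X[:, j], centers[labels, j], thresholds)
        k = int(np.argmin(counts))
        if counts[k] < best[2]:
            best = (j, float(thresholds[k]), int(counts[k]))
    return best
```

In a distributed setting, per-partition count arrays of this kind could be summed across workers before taking the minimum, which is one reason fixed candidate thresholds are convenient; whether this repository does exactly that is not stated here.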
MIT License