Flash-GMM is an IBM Research project that makes a classic machine learning technique, the Gaussian Mixture Model (GMM), work at a scale that was previously impossible on a single GPU. A GMM groups data points into clusters, with each cluster represented by a bell curve (a Gaussian distribution) rather than a hard boundary. The "soft clustering" in the description means each data point gets a probability of belonging to each cluster rather than being forced into just one.

The problem Flash-GMM solves is memory. The standard way of fitting a GMM stores a large matrix that holds one number for every combination of data point and cluster. For a million data points and a thousand clusters, the project puts the cost of that matrix at roughly 21 gigabytes of GPU memory, and the cost grows in proportion to the number of data points. For a billion data points, it simply does not fit.

Flash-GMM avoids building that matrix entirely. It borrows the approach of FlashAttention, a well-known technique for making large language models more memory efficient, and processes the data in small tiles. For each tile, it computes the cluster assignments on-chip and immediately adds the results into a small set of running totals. The large intermediate matrix is never written to GPU memory at all. The result is a memory footprint of about 4.5 megabytes for the same configuration that would normally require 21 gigabytes, a reduction of more than 4,000 times.

On speed, the project reports that Flash-GMM runs 766 to 1,740 times faster than a standard CPU implementation (SciPy) and 19 to 32 times faster than a GPU-based competitor (TorchGMM). TorchGMM also runs out of memory once the dataset exceeds about one million points, while Flash-GMM was tested at one billion points on a single GPU.

The project is written in Python and uses a library called Triton to write the GPU kernel directly. Installation requires only PyTorch and Triton.
It is published under the Apache 2.0 license and comes with a preprint citation for the associated research paper from IBM Research.
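
The tiled accumulation described above can be illustrated with a small CPU sketch. This is not the project's Triton kernel and does not use its API: it is a plain-Python, one-dimensional toy that assumes scalar data and per-cluster variances, and the names `log_gauss` and `tiled_em_step` are hypothetical. It shows only the core idea: a tile-sized block of cluster probabilities is computed, folded into a few per-cluster running totals, and then discarded, so the full points-by-clusters matrix never exists.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def tiled_em_step(data, weights, means, variances, tile_size=4):
    """One EM iteration that streams over `data` in tiles (illustrative only).

    Memory use is O(tile_size * K) for the current block plus O(K) running
    totals, instead of O(N * K) for a full responsibility matrix.
    """
    k = len(weights)
    resp_sum = [0.0] * k  # sum of responsibilities per cluster
    x_sum = [0.0] * k     # responsibility-weighted sum of x
    xx_sum = [0.0] * k    # responsibility-weighted sum of x squared
    log_likelihood = 0.0

    for start in range(0, len(data), tile_size):
        tile = data[start:start + tile_size]

        # Build the small (tile_size x K) block of soft assignments.
        block = []
        for x in tile:
            logs = [math.log(weights[j]) + log_gauss(x, means[j], variances[j])
                    for j in range(k)]
            # Log-sum-exp gives a numerically stable normalizer.
            top = max(logs)
            log_norm = top + math.log(sum(math.exp(v - top) for v in logs))
            log_likelihood += log_norm
            block.append([math.exp(v - log_norm) for v in logs])

        # Fold the block into the running totals; it is then dropped.
        for x, row in zip(tile, block):
            for j, r in enumerate(row):
                resp_sum[j] += r
                x_sum[j] += r * x
                xx_sum[j] += r * x * x

    # M-step from the accumulated totals alone.
    n = len(data)
    new_weights = [resp_sum[j] / n for j in range(k)]
    new_means = [x_sum[j] / resp_sum[j] for j in range(k)]
    # Small floor keeps variances positive (an assumption of this sketch).
    new_vars = [max(xx_sum[j] / resp_sum[j] - new_means[j] ** 2, 1e-6)
                for j in range(k)]
    return new_weights, new_means, new_vars, log_likelihood
```

Calling `tiled_em_step` repeatedly, feeding each call's weights, means, and variances into the next, fits the mixture. Because only sums are carried between tiles, the tile size changes the peak memory but not the fitted parameters.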
Verify against the repo before relying on details.