Running K-means
KMeans is a clustering algorithm. Its purpose is to partition a set of vectors into K
groups that cluster around common mean vector. This can also be thought as approximating the input each of the input vector with one of the means, so the clustering process finds, in principle, the best dictionary or codebook to vector quantize the data.
Consider a dataset containing 1000 randomly sampled 2D points:
numData = 5000 ;dimension = 2 ;data = rand(dimension,numData) ;
Given a new data point x
, this can be mapped to one of the clusters by looking for the closest center:
x = rand(dimension, 1) ;[~, k] = min(vl_alldist(x, centers)) ;
, expressed as a fraction of the number of clusters. The figure reports the duration, final energy value, and speedup factor, using both the serial and parallel versions of the code. The figure was generated using .