Hexaly vs Gurobi on the K-Means Clustering Problem (MSSC)

The K-Means Clustering Problem is defined as follows: given a set of points in a space of arbitrary dimension, find a partition of these points into k clusters minimizing the total sum of squared Euclidean distances between each point and the centroid of its cluster. This problem is also known as the Minimum Sum-of-Squares Clustering (MSSC).
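
Formally, writing C_1, …, C_k for the clusters and μ_c for the centroid of cluster C_c, the objective reads:

\min_{C_1, \dots, C_k} \; \sum_{c=1}^{k} \sum_{x_i \in C_c} \lVert x_i - \mu_c \rVert^2, \qquad \mu_c = \frac{1}{\lvert C_c \rvert} \sum_{x_i \in C_c} x_i.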

On the K-Means Clustering Problem (MSSC), Hexaly reaches gaps below 1% to the best solutions in the literature within 1 minute of running time for almost all instances considered in the benchmark. This page illustrates how Hexaly outperforms traditional general-purpose optimization solvers, such as Gurobi, on this challenging problem.

Input data

The instances of this benchmark come from two sources: the UCI Machine Learning Repository, which gathers a wide variety of machine-learning datasets with an arbitrary number of dimensions, and the TSPLIB, the reference instance repository for the Traveling Salesman Problem (TSP), which contains only 2-dimensional points. From these two repositories, we extracted the instances used in research articles describing state-of-the-art dedicated algorithms for the K-Means Clustering Problem [1][2][3][4][5].

Mathematical models for the K-Means Clustering Problem (MSSC)

Mixed-Integer Quadratically Constrained Programming (MIQCP) model

The results reported for Gurobi are obtained with the Mixed-Integer Quadratically Constrained Programming (MIQCP) formulation presented here. Continuous variables represent the center of each cluster, one per dimension, and quadratic constraints compute the squared Euclidean distances from each point to these centers. In addition, binary variables model the assignment of points to clusters. Note that we had to scale all coordinates to the range [0,1] for Gurobi due to numerical issues on instances with large coordinates.
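
For concreteness, here is a minimal gurobipy sketch of one common big-M variant of such a formulation. It is an illustrative sketch rather than the exact model used in the benchmark: the inputs points (a list of coordinate lists, assumed already scaled to [0,1]) and k are placeholders, and the big-M constant relies on the fact that the squared Euclidean distance between two points of the unit hypercube is at most the number of dimensions.

import gurobipy as gp
from gurobipy import GRB

def build_mssc_miqcp(points, k):
    n, dim = len(points), len(points[0])
    M = dim  # valid upper bound on any squared distance in [0,1]^dim

    model = gp.Model("mssc")
    # y[i, c] == 1 iff point i is assigned to cluster c
    y = model.addVars(n, k, vtype=GRB.BINARY, name="assign")
    # mu[c, d]: coordinate d of the center of cluster c
    mu = model.addVars(k, dim, lb=0.0, ub=1.0, name="center")
    # z[i]: squared distance from point i to its cluster center
    z = model.addVars(n, lb=0.0, name="sqdist")

    for i in range(n):
        # Each point belongs to exactly one cluster
        model.addConstr(y.sum(i, "*") == 1)
        for c in range(k):
            # Quadratic big-M constraint, binding only when y[i, c] == 1
            model.addConstr(
                z[i] >= gp.quicksum(
                    (points[i][d] - mu[c, d]) * (points[i][d] - mu[c, d])
                    for d in range(dim))
                - M * (1 - y[i, c]))

    # Minimize the total squared distance to the cluster centers
    model.setObjective(z.sum(), GRB.MINIMIZE)
    return model

Since the objective only pushes each z[i] downward, z[i] equals the squared distance from point i to the center of its assigned cluster in any optimal solution.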

Hexaly model

The Hexaly model relies on set variables representing the assignment of points to clusters. The centroid of each cluster is computed using a lambda expression, and the sum of squared distances between a cluster's points and its centroid is computed with the same approach. Compared to the MIQCP model, the Hexaly model is much more compact: the only decisions to be made are the assignments of points to clusters. Set variables combined with lambda expressions enable simple and intuitive problem modeling without limiting possible extensions.

function model() {
  // Set variables: clusters[c] represents the points in cluster c
  clusters[1..k] <- set(nbPoints);

  // Each point must belong to one and only one cluster
  constraint partition[c in 1..k](clusters[c]);

  for [c in 1..k] {
      local cluster <- clusters[c];
      local size <- count(cluster);

      // Compute the centroid of the cluster
      centroid[d in 0..nbDimensions-1] <- size == 0 ? 0 : sum(cluster, o => coordinates[o][d]) / size;

      // Compute the variance of the cluster
      squares[d in 0..nbDimensions-1] <- sum(cluster, o => pow(coordinates[o][d] - centroid[d], 2));
      variances[c] <- sum[d in 0..nbDimensions-1](squares[d]);
  }

  // Minimize the total variance
  obj <- sum[c in 1..k](variances[c]);
  minimize obj;
}

Hexaly and Gurobi results on the K-Means Clustering Problem (MSSC)

We compare both solvers’ performance after 1 minute of running time. At the end of the run, we measure the gap in % to the best solutions known in the literature, also called State-Of-The-Art (SOTA) solutions. Each instance is solved with different values of K, the number of clusters; in the “Clusters” column, we report the smallest and largest values of K considered. For each instance, the gap reported in the table is the arithmetic mean, over all values of K, of the gap in % between the solver’s solution and the best known solution. If no feasible solution is found within the time limit, or if the gap exceeds 100%, the gap is counted as 100%.
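
As an illustration, the figure reported for one instance could be computed as in the small Python sketch below; solver_objs and best_known_objs are hypothetical dictionaries mapping each value of K to the corresponding objective values, with None marking a run where no feasible solution was found.

def average_gap(solver_objs, best_known_objs):
    # Mean gap in % to the best known solutions over all values of K
    gaps = []
    for k, best in best_known_objs.items():
        obj = solver_objs.get(k)  # None if no feasible solution was found
        if obj is None:
            gaps.append(100.0)  # no feasible solution counts as 100%
        else:
            gaps.append(min(100.0, 100.0 * (obj - best) / best))  # capped at 100%
    return sum(gaps) / len(gaps)

# Example: 25% gap for K = 2, no feasible solution for K = 3
print(average_gap({2: 1.25, 3: None}, {2: 1.0, 3: 0.8}))  # 62.5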

We use Hexaly 13.0 and Gurobi 11.0, a state-of-the-art Mixed-Integer Quadratically Constrained Programming (MIQCP) solver. Both are used with default parameters. We ran them on a server equipped with an AMD Ryzen 7 7700 processor (8 cores, 3.8 GHz, 8 MB cache) and 32 GB RAM.

Instance  Points  Dimensions  Clusters  Hexaly  Gurobi
german        59           2      2-10     0.0     1.9
ruspini       75           2      2-10     0.0     6.3
spath         89           3      2-10     0.0     5.0
hatco        100           7      2-10     0.0     5.1
iris         150           4      2-10     0.0     9.1
gr202        202           2      2-10     0.0    12.8
glass        214           9     15-50     1.7   100.0
body         507           5     30-80     1.9   100.0
gr666        666           2      2-10     0.0    28.5
breast       683           9      2-25     0.4    61.2
vowel        871           3    40-100     3.8   100.0
concrete   1,030           8    60-120     3.3   100.0
u1060      1,060           2      2-25     0.1    68.6
segment    2,310          19   230-500    10.8   100.0
pr2392     2,392           2      2-10     0.0    74.7
pcb3038    3,038           2      2-25     0.1    83.4
pendigit  10,992          16      2-25     0.6    87.1
d15112    15,112           2      2-25     0.2    89.3
letter    20,000          16      2-25     0.2    87.3
kegg      53,413          20      2-25    32.9    84.8
shuttle   58,000           9      2-25     2.6    92.3
pla85900  85,900           2      2-25     0.3    97.7

Average gaps in % to the SOTA solutions (lower is better).

Hexaly obtains solutions with a gap to the SOTA lower than 1% within 1 minute of running time for almost all instances. Gurobi, on the other hand, only finds decent solutions for instances with fewer than 200 points. Beyond this size, Gurobi struggles to find feasible solutions, especially when the number of clusters exceeds 5.

Conclusion

Hexaly offers an innovative modeling approach based on set variables, making the modeling of the K-Means Clustering Problem (MSSC) much more compact than the Mixed-Integer Quadratically Constrained Programming (MIQCP) formulation handled by traditional general-purpose optimization solvers like Gurobi. Extra constraints can be added easily, offering flexibility to extend the model to suit real-world clustering needs. Beyond simplicity, Hexaly delivers state-of-the-art solutions in minutes.

You can find the complete model of the K-Means Clustering Problem (MSSC) and many other clustering problems in Python, Java, C#, and C++ in our Example Tour.

Discover the ease of use and performance of Hexaly through a free 1-month trial.


[1] D. Aloise, P. Hansen (2009). A branch-and-cut SDP-based algorithm for minimum sum-of-squares clustering. Pesquisa Operacional 29(3), pp. 503–516.

[2] D. Aloise, P. Hansen, L. Liberti (2012). An improved column generation algorithm for minimum sum-of-squares clustering. Mathematical Programming 131, pp. 195–220.

[3] A.M. Bagirov, B. Ordin, G. Ozturk, A.E. Xavier (2015). An incremental clustering algorithm based on hyperbolic smoothing. Computational Optimization and Applications 61, pp. 219–241.

[4] T. Pereira, D. Aloise, J. Brimberg, N. Mladenović (2018). Review of basic local searches for solving the minimum sum-of-squares clustering problem. In P. Pardalos, A. Migdalas (eds), Open Problems in Optimization and Data Analysis, Springer Optimization and Its Applications, vol. 141. Springer, Cham.

[5] P. Kalczynski, J. Brimberg, Z. Drezner (2020). The importance of good starting solutions in the minimum sum of squares clustering problem. arXiv preprint, submitted 6 Apr 2020.