Emergent Mind

Normalised clustering accuracy: An asymmetric external cluster validity measure

(2209.02935)
Published Sep 7, 2022 in cs.LG and stat.ML

Abstract

There is no, nor will there ever be, single best clustering algorithm, but we would still like to be able to distinguish between methods which work well on certain task types and those that systematically underperform. Clustering algorithms are traditionally evaluated using either internal or external validity measures. Internal measures quantify different aspects of the obtained partitions, e.g., the average degree of cluster compactness or point separability. Yet, their validity is questionable, because the clusterings they promote can sometimes be meaningless. External measures, on the other hand, compare the algorithms' outputs to the fixed ground truth groupings that are provided by experts. In this paper, we argue that the commonly-used classical partition similarity scores, such as the normalised mutual information, Fowlkes--Mallows, or adjusted Rand index, miss some desirable properties. In particular, they do not identify worst-case scenarios correctly nor are they easily interpretable. As a consequence, it can be difficult to evaluate clustering algorithms on diverse benchmark datasets. To remedy these issues, we propose and analyse a new measure: a version of the optimal set-matching accuracy, which is normalised, monotonic with respect to some similarity relation, scale invariant, and corrected for the imbalancedness of cluster sizes (but neither symmetric nor adjusted for chance).

We're not able to analyze this paper right now due to high demand.

Please check back later (sorry!).

Generate a summary of this paper on our Pro plan:

We ran into a problem analyzing this paper.

Newsletter

Get summaries of trending comp sci papers delivered straight to your inbox:

Unsubscribe anytime.