Sparse-Dense Subspace Clustering

Published 20 Oct 2019 in cs.LG and stat.ML | (1910.08909v1)

Abstract: Subspace clustering refers to the problem of clustering high-dimensional data into a union of low-dimensional subspaces. Current subspace clustering approaches are usually based on a two-stage framework. In the first stage, an affinity matrix is generated from data. In the second one, spectral clustering is applied on the affinity matrix. However, the affinity matrix produced by two-stage methods cannot fully reveal the similarity between data points from the same subspace (intra-subspace similarity), resulting in inaccurate clustering. Besides, most approaches fail to solve large-scale clustering problems due to poor efficiency. In this paper, we first propose a new scalable sparse method called Iterative Maximum Correlation (IMC) to learn the affinity matrix from data. Then we develop Piecewise Correlation Estimation (PCE) to densify the intra-subspace similarity produced by IMC. Finally we extend our work into a Sparse-Dense Subspace Clustering (SDSC) framework with a dense stage to optimize the affinity matrix for two-stage methods. We show that IMC is efficient when clustering large-scale data, and PCE ensures better performance for IMC. We show the universality of our SDSC framework as well. Experiments on several data sets demonstrate the effectiveness of our approaches. Moreover, we are the first one to apply densification on affinity matrix before spectral clustering, and SDSC constitutes the first attempt to build a universal three-stage subspace clustering framework.