AugDMC: Data Augmentation Guided Deep Multiple Clustering (2306.13023v1)

Published 22 Jun 2023 in cs.CV

Abstract: Clustering aims to group similar objects together while keeping dissimilar ones apart, so that structures hidden in data can be identified and the data understood in an unsupervised manner. Traditional clustering methods such as k-means provide only a single clustering for one dataset. Deep clustering methods such as auto-encoder based clustering methods show better performance, but still provide a single clustering. However, a given dataset may have multiple clustering structures, each representing a unique perspective of the data. Therefore, multiple clustering methods have been developed to discover multiple independent structures hidden in data. Although deep multiple clustering methods provide better performance, how to efficiently capture the alternative perspectives in data remains an open problem. In this paper, we propose AugDMC, a novel data Augmentation guided Deep Multiple Clustering method, to tackle this challenge. Specifically, AugDMC leverages data augmentations to automatically extract features related to a certain aspect of the data using self-supervised prototype-based representation learning, where different aspects of the data are preserved under different data augmentations. Moreover, a stable optimization strategy is proposed to alleviate the instability caused by different augmentations. Thereafter, multiple clusterings based on different aspects of the data can be obtained. Experimental results on three real-world datasets, compared with state-of-the-art methods, validate the effectiveness of the proposed method.
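The core idea described in the abstract — that each augmentation family induces a representation emphasizing one aspect of the data, and clustering each representation yields one clustering per aspect — can be sketched as follows. This is a minimal illustrative toy, not the paper's method: the prototype-based representation learning is replaced by hypothetical per-aspect feature extractors, and the aspect names ("color", "shape") are invented for the example.

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Plain Lloyd's k-means with greedy farthest-point initialisation,
    which keeps this toy example deterministic."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min(((X[:, None] - np.array(centers)[None]) ** 2).sum(-1), axis=1)
        centers.append(X[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return labels

def multiple_clusterings(X, aspect_feature_fns, k):
    # One clustering per aspect: each feature extractor stands in for the
    # representation that would be learned under one augmentation family.
    return {name: kmeans(fn(X), k) for name, fn in aspect_feature_fns.items()}

# Toy data with two independent structures: dimension 0 separates a "color"
# grouping, dimension 1 a "shape" grouping (names are illustrative only).
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal([c, s], 0.1, size=(20, 2))
                    for c in (0, 5) for s in (0, 5)])
aspects = {"color": lambda X: X[:, :1], "shape": lambda X: X[:, 1:]}
results = multiple_clusterings(X, aspects, k=2)
```

Each entry of `results` is an independent partition of the same 80 points: the "color" clustering splits the first 40 points from the last 40, while the "shape" clustering cuts across that split, which is exactly the kind of alternative-perspective structure multiple clustering aims to recover.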
