
Generative Subspace Adversarial Active Learning for Outlier Detection in Multiple Views of High-dimensional Data (2404.14451v1)

Published 20 Apr 2024 in cs.LG and cs.AI

Abstract: Outlier detection in high-dimensional tabular data is an important task in data mining, essential for many downstream tasks and applications. Existing unsupervised outlier detection algorithms face one or more problems, including the inlier assumption (IA), the curse of dimensionality (CD), and multiple views (MV). To address these issues, we introduce Generative Subspace Adversarial Active Learning (GSAAL), a novel approach that uses a Generative Adversarial Network with multiple adversaries. These adversaries learn the marginal class probability functions over different data subspaces, while a single generator in the full space models the entire distribution of the inlier class. GSAAL is specifically designed to address the MV limitation while also handling IA and CD, making it the only method to address all three. We provide a comprehensive mathematical formulation of MV, convergence guarantees for the discriminators, and scalability results for GSAAL. Our extensive experiments demonstrate the effectiveness and scalability of GSAAL, highlighting its superior performance compared to other popular outlier detection (OD) methods, especially in MV scenarios.
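The core idea in the abstract, several discriminators that each judge a point only within its own data subspace, with their votes combined into an outlier score, can be illustrated independently of the GAN training loop. The sketch below is a simplified stand-in, not the paper's method: each "adversary" is a small logistic model (with squared features) trained to separate inlier data from uniform noise in a random axis-parallel subspace, and the outlier score of a point is its average "fake" probability across subspaces. The subspace count, dimensions, and training schedule are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy inlier data: 200 points in 6 dimensions, concentrated near the origin.
X = rng.normal(0.0, 1.0, size=(200, 6))

# Illustrative choice: k random 2-d axis-parallel subspaces ("views").
k, d_sub = 4, 2
subspaces = [rng.choice(6, size=d_sub, replace=False) for _ in range(k)]

def features(Z):
    """Raw coordinates plus their squares and a bias column, so the
    logistic model can learn a (quadratic) boundary around the data."""
    return np.hstack([Z, Z**2, np.ones((len(Z), 1))])

def train_discriminator(X_sub, n_steps=500, lr=0.1):
    """Logistic model separating subspace data (label 1) from uniform
    noise over the data's bounding box (label 0) -- a crude stand-in
    for a GAN adversary in that subspace."""
    noise = rng.uniform(X_sub.min(0), X_sub.max(0), size=X_sub.shape)
    A = features(np.vstack([X_sub, noise]))
    y = np.concatenate([np.ones(len(X_sub)), np.zeros(len(noise))])
    w = np.zeros(A.shape[1])
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-A @ w))
        w -= lr * A.T @ (p - y) / len(y)   # gradient step on log-loss
    return w

weights = [train_discriminator(X[:, s]) for s in subspaces]

def outlier_score(x):
    """Average 'fake' probability across the subspace discriminators:
    a point is suspicious if it looks unlike the inliers in many views."""
    scores = []
    for s, w in zip(subspaces, weights):
        a = features(x[s].reshape(1, -1))[0]
        p_inlier = 1.0 / (1.0 + np.exp(-a @ w))
        scores.append(1.0 - p_inlier)
    return float(np.mean(scores))

inlier_point = np.zeros(6)        # near the center of the data
outlier_point = np.full(6, 6.0)   # far outside it
print(outlier_score(inlier_point) < outlier_score(outlier_point))
```

A point that only deviates in a few dimensions can still score high here, because the subspaces that contain those dimensions flag it even when the full-space view does not, which is the intuition behind attacking the multiple-views problem with per-subspace adversaries.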

Authors (5)
  1. Jose Cribeiro-Ramallo
  2. Vadim Arzamasov
  3. Federico Matteucci
  4. Denis Wambold
  5. Klemens Böhm

