Retrieval Augmented Deep Anomaly Detection for Tabular Data (2401.17052v2)

Published 30 Jan 2024 in cs.LG

Abstract: Deep learning for tabular data has garnered increasing attention in recent years, yet employing deep models for structured data remains challenging. While these models excel with unstructured data, their efficacy with structured data has been limited. Recent research has introduced retrieval-augmented models to address this gap, demonstrating promising results in supervised tasks such as classification and regression. In this work, we investigate using retrieval-augmented models for anomaly detection on tabular data. We propose a reconstruction-based approach in which a transformer model learns to reconstruct masked features of normal samples. We test KNN-based and attention-based modules that select relevant samples to help reconstruct the target sample. Our experiments on a benchmark of 31 tabular datasets reveal that augmenting this reconstruction-based anomaly detection (AD) method with sample-sample dependencies via retrieval modules significantly boosts performance. The present work supports the idea that retrieval modules are useful to augment any deep AD method to enhance anomaly detection on tabular data.
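
The core loop described in the abstract (mask some features of a target row, retrieve related normal rows, reconstruct the masked features, score the row by its reconstruction error) can be illustrated with a minimal sketch. It assumes the trained transformer is replaced by a non-parametric weighted average of the retrieved normal training rows, so it shows only the retrieval step and is not the authors' model. The function names, the one-feature-at-a-time masks, `k`, and the softmax `temperature` are hypothetical choices, not values from the paper, and `attention_weights` uses a softmax over negative squared distances as a stand-in for learned attention.

```python
import math
from typing import Dict, List, Sequence

Row = List[float]


def masked_sq_distance(a: Row, b: Row, visible: Sequence[int]) -> float:
    """Squared Euclidean distance computed only on the visible (unmasked) features."""
    return sum((a[j] - b[j]) ** 2 for j in visible)


def knn_weights(target: Row, bank: List[Row], visible: Sequence[int], k: int) -> Dict[int, float]:
    """KNN retrieval: uniform weights over the k bank rows closest on visible features."""
    dists = [masked_sq_distance(target, row, visible) for row in bank]
    nearest = sorted(range(len(bank)), key=lambda i: dists[i])[:k]
    return {i: 1.0 / len(nearest) for i in nearest}


def attention_weights(target: Row, bank: List[Row], visible: Sequence[int],
                      temperature: float = 1.0) -> Dict[int, float]:
    """Attention-style retrieval (assumed stand-in): softmax over negative distances."""
    scores = [-masked_sq_distance(target, row, visible) / temperature for row in bank]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {i: e / total for i, e in enumerate(exps)}


def reconstruct(bank: List[Row], weights: Dict[int, float],
                masked: Sequence[int]) -> Dict[int, float]:
    """Predict each masked feature as the weighted average of the retrieved rows."""
    return {j: sum(w * bank[i][j] for i, w in weights.items()) for j in masked}


def anomaly_score(target: Row, bank: List[Row], masks: List[List[int]],
                  mode: str = "knn", k: int = 2, temperature: float = 1.0) -> float:
    """Mean squared error on the masked features, averaged over all masks."""
    errors = []
    for masked in masks:
        visible = [j for j in range(len(target)) if j not in masked]
        if mode == "knn":
            weights = knn_weights(target, bank, visible, k)
        elif mode == "attention":
            weights = attention_weights(target, bank, visible, temperature)
        else:
            raise ValueError(f"unknown mode: {mode}")
        rec = reconstruct(bank, weights, masked)
        errors.append(sum((target[j] - rec[j]) ** 2 for j in masked) / len(masked))
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Toy bank of normal rows: the third feature is the sum of the first two.
    normal_bank = [[0.0, 0.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0],
                   [1.0, 1.0, 2.0], [2.0, 1.0, 3.0], [1.0, 2.0, 3.0]]
    masks = [[0], [1], [2]]  # hypothetical masking scheme: one feature at a time
    for row in ([2.0, 0.0, 2.0], [2.0, 0.0, 5.0]):
        for mode in ("knn", "attention"):
            print(row, mode, round(anomaly_score(row, normal_bank, masks, mode=mode), 3))
```

With this toy bank, the row [2, 0, 5], which breaks the "third feature is the sum of the first two" pattern, scores higher than the conforming row [2, 0, 2] under both weighting schemes. In the method described above, the reconstruction comes from a transformer trained on normal samples; here the retrieved rows alone produce it.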
