
Attention versus Contrastive Learning of Tabular Data -- A Data-centric Benchmarking (2401.04266v1)

Published 8 Jan 2024 in cs.LG

Abstract: Despite groundbreaking success in image and text learning, deep learning has not achieved significant improvements over traditional ML when it comes to tabular data. This performance gap underscores the need for data-centric treatment and benchmarking of learning algorithms. Recently, attention and contrastive learning breakthroughs have shifted computer vision and natural language processing paradigms. However, the effectiveness of these advanced deep models on tabular data is sparsely studied using a few data sets with very large sample sizes, reporting mixed findings after benchmarking against a limited number of baselines. We argue that the heterogeneity of tabular data sets and selective baselines in the literature can bias the benchmarking outcomes. This article extensively evaluates state-of-the-art attention and contrastive learning methods on a wide selection of 28 tabular data sets (14 easy and 14 hard-to-classify) against traditional deep and machine learning. Our data-centric benchmarking demonstrates when traditional ML is preferred over deep learning and vice versa because no best learning method exists for all tabular data sets. Combining between-sample and between-feature attention outperforms the otherwise dominant traditional ML on tabular data sets by a significant margin but fails on high-dimensional data, where contrastive learning takes a robust lead. While a hybrid attention-contrastive learning strategy mostly wins on hard-to-classify data sets, traditional methods are frequently superior on easy-to-classify data sets with presumably simpler decision boundaries. To the best of our knowledge, this is the first benchmarking paper with statistical analyses of attention and contrastive learning performance on a diverse selection of tabular data sets against traditional deep and machine learning baselines to facilitate further advances in this field.
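The two deep-learning families benchmarked here can be made concrete with short sketches. First, the "between-sample and between-feature attention" combination amounts to alternating self-attention over the feature axis (within each row) and over the batch axis (across rows). The PyTorch block below is a minimal illustration of that idea; the class name, dimensions, and the row-flattening step are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn


class RowColumnAttentionBlock(nn.Module):
    """Between-feature (column) attention followed by between-sample (row) attention."""

    def __init__(self, n_features: int, d_model: int = 32, n_heads: int = 4):
        super().__init__()
        # Features of one sample attend to each other.
        self.feature_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Whole rows (flattened feature tokens) attend to each other across the batch.
        self.sample_attn = nn.MultiheadAttention(n_features * d_model, n_heads, batch_first=True)
        self.norm_feat = nn.LayerNorm(d_model)
        self.norm_samp = nn.LayerNorm(n_features * d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_features, d_model) -- each feature embedded as one token.
        b, f, d = x.shape

        # 1) Between-feature attention within each sample.
        h, _ = self.feature_attn(x, x, x)
        x = self.norm_feat(x + h)

        # 2) Between-sample attention: flatten each row to a single token and
        #    treat the batch as a length-b sequence.
        rows = x.reshape(1, b, f * d)
        h, _ = self.sample_attn(rows, rows, rows)
        rows = self.norm_samp(rows + h)
        return rows.reshape(b, f, d)


# Toy usage: 64 samples, 10 numeric features, each embedded to 32 dimensions.
embed = nn.Linear(1, 32)
x = embed(torch.randn(64, 10, 1))            # (64, 10, 32)
out = RowColumnAttentionBlock(n_features=10)(x)
print(out.shape)                             # torch.Size([64, 10, 32])
```

Second, contrastive learning on tabular rows is commonly set up by corrupting a random subset of features and pulling the embedding of a row and that of its corrupted view together (the general idea behind methods such as SCARF). The helper names `corrupt` and `info_nce`, the corruption rate, and the temperature below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def corrupt(x: torch.Tensor, rate: float = 0.3) -> torch.Tensor:
    """Randomly replace a fraction of each row's features with values from other rows."""
    mask = torch.rand_like(x) < rate
    shuffled = x[torch.randperm(x.size(0))]
    return torch.where(mask, shuffled, x)


def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """InfoNCE loss treating (z1[i], z2[i]) as the positive pair for row i."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau               # (batch, batch) cosine similarities
    labels = torch.arange(z1.size(0))        # positives lie on the diagonal
    return F.cross_entropy(logits, labels)


# Toy usage: encode a batch of rows and its corrupted view with any encoder.
encoder = torch.nn.Linear(10, 16)            # stand-in for a real encoder network
x = torch.randn(64, 10)
loss = info_nce(encoder(x), encoder(corrupt(x)))
```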

Authors (3)
  1. Shourav B. Rabbani (7 papers)
  2. Ivan V. Medri (3 papers)
  3. Manar D. Samad (15 papers)
Citations (2)

