
Why In-Context Learning Transformers are Tabular Data Classifiers (2405.13396v1)

Published 22 May 2024 in cs.LG and stat.ML

Abstract: The recently introduced TabPFN pretrains an In-Context Learning (ICL) transformer on synthetic data to perform tabular data classification. As synthetic data does not share features or labels with real-world data, the underlying mechanism that contributes to the success of this method remains unclear. This study provides an explanation by demonstrating that ICL-transformers acquire the ability to create complex decision boundaries during pretraining. To validate our claim, we develop a novel forest dataset generator which creates datasets that are unrealistic, but have complex decision boundaries. Our experiments confirm the effectiveness of ICL-transformers pretrained on this data. Furthermore, we create TabForestPFN, the ICL-transformer pretrained on both the original TabPFN synthetic dataset generator and our forest dataset generator. By fine-tuning this model, we reach the current state-of-the-art on tabular data classification. Code is available at https://github.com/FelixdenBreejen/TabForestPFN.
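As a rough illustration of the forest dataset generator described in the abstract (not the authors' implementation, which is available in the linked repository), the sketch below shows one way to draw an unrealistic synthetic classification dataset whose labels come from a randomly grown decision tree, so that the resulting decision boundary is complex. All function names and parameters here are assumptions made for illustration only.

# Illustrative sketch (assumed, not the authors' code): sample random features
# and label them with a randomly grown decision tree, yielding unrealistic data
# with a complex, tree-induced decision boundary.
import numpy as np


def random_tree_labels(X, depth, rng, n_classes):
    """Label the rows of X with a randomly grown decision tree."""
    if depth == 0 or len(X) <= 1:
        # Leaf: assign one random class to every sample that reached it.
        return np.full(len(X), rng.integers(n_classes))
    feature = rng.integers(X.shape[1])   # random split feature
    threshold = rng.normal()             # random split threshold
    left = X[:, feature] <= threshold
    labels = np.empty(len(X), dtype=int)
    labels[left] = random_tree_labels(X[left], depth - 1, rng, n_classes)
    labels[~left] = random_tree_labels(X[~left], depth - 1, rng, n_classes)
    return labels


def sample_synthetic_dataset(n_samples=1024, n_features=10, n_classes=4,
                             max_depth=8, seed=0):
    """Draw one synthetic (X, y) pair with a complex, tree-induced boundary."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_features))
    y = random_tree_labels(X, max_depth, rng, n_classes)
    return X, y


X, y = sample_synthetic_dataset()
print(X.shape, np.bincount(y))  # feature matrix shape and per-class counts

In the pretraining setup the abstract describes, many such datasets would be sampled and presented to the ICL transformer as (training context, query) pairs; this sketch covers only the data-generation step.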

References (54)
  1. TabNet: Attentive interpretable tabular learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35(8), pages 6679–6687, 2021.
  2. SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption. In International Conference on Learning Representations (ICLR). arXiv, March 2022. doi: 10.48550/arXiv.2106.15147. URL http://arxiv.org/abs/2106.15147. arXiv:2106.15147 [cs].
  3. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems (NeurIPS), volume 33, pages 1877–1901. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html.
  4. Trompt: Towards a Better Deep Neural Network for Tabular Data. In International Conference on Machine Learning (ICML), May 2023a. URL https://openreview.net/forum?id=0yNmeyteuS. arXiv:2305.18446 [cs].
  5. ReConTab: Regularized Contrastive Representation Learning for Tabular Data. In NeurIPS Workshop: Table Representation Learning, 2023b.
  6. XGBoost: A Scalable Tree Boosting System. In International Conference on Knowledge Discovery and Data Mining (KDD), pages 785–794, August 2016. doi: 10.1145/2939672.2939785. URL http://arxiv.org/abs/1603.02754. arXiv:1603.02754 [cs].
  7. A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning. In Advances in Neural Information Processing Systems (NeurIPS), 2024.
  8. A Survey on In-context Learning, June 2023. URL http://arxiv.org/abs/2301.00234. arXiv:2301.00234 [cs].
  9. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In International Conference on Learning Representations (ICLR). arXiv, June 2021. URL http://arxiv.org/abs/2010.11929. arXiv:2010.11929 [cs].
  10. Scaling TabPFN: Sketching and Feature Selection for Tabular Prior-Data Fitted Networks. October 2023. URL https://openreview.net/forum?id=b0OhN0ii36.
  11. TuneTables: Context Optimization for Scalable Prior-Data Fitted Networks, March 2024. URL http://arxiv.org/abs/2402.11137. arXiv:2402.11137 [cs].
  12. Revisiting Deep Learning Models for Tabular Data. In Advances in Neural Information Processing Systems (NeurIPS). arXiv, 2021. URL http://arxiv.org/abs/2106.11959. arXiv:2106.11959 [cs] version: 3.
  13. On Embeddings for Numerical Features in Tabular Deep Learning. In Advances in Neural Information Processing Systems (NeurIPS). arXiv, March 2022. URL http://arxiv.org/abs/2203.05556. arXiv:2203.05556 [cs].
  14. TabR: Unlocking the Power of Retrieval-Augmented Tabular Deep Learning, July 2023. URL http://arxiv.org/abs/2307.14338. arXiv:2307.14338 [cs].
  15. Why do tree-based models still outperform deep learning on tabular data? In Advances in Neural Information Processing Systems (NeurIPS). arXiv, July 2022. URL http://arxiv.org/abs/2207.08815. arXiv:2207.08815 [cs, stat].
  16. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  17. Support vector machines. IEEE Intelligent Systems and their Applications, 13(4):18–28, July 1998. ISSN 2374-9423. doi: 10.1109/5254.708428. URL https://ieeexplore.ieee.org/document/708428.
  18. TabLLM: Few-shot Classification of Tabular Data with Large Language Models. In International Conference on Artificial Intelligence and Statistics (AISTATS), pages 5549–5581. PMLR, April 2023. URL https://proceedings.mlr.press/v206/hegselmann23a.html. ISSN: 2640-3498.
  19. Denoising Diffusion Probabilistic Models. In Advances in Neural Information Processing Systems (NeurIPS). arXiv, December 2020. URL http://arxiv.org/abs/2006.11239. arXiv:2006.11239 [cs, stat].
  20. TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second. In International Conference on Learning Representations (ICLR). arXiv, September 2023. doi: 10.48550/arXiv.2207.01848. URL http://arxiv.org/abs/2207.01848. arXiv:2207.01848 [cs, stat].
  21. TabTransformer: Tabular data modeling using contextual embeddings. arXiv preprint arXiv:2012.06678, 2020.
  22. Well-tuned Simple Nets Excel on Tabular Datasets, November 2021. URL http://arxiv.org/abs/2106.11189. arXiv:2106.11189 [cs].
  23. Net-DNF: Effective Deep Modeling of Tabular Data. In International Conference on Learning Representations (ICLR), page 16, 2021.
  24. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL https://proceedings.neurips.cc/paper/2017/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html.
  25. Combining Machine Learning and Computational Chemistry for Predictive Insights Into Chemical Systems. Chemical Reviews, 121(16):9816–9872, August 2021. ISSN 0009-2665, 1520-6890. doi: 10.1021/acs.chemrev.1c00107. URL https://pubs.acs.org/doi/10.1021/acs.chemrev.1c00107.
  26. Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning. In Advances in Neural Information Processing Systems (NeurIPS), volume 34, pages 28742–28756. Curran Associates, Inc., 2021. URL https://proceedings.neurips.cc/paper/2021/hash/f1507aba9fc82ffa7cc7373c58f8a613-Abstract.html.
  27. Transfer Learning with Deep Tabular Models. In International Conference on Learning Representations (ICLR), 2023.
  28. Machine Learning in Agriculture: A Review. Sensors, 18(8):2674, August 2018. ISSN 1424-8220. doi: 10.3390/s18082674. URL https://www.mdpi.com/1424-8220/18/8/2674.
  29. Predicting Hard Rock Pillar Stability Using GBDT, XGBoost, and LightGBM Algorithms. Mathematics, 8(5):765, May 2020. ISSN 2227-7390. doi: 10.3390/math8050765. URL https://www.mdpi.com/2227-7390/8/5/765.
  30. In-Context Data Distillation with TabPFN. In NeurIPS Workshop: Table Representation Learning. arXiv, 2023. doi: 10.48550/arXiv.2402.06971. URL http://arxiv.org/abs/2402.06971. arXiv:2402.06971 [cs] version: 1.
  31. Calvin McCarter. What exactly has TabPFN learned to do?, 2024. URL https://iclr-blogposts.github.io/2024/blog/what-exactly-has-tabpfn-learned-to-do/.
  32. When Do Neural Nets Outperform Boosted Trees on Tabular Data? In Advances in Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks. arXiv, October 2023. URL http://arxiv.org/abs/2305.02997. arXiv:2305.02997 [cs, stat].
  33. STUNT: Few-shot Tabular Learning with Self-generated Tasks from Unlabeled Tables. In International Conference on Learning Representations (ICLR). arXiv, March 2023. URL http://arxiv.org/abs/2303.00918. arXiv:2303.00918 [cs].
  34. XGBoost Model for Chronic Kidney Disease Diagnosis. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 17(6):2131–2140, November 2020. ISSN 1545-5963, 1557-9964, 2374-0043. doi: 10.1109/TCBB.2019.2911071. URL https://ieeexplore.ieee.org/document/8693581/.
  35. Deep Learning for Anomaly Detection: A Review. ACM Computing Surveys, 54(2):1–38, March 2022. ISSN 0360-0300, 1557-7341. doi: 10.1145/3439950. URL https://dl.acm.org/doi/10.1145/3439950.
  36. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12(85):2825–2830, 2011. ISSN 1533-7928. URL http://jmlr.org/papers/v12/pedregosa11a.html.
  37. CatBoost: unbiased boosting with categorical features. In Advances in Neural Information Processing Systems (NeurIPS). Curran Associates, Inc., 2018. URL https://proceedings.neurips.cc/paper/2018/hash/14491b756b3a51daac41c24863285549-Abstract.html. arXiv:1706.09516 [cs].
  38. Predicting clicks: estimating the click-through rate for new ads. In International Conference on World Wide Web (WWW), pages 521–530, Banff Alberta Canada, May 2007. ACM. ISBN 978-1-59593-654-7. doi: 10.1145/1242572.1242643. URL https://dl.acm.org/doi/10.1145/1242572.1242643.
  39. High dimensional, tabular deep learning with an auxiliary knowledge graph. In Advances in Neural Information Processing Systems (NeurIPS), 2023.
  40. Interpretable Machine Learning for TabPFN, March 2024. URL http://arxiv.org/abs/2403.10923. arXiv:2403.10923 [cs, stat].
  41. The Pitfalls of Simplicity Bias in Neural Networks. In Advances in Neural Information Processing Systems (NeurIPS), volume 33, pages 9573–9585. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper/2020/hash/6cfe0e6127fa25df2a0ef2ae1067d915-Abstract.html.
  42. Regularization Learning Networks: Deep Learning for Tabular Datasets. In Advances in Neural Information Processing Systems (NeurIPS), volume 31. Curran Associates, Inc., 2018. URL https://proceedings.neurips.cc/paper/2018/hash/500e75a036dc2d7d2fec5da1b71d36cc-Abstract.html.
  43. Tabular data: Deep learning is not all you need. Information Fusion, 81:84–90, May 2022. ISSN 1566-2535. doi: 10.1016/j.inffus.2021.11.011. URL https://www.sciencedirect.com/science/article/pii/S1566253521002360.
  44. SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training. In NeurIPS Workshop: Table Representation Learning. arXiv, June 2021. URL http://arxiv.org/abs/2106.01342. arXiv:2106.01342 [cs, stat].
  45. Self-supervised Representation Learning from Random Data Projectors. In NeurIPS Workshop: Table Representation Learning, 2023.
  46. SubTab: Subsetting Features of Tabular Data for Self-Supervised Representation Learning. In Advances in Neural Information Processing Systems (NeurIPS), volume 34, pages 18853–18865. Curran Associates, Inc., 2021. URL https://proceedings.neurips.cc/paper/2021/hash/9c8661befae6dbcd08304dbf4dcaf0db-Abstract.html.
  47. OpenML: networked science in machine learning. ACM SIGKDD Explorations Newsletter, 15(2):49–60, June 2014. ISSN 1931-0145, 1931-0153. doi: 10.1145/2641190.2641198. URL https://dl.acm.org/doi/10.1145/2641190.2641198.
  48. Attention is All you Need. In Advances in Neural Information Processing Systems (NeurIPS), volume 30. Curran Associates, Inc., 2017. URL https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.
  49. VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain. In Advances in Neural Information Processing Systems (NeurIPS), volume 33, pages 11033–11043. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper/2020/hash/7d97667a3e056acab9aaf653807b4a03-Abstract.html.
  50. Tabular Data: Is Attention All You Need? In International Conference on Learning Representations (ICLR). arXiv, February 2024. URL http://arxiv.org/abs/2402.03970. arXiv:2402.03970 [cs].
  51. Towards Foundation Models for Learning on Tabular Data, October 2023. URL http://arxiv.org/abs/2310.07338. arXiv:2310.07338 [cs].
  52. Deep Learning Based Recommender System: A Survey and New Perspectives. ACM Computing Surveys, 52(1):1–38, January 2020. ISSN 0360-0300, 1557-7341. doi: 10.1145/3285029. URL https://dl.acm.org/doi/10.1145/3285029.
  53. Unlocking the Transferability of Tokens in Deep Models for Tabular Data. In NeurIPS Workshop: Table Representation Learning, 2023.
  54. XTab: Cross-table Pretraining for Tabular Transformers. In International Conference on Learning Representations (ICLR), June 2023. URL https://openreview.net/forum?id=uGORNDmIdr.
Citations (1)
