Why In-Context Learning Transformers are Tabular Data Classifiers (2405.13396v1)
Abstract: The recently introduced TabPFN pretrains an In-Context Learning (ICL) transformer on synthetic data to perform tabular data classification. Because the synthetic data shares no features or labels with real-world data, the mechanism underlying this method's success remains unclear. This study provides an explanation by demonstrating that ICL transformers acquire the ability to create complex decision boundaries during pretraining. To validate this claim, we develop a novel forest dataset generator that creates datasets which are unrealistic yet have complex decision boundaries. Our experiments confirm the effectiveness of ICL transformers pretrained on this data. Furthermore, we create TabForestPFN, an ICL transformer pretrained on both the original TabPFN synthetic dataset generator and our forest dataset generator. By fine-tuning this model, we reach the current state of the art on tabular data classification. Code is available at https://github.com/FelixdenBreejen/TabForestPFN.
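To make the abstract's central idea concrete, here is a minimal sketch of how a forest-style dataset generator could work: sample random feature vectors and label them with a randomly grown decision tree, so the resulting dataset is semantically meaningless yet has a complex, jagged decision boundary. This is an illustrative assumption, not the paper's implementation (see the linked repository for that); the function name `generate_forest_dataset` and all parameter choices are made up for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def generate_forest_dataset(n_samples=1024, n_features=8, n_classes=2,
                            tree_depth=10, seed=0):
    """Hypothetical sketch: an unrealistic dataset whose labels follow the
    complex, axis-aligned decision boundary of a randomly grown tree."""
    rng = np.random.default_rng(seed)
    # Step 1: fit a deep tree on random points with random labels. The tree
    # memorizes this noise, so its decision function becomes a jagged
    # partition of feature space with no real-world meaning.
    X_fit = rng.standard_normal((n_samples, n_features))
    y_fit = rng.integers(0, n_classes, size=n_samples)
    tree = DecisionTreeClassifier(max_depth=tree_depth, random_state=seed)
    tree.fit(X_fit, y_fit)
    # Step 2: relabel fresh random points with the tree, yielding labels
    # that are a deterministic but highly non-smooth function of X.
    X = rng.standard_normal((n_samples, n_features))
    y = tree.predict(X)
    return X, y

X, y = generate_forest_dataset()
print(X.shape, np.bincount(y))  # e.g. (1024, 8) and the per-class counts
```

Pretraining an ICL transformer on many such datasets, each labeled by a fresh random tree, exposes it to a wide variety of complex decision boundaries, which is the capability the paper argues underlies TabPFN-style models.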
- TabNet: Attentive Interpretable Tabular Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, 35(8):6679–6687, 2021.
- SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption. In International Conference on Learning Representations (ICLR), 2022. URL http://arxiv.org/abs/2106.15147. arXiv:2106.15147.
- Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems (NeurIPS), volume 33, pages 1877–1901, 2020. URL https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html.
- Trompt: Towards a Better Deep Neural Network for Tabular Data. In International Conference on Machine Learning (ICML), 2023. URL https://openreview.net/forum?id=0yNmeyteuS. arXiv:2305.18446.
- ReConTab: Regularized Contrastive Representation Learning for Tabular Data. In NeurIPS Workshop: Table Representation Learning, 2023.
- XGBoost: A Scalable Tree Boosting System. In International Conference on Knowledge Discovery and Data Mining (KDD), pages 785–794, 2016. doi: 10.1145/2939672.2939785. URL http://arxiv.org/abs/1603.02754. arXiv:1603.02754.
- A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning. In Advances in Neural Information Processing Systems (NeurIPS), 2024.
- A Survey on In-context Learning, 2023. URL http://arxiv.org/abs/2301.00234. arXiv:2301.00234.
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In International Conference on Learning Representations (ICLR), 2021. URL http://arxiv.org/abs/2010.11929. arXiv:2010.11929.
- Scaling TabPFN: Sketching and Feature Selection for Tabular Prior-Data Fitted Networks, 2023. URL https://openreview.net/forum?id=b0OhN0ii36.
- TuneTables: Context Optimization for Scalable Prior-Data Fitted Networks, 2024. URL http://arxiv.org/abs/2402.11137. arXiv:2402.11137.
- Revisiting Deep Learning Models for Tabular Data. In Advances in Neural Information Processing Systems (NeurIPS), 2021. URL http://arxiv.org/abs/2106.11959. arXiv:2106.11959.
- On Embeddings for Numerical Features in Tabular Deep Learning. In Advances in Neural Information Processing Systems (NeurIPS), 2022. URL http://arxiv.org/abs/2203.05556. arXiv:2203.05556.
- TabR: Unlocking the Power of Retrieval-Augmented Tabular Deep Learning, 2023. URL http://arxiv.org/abs/2307.14338. arXiv:2307.14338.
- Why do tree-based models still outperform deep learning on tabular data? In Advances in Neural Information Processing Systems (NeurIPS), 2022. URL http://arxiv.org/abs/2207.08815. arXiv:2207.08815.
- Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.
- Support Vector Machines. IEEE Intelligent Systems and their Applications, 13(4):18–28, 1998. doi: 10.1109/5254.708428. URL https://ieeexplore.ieee.org/document/708428.
- TabLLM: Few-shot Classification of Tabular Data with Large Language Models. In International Conference on Artificial Intelligence and Statistics (AISTATS), pages 5549–5581, 2023. URL https://proceedings.mlr.press/v206/hegselmann23a.html.
- Denoising Diffusion Probabilistic Models. In Advances in Neural Information Processing Systems (NeurIPS), 2020. URL http://arxiv.org/abs/2006.11239. arXiv:2006.11239.
- TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second. In International Conference on Learning Representations (ICLR), 2023. URL http://arxiv.org/abs/2207.01848. arXiv:2207.01848.
- TabTransformer: Tabular Data Modeling Using Contextual Embeddings. arXiv preprint arXiv:2012.06678, 2020.
- Well-tuned Simple Nets Excel on Tabular Datasets, 2021. URL http://arxiv.org/abs/2106.11189. arXiv:2106.11189.
- Net-DNF: Effective Deep Modeling of Tabular Data. In International Conference on Learning Representations (ICLR), 2021.
- LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Advances in Neural Information Processing Systems (NeurIPS), volume 30, 2017. URL https://proceedings.neurips.cc/paper/2017/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html.
- Combining Machine Learning and Computational Chemistry for Predictive Insights Into Chemical Systems. Chemical Reviews, 121(16):9816–9872, 2021. doi: 10.1021/acs.chemrev.1c00107. URL https://pubs.acs.org/doi/10.1021/acs.chemrev.1c00107.
- Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning. In Advances in Neural Information Processing Systems (NeurIPS), volume 34, pages 28742–28756, 2021. URL https://proceedings.neurips.cc/paper/2021/hash/f1507aba9fc82ffa7cc7373c58f8a613-Abstract.html.
- Transfer Learning with Deep Tabular Models. In International Conference on Learning Representations (ICLR), 2023.
- Machine Learning in Agriculture: A Review. Sensors, 18(8):2674, 2018. doi: 10.3390/s18082674. URL https://www.mdpi.com/1424-8220/18/8/2674.
- Predicting Hard Rock Pillar Stability Using GBDT, XGBoost, and LightGBM Algorithms. Mathematics, 8(5):765, 2020. doi: 10.3390/math8050765. URL https://www.mdpi.com/2227-7390/8/5/765.
- In-Context Data Distillation with TabPFN. In NeurIPS Workshop: Table Representation Learning, 2023. URL http://arxiv.org/abs/2402.06971. arXiv:2402.06971.
- Calvin McCarter. What exactly has TabPFN learned to do?, 2024. URL https://iclr-blogposts.github.io/2024/blog/what-exactly-has-tabpfn-learned-to-do/.
- When Do Neural Nets Outperform Boosted Trees on Tabular Data? In Advances in Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks, 2023. URL http://arxiv.org/abs/2305.02997. arXiv:2305.02997.
- STUNT: Few-shot Tabular Learning with Self-generated Tasks from Unlabeled Tables. In International Conference on Learning Representations (ICLR), 2023. URL http://arxiv.org/abs/2303.00918. arXiv:2303.00918.
- XGBoost Model for Chronic Kidney Disease Diagnosis. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 17(6):2131–2140, 2020. doi: 10.1109/TCBB.2019.2911071. URL https://ieeexplore.ieee.org/document/8693581/.
- Deep Learning for Anomaly Detection: A Review. ACM Computing Surveys, 54(2):1–38, 2022. doi: 10.1145/3439950. URL https://dl.acm.org/doi/10.1145/3439950.
- Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12(85):2825–2830, 2011. URL http://jmlr.org/papers/v12/pedregosa11a.html.
- CatBoost: Unbiased Boosting with Categorical Features. In Advances in Neural Information Processing Systems (NeurIPS), 2018. URL https://proceedings.neurips.cc/paper/2018/hash/14491b756b3a51daac41c24863285549-Abstract.html. arXiv:1706.09516.
- Predicting Clicks: Estimating the Click-Through Rate for New Ads. In International Conference on World Wide Web (WWW), pages 521–530, 2007. doi: 10.1145/1242572.1242643. URL https://dl.acm.org/doi/10.1145/1242572.1242643.
- High-Dimensional, Tabular Deep Learning with an Auxiliary Knowledge Graph. In Advances in Neural Information Processing Systems (NeurIPS), 2023.
- Interpretable Machine Learning for TabPFN, 2024. URL http://arxiv.org/abs/2403.10923. arXiv:2403.10923.
- The Pitfalls of Simplicity Bias in Neural Networks. In Advances in Neural Information Processing Systems (NeurIPS), volume 33, pages 9573–9585, 2020. URL https://proceedings.neurips.cc/paper/2020/hash/6cfe0e6127fa25df2a0ef2ae1067d915-Abstract.html.
- Regularization Learning Networks: Deep Learning for Tabular Datasets. In Advances in Neural Information Processing Systems (NeurIPS), volume 31, 2018. URL https://proceedings.neurips.cc/paper/2018/hash/500e75a036dc2d7d2fec5da1b71d36cc-Abstract.html.
- Tabular Data: Deep Learning Is Not All You Need. Information Fusion, 81:84–90, 2022. doi: 10.1016/j.inffus.2021.11.011. URL https://www.sciencedirect.com/science/article/pii/S1566253521002360.
- SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training. In NeurIPS Workshop: Table Representation Learning, 2021. URL http://arxiv.org/abs/2106.01342. arXiv:2106.01342.
- Self-supervised Representation Learning from Random Data Projectors. In NeurIPS Workshop: Table Representation Learning, 2023.
- SubTab: Subsetting Features of Tabular Data for Self-Supervised Representation Learning. In Advances in Neural Information Processing Systems (NeurIPS), volume 34, pages 18853–18865, 2021. URL https://proceedings.neurips.cc/paper/2021/hash/9c8661befae6dbcd08304dbf4dcaf0db-Abstract.html.
- OpenML: Networked Science in Machine Learning. ACM SIGKDD Explorations Newsletter, 15(2):49–60, 2014. doi: 10.1145/2641190.2641198. URL https://dl.acm.org/doi/10.1145/2641190.2641198.
- Attention Is All You Need. In Advances in Neural Information Processing Systems (NeurIPS), volume 30, 2017. URL https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.
- VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain. In Advances in Neural Information Processing Systems (NeurIPS), volume 33, pages 11033–11043, 2020. URL https://proceedings.neurips.cc/paper/2020/hash/7d97667a3e056acab9aaf653807b4a03-Abstract.html.
- Tabular Data: Is Attention All You Need? In International Conference on Learning Representations (ICLR), 2024. URL http://arxiv.org/abs/2402.03970. arXiv:2402.03970.
- Towards Foundation Models for Learning on Tabular Data, 2023. URL http://arxiv.org/abs/2310.07338. arXiv:2310.07338.
- Deep Learning Based Recommender System: A Survey and New Perspectives. ACM Computing Surveys, 52(1):1–38, 2020. doi: 10.1145/3285029. URL https://dl.acm.org/doi/10.1145/3285029.
- Unlocking the Transferability of Tokens in Deep Models for Tabular Data. In NeurIPS Workshop: Table Representation Learning, 2023.
- XTab: Cross-table Pretraining for Tabular Transformers. In International Conference on Machine Learning (ICML), 2023. URL https://openreview.net/forum?id=uGORNDmIdr.