
MotherNet: A Foundational Hypernetwork for Tabular Classification (2312.08598v1)

Published 14 Dec 2023 in cs.LG

Abstract: The advent of Foundation Models is transforming machine learning across many modalities (e.g., language, images, videos), with prompt engineering replacing training in many settings. Recent work on tabular data (e.g., TabPFN) hints at a similar opportunity to build Foundation Models for classification on numerical data. In this paper, we go one step further and propose a hypernetwork architecture that we call MotherNet, trained on millions of classification tasks, that, once prompted with a never-before-seen training set, generates the weights of a trained "child" neural network. Like other Foundation Models, MotherNet replaces training on specific datasets with in-context learning through a single forward pass. In contrast to existing hypernetworks that were either task-specific or trained for relatively constrained multi-task settings, MotherNet is trained to generate networks that perform multiclass classification on arbitrary tabular datasets without any dataset-specific gradient descent. The child network generated by MotherNet using in-context learning outperforms neural networks trained with gradient descent on small datasets, and is competitive with predictions by TabPFN and standard ML methods like Gradient Boosting. Unlike a direct application of transformer models like TabPFN, MotherNet-generated networks are highly efficient at inference time. This methodology opens up a new approach to building predictive models on tabular data that is both efficient and robust, without any dataset-specific training.
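
The abstract describes the core mechanism: a hypernetwork, prompted with an unseen training set, emits the weights of a small "child" classifier in a single forward pass, and only that lightweight child runs at inference time. Below is a minimal sketch of this idea, assuming a toy PyTorch setup; the class name ToyMotherNet, the mean-pooled example encoder, and all layer sizes are illustrative assumptions and not the paper's actual architecture.

```python
# Minimal sketch (not the paper's architecture): a hypernetwork that, given an
# in-context training set, emits the weights of a small "child" MLP in one
# forward pass, so the child can classify test rows without any
# dataset-specific gradient descent. All names and sizes are illustrative.
import torch
import torch.nn as nn


class ToyMotherNet(nn.Module):
    def __init__(self, n_features=10, n_classes=3, hidden=32, embed=64):
        super().__init__()
        self.n_features, self.n_classes, self.hidden = n_features, n_classes, hidden
        # Encode each (x, y) training example, then pool over the whole set.
        self.example_encoder = nn.Sequential(
            nn.Linear(n_features + n_classes, embed), nn.ReLU(), nn.Linear(embed, embed)
        )
        # Decode the pooled context into a flat vector of child-network weights.
        n_child_params = (n_features * hidden + hidden) + (hidden * n_classes + n_classes)
        self.weight_head = nn.Linear(embed, n_child_params)

    def forward(self, X_train, y_train):
        # One forward pass over the support set replaces per-dataset training.
        y_onehot = torch.nn.functional.one_hot(y_train, self.n_classes).float()
        context = self.example_encoder(torch.cat([X_train, y_onehot], dim=-1)).mean(dim=0)
        flat = self.weight_head(context)
        # Unpack the flat vector into the child MLP's weight matrices and biases.
        f, h, c = self.n_features, self.hidden, self.n_classes
        i = 0
        W1 = flat[i:i + f * h].view(h, f); i += f * h
        b1 = flat[i:i + h]; i += h
        W2 = flat[i:i + h * c].view(c, h); i += h * c
        b2 = flat[i:i + c]
        return (W1, b1, W2, b2)


def child_predict(child_weights, X_test):
    # The generated child network is a plain MLP, so inference is cheap.
    W1, b1, W2, b2 = child_weights
    return (torch.relu(X_test @ W1.T + b1) @ W2.T + b2).argmax(dim=-1)


# Usage: "prompt" the hypernetwork with a never-before-seen training set,
# then classify test rows with the generated child network.
X_tr, y_tr = torch.randn(50, 10), torch.randint(0, 3, (50,))
child_weights = ToyMotherNet()(X_tr, y_tr)
preds = child_predict(child_weights, torch.randn(20, 10))
```

In the paper's setting the hypernetwork itself is meta-trained across millions of synthetic classification tasks; the sketch above only shows the prompting and inference path, not that meta-training loop.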

