Discovering modular solutions that generalize compositionally (2312.15001v2)
Abstract: Many complex tasks can be decomposed into simpler, independent parts. Discovering such underlying compositional structure has the potential to enable compositional generalization. Despite progress, our most powerful systems struggle to compose flexibly. It therefore seems natural to make models more modular to help capture the compositional nature of many tasks. However, it is unclear under which circumstances modular systems can discover hidden compositional structure. To shed light on this question, we study a teacher-student setting with a modular teacher where we have full control over the composition of ground truth modules. This allows us to relate the problem of compositional generalization to that of identification of the underlying modules. In particular, we study modularity in hypernetworks, which represent a general class of multiplicative interactions. We show theoretically that identification up to linear transformation purely from demonstrations is possible without having to learn an exponential number of module combinations. We further demonstrate empirically that under the theoretically identified conditions, meta-learning from finite data can discover modular policies that generalize compositionally in a number of complex environments.
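The modular teacher-student setting described above can be sketched minimally as a linear hypernetwork: a task code mixes a small set of ground-truth module weight matrices, so the teacher is well-defined on combinations never seen during training. All names and dimensions below are illustrative assumptions, not the paper's actual model or experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative, not from the paper):
# K modules, input dimension d_in, output dimension d_out.
K, d_in, d_out = 3, 4, 2

# Teacher: K ground-truth module weight matrices.
modules = rng.standard_normal((K, d_out, d_in))

def teacher_forward(z, x):
    """Linear hypernetwork: the task code z mixes module weights
    multiplicatively, W(z) = sum_k z_k M_k, then applies W(z) to x."""
    W = np.einsum('k,kij->ij', z, modules)
    return W @ x

# A "task" is a combination of modules. There are exponentially many
# binary combinations of K modules, yet the teacher itself has only
# K * d_out * d_in parameters.
z_seen = np.array([1.0, 1.0, 0.0])    # combination seen during training
z_unseen = np.array([0.0, 1.0, 1.0])  # novel combination at test time
x = rng.standard_normal(d_in)

# The teacher's behavior on the unseen combination is fully determined
# by the shared modules; no extra learning is needed.
y_unseen = teacher_forward(z_unseen, x)
```

A student that identifies the modules up to an invertible linear transformation of the task code produces the same input-output map, which is why identification (rather than memorizing each combination) is the relevant target for compositional generalization.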
- Ferran Alet, Tomás Lozano-Pérez, and Leslie Pack Kaelbling. Modular meta-learning. In Aude Billard, Anca Dragan, Jan Peters, and Jun Morimoto (eds.), Proceedings of The 2nd Conference on Robot Learning, volume 87 of Proceedings of Machine Learning Research, pp. 856–868. PMLR, October 2018. URL https://proceedings.mlr.press/v87/alet18a.html.
- Jacob Andreas. Good-Enough Compositional Data Augmentation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 7556–7566, Online, 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.676. URL https://www.aclweb.org/anthology/2020.acl-main.676.
- A causal view of compositional zero-shot recognition. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (eds.), Advances in Neural Information Processing Systems, volume 33, pp. 1462–1473. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper_files/paper/2020/file/1010cedf85f6a7e24b087e63235dc12e-Paper.pdf.
- The DeepMind JAX Ecosystem, 2020. URL http://github.com/deepmind.
- CLOSURE: Assessing Systematic Generalization of CLEVR Models, October 2020. URL http://arxiv.org/abs/1912.05783. arXiv:1912.05783 [cs].
- Fast reinforcement learning with generalized policy updates. Proceedings of the National Academy of Sciences, 117(48):30079–30087, December 2020. ISSN 0027-8424, 1091-6490. doi: 10.1073/pnas.1907370117. URL https://pnas.org/doi/full/10.1073/pnas.1907370117.
- Jump to better conclusions: SCAN both left and right. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pp. 47–55, Brussels, Belgium, 2018. Association for Computational Linguistics. doi: 10.18653/v1/W18-5407. URL http://aclweb.org/anthology/W18-5407.
- Relational inductive biases, deep learning, and graph networks, October 2018. URL http://arxiv.org/abs/1806.01261. arXiv:1806.01261 [cs, stat].
- Conditional Computation in Neural Networks for faster models, January 2016. URL http://arxiv.org/abs/1511.06297. arXiv:1511.06297 [cs].
- Meta-learning with differentiable closed-form solvers. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=HyxnZh0ct7.
- Lukas Biewald. Experiment Tracking with Weights and Biases, 2020. URL https://www.wandb.com/.
- JAX: composable transformations of Python+NumPy programs, 2018. URL http://github.com/google/jax.
- The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 619–634, Online and Punta Cana, Dominican Republic, 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.emnlp-main.49. URL https://aclanthology.org/2021.emnlp-main.49.
- The Paradox of the Compositionality of Natural Language: A Neural Machine Translation Case Study. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 4154–4175, Dublin, Ireland, 2022. Association for Computational Linguistics. doi: 10.18653/v1/2022.acl-long.286. URL https://aclanthology.org/2022.acl-long.286.
- CNNs found to jump around more skillfully than RNNs: Compositional Generalization in Seq2seq Convolutional Networks. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 3919–3923, Florence, Italy, July 2019. Association for Computational Linguistics. doi: 10.18653/v1/P19-1381. URL https://aclanthology.org/P19-1381.
- Plan Arithmetic: Compositional Plan Vectors for Multi-Task Control, August 2020. URL http://arxiv.org/abs/1910.14033. arXiv:1910.14033 [cs, stat].
- Faith and Fate: Limits of Transformers on Compositionality, June 2023. URL http://arxiv.org/abs/2305.18654. arXiv:2305.18654 [cs].
- Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. In Doina Precup and Yee Whye Teh (eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 1126–1135. PMLR, August 2017. URL https://proceedings.mlr.press/v70/finn17a.html.
- Jerry A. Fodor and Zenon W. Pylyshyn. Connectionism and cognitive architecture: A critical analysis. Cognition, 28(1-2):3–71, March 1988. ISSN 00100277. doi: 10.1016/0010-0277(88)90031-5. URL https://linkinghub.elsevier.com/retrieve/pii/0010027788900315.
- Learning and generalization of compositional representations of visual scenes, March 2023. URL http://arxiv.org/abs/2303.13691. arXiv:2303.13691 [cs].
- Compositional Generalization in Semantic Parsing: Pre-training vs. Specialized Architectures, September 2021. URL http://arxiv.org/abs/2007.08970. arXiv:2007.08970 [cs].
- E. Gardner and B. Derrida. Three unfinished works on the optimal storage capacity of networks. Journal of Physics A: Mathematical and General, 22(12):1983, June 1989. doi: 10.1088/0305-4470/22/12/004. URL https://dx.doi.org/10.1088/0305-4470/22/12/004.
- Recursive Sketches for Modular Deep Learning. In Kamalika Chaudhuri and Ruslan Salakhutdinov (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 2211–2220. PMLR, June 2019. URL https://proceedings.mlr.press/v97/ghazi19a.html.
- Dynamics of stochastic gradient descent for two-layer neural networks in the teacher-student setup. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/paper_files/paper/2019/file/cab070d53bd0d200746fb852a922064a-Paper.pdf.
- Permutation Equivariant Models for Compositional Generalization in Language. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=SylVNerFvr.
- David Ha, Andrew M. Dai, and Quoc V. Le. HyperNetworks. In International Conference on Learning Representations, 2017. URL https://openreview.net/forum?id=rkpACe1lx.
- Robert F. Hadley. Systematicity in connectionist language learning. Mind & Language, 9(3):247–272, 1994.
- Compositionality decomposed: how do neural networks generalise?, February 2020. URL http://arxiv.org/abs/1908.08351. arXiv:1908.08351 [cs, stat].
- Plotly Technologies Inc. Collaborative data science. Montreal, QC: Plotly Technologies Inc., 2015. URL https://plot.ly.
- On The Specialization of Neural Modules. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=Fh97BDaR6I.
- Multiplicative Interactions and Where to Find Them. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=rylnK6VtDH.
- Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, and Ross Girshick. CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1988–1997, Honolulu, HI, July 2017. IEEE. ISBN 978-1-5386-0457-1. doi: 10.1109/CVPR.2017.215. URL https://ieeexplore.ieee.org/document/8099698/.
- Neural GPUs Learn Algorithms, March 2016. URL http://arxiv.org/abs/1511.08228. arXiv:1511.08228 [cs].
- Measuring Compositional Generalization: A Comprehensive Method on Realistic Data. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=SygcCnNKwr.
- Najoung Kim and Tal Linzen. COGS: A Compositional Generalization Challenge Based on Semantic Interpretation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 9087–9105, Online, 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main.731. URL https://www.aclweb.org/anthology/2020.emnlp-main.731.
- Neural Network Module Decomposition and Recomposition, December 2021. URL http://arxiv.org/abs/2112.13208. arXiv:2112.13208 [cs].
- Modular Networks: Learning to Decompose Neural Computation. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018. URL https://proceedings.neurips.cc/paper_files/paper/2018/file/310ce61c90f3a46e340ee8257bc70e93-Paper.pdf.
- Learning Task Grouping and Overlap in Multi-task Learning, June 2012. URL http://arxiv.org/abs/1206.6417. arXiv:1206.6417 [cs, stat].
- Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett (eds.), Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pp. 18171–18206. PMLR, July 2023. URL https://proceedings.mlr.press/v202/lachapelle23a.html.
- Brenden M. Lake. Compositional generalization through meta sequence-to-sequence learning. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/paper_files/paper/2019/file/f4d0e2e7fc057a58f7ca4a391f01940a-Paper.pdf.
- Brenden M. Lake and Marco Baroni. Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks, June 2018. URL http://arxiv.org/abs/1711.00350. arXiv:1711.00350 [cs].
- Meta-Learning with Differentiable Convex Optimization, April 2019. URL http://arxiv.org/abs/1904.03758. arXiv:1904.03758 [cs].
- Compositional Generalization for Primitive Substitutions. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 4293–4302, Hong Kong, China, November 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1438. URL https://aclanthology.org/D19-1438.
- Towards Out-Of-Distribution Generalization: A Survey, July 2023. URL http://arxiv.org/abs/2108.13624. arXiv:2108.13624 [cs].
- Compositional Generalization by Learning Analytical Expressions. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (eds.), Advances in Neural Information Processing Systems, volume 33, pp. 11416–11427. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper_files/paper/2020/file/83adc9225e4deb67d7ce42d58fe5157c-Paper.pdf.
- Rearranging the Familiar: Testing Compositional Generalization in Recurrent Networks. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pp. 108–114, Brussels, Belgium, 2018. Association for Computational Linguistics. doi: 10.18653/v1/W18-5413. URL http://aclweb.org/anthology/W18-5413.
- Is a Modular Architecture Enough? In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho (eds.), Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/forum?id=3-3XMModtrx.
- The role of Disentanglement in Generalisation. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=qbH974jKUVy.
- Learning Compositional Rules via Neural Program Synthesis. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (eds.), Advances in Neural Information Processing Systems, volume 33, pp. 10832–10842. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper_files/paper/2020/file/7a685d9edd95508471a9d3d6fcace432-Paper.pdf.
- A. Emin Orhan. Compositional generalization in semantic parsing with pretrained transformers, December 2022. URL http://arxiv.org/abs/2109.15101. arXiv:2109.15101 [cs].
- Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, and Aaron Courville. FiLM: Visual Reasoning with a General Conditioning Layer. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1), April 2018. ISSN 2374-3468, 2159-5399. doi: 10.1609/aaai.v32i1.11671. URL https://ojs.aaai.org/index.php/AAAI/article/view/11671.
- Steven Phillips. Connectionism and the problem of systematicity. PhD Thesis, University of Queensland, 1995.
- Combining Modular Skills in Multitask Learning, March 2022. URL http://arxiv.org/abs/2202.13914. arXiv:2202.13914 [cs].
- Measuring and Narrowing the Compositionality Gap in Language Models, May 2023. URL http://arxiv.org/abs/2210.03350. arXiv:2210.03350 [cs].
- Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=rkgMkCEtPB.
- Routing Networks: Adaptive Selection of Non-Linear Functions for Multi-Task Learning. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=ry8dvM-R-.
- A Benchmark for Systematic Generalization in Grounded Language Understanding. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (eds.), Advances in Neural Information Processing Systems, volume 33, pp. 19861–19872. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper_files/paper/2020/file/e5a90182cc81e12ab5e72d66e0b46fe3-Paper.pdf.
- David E. Rumelhart and James L. McClelland. On learning the past tenses of English verbs. Cambridge, MA: MIT Press, 1986.
- Compositional generalization in a deep seq2seq model by separating syntax and semantics, May 2019. URL http://arxiv.org/abs/1904.09708. arXiv:1904.09708 [cs, stat].
- Paul Ruvolo and Eric Eaton. ELLA: An Efficient Lifelong Learning Algorithm. In Sanjoy Dasgupta and David McAllester (eds.), Proceedings of the 30th International Conference on Machine Learning, volume 28 of Proceedings of Machine Learning Research, pp. 507–515, Atlanta, Georgia, USA, June 2013. PMLR. URL https://proceedings.mlr.press/v28/ruvolo13.html.
- David Saad and Sara A. Solla. Dynamics of On-Line Gradient Descent Learning for Multilayer Neural Networks. In D. Touretzky, M. C. Mozer, and M. Hasselmo (eds.), Advances in Neural Information Processing Systems, volume 8. MIT Press, 1995. URL https://proceedings.neurips.cc/paper_files/paper/1995/file/a1519de5b5d44b31a01de013b9b51a80-Paper.pdf.
- H. S. Seung, H. Sompolinsky, and N. Tishby. Statistical mechanics of learning from examples. Physical Review A, 45(8):6056–6091, April 1992. ISSN 1050-2947, 1094-1622. doi: 10.1103/PhysRevA.45.6056. URL https://link.aps.org/doi/10.1103/PhysRevA.45.6056.
- Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. In International Conference on Learning Representations, 2017. URL https://openreview.net/forum?id=B1ckMDqlg.
- Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances. In Marina Meila and Tong Zhang (eds.), Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp. 9722–9732. PMLR, July 2021. URL https://proceedings.mlr.press/v139/simsek21a.html.
- Paul Smolensky. Connectionism, Constituency and the Language of Thought. In Barry M. Loewer and Georges Rey (eds.), Meaning in Mind: Fodor and His Critics. Blackwell, 1991.
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models. Transactions on Machine Learning Research, 2023. ISSN 2835-8856. URL https://openreview.net/forum?id=uyTL5Bvosj.
- Yuandong Tian. Student Specialization in Deep Rectified Networks With Finite Width and Input Dimension. In Hal Daumé III and Aarti Singh (eds.), Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pp. 9470–9480. PMLR, July 2020. URL https://proceedings.mlr.press/v119/tian20a.html.
- Training neural networks to encode symbols enables combinatorial generalization. Philosophical Transactions of the Royal Society B: Biological Sciences, 375(1791):20190309, February 2020. ISSN 0962-8436, 1471-2970. doi: 10.1098/rstb.2019.0309. URL https://royalsocietypublishing.org/doi/10.1098/rstb.2019.0309.
- Compositional Generalization from First Principles, July 2023. URL http://arxiv.org/abs/2307.05596. arXiv:2307.05596 [cs, stat].
- Compositional Generalization in Unsupervised Compositional Representation Learning: A Study on Disentanglement and Emergent Language. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (eds.), Advances in Neural Information Processing Systems, volume 35, pp. 25074–25087. Curran Associates, Inc., 2022. URL https://proceedings.neurips.cc/paper_files/paper/2022/file/9f9ecbf4062842df17ec3f4ea3ad7f54-Paper-Conference.pdf.
- Meta-learning via hypernetworks. In 4th Workshop on Meta-Learning at NeurIPS 2020 (MetaLearn 2020). NeurIPS, 2020.
- Toward Compositional Generalization in Object-Oriented World Modeling. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato (eds.), Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pp. 26841–26864. PMLR, July 2022. URL https://proceedings.mlr.press/v162/zhao22b.html.
- HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning, July 2022. URL http://arxiv.org/abs/2201.04182. arXiv:2201.04182 [cs].
- Least-to-Most Prompting Enables Complex Reasoning in Large Language Models. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=WZH7099tgfM.