Improving Compositional Generalization with Latent Structure and Data Augmentation (2112.07610v2)

Published 14 Dec 2021 in cs.CL

Abstract: Generic unstructured neural networks have been shown to struggle on out-of-distribution compositional generalization. Compositional data augmentation via example recombination has transferred some prior knowledge about compositionality to such black-box neural models for several semantic parsing tasks, but this often required task-specific engineering or provided limited gains. We present a more powerful data recombination method using a model called Compositional Structure Learner (CSL). CSL is a generative model with a quasi-synchronous context-free grammar backbone, which we induce from the training data. We sample recombined examples from CSL and add them to the fine-tuning data of a pre-trained sequence-to-sequence model (T5). This procedure effectively transfers most of CSL's compositional bias to T5 for diagnostic tasks, and results in a model even stronger than a T5-CSL ensemble on two real world compositional generalization tasks. This results in new state-of-the-art performance for these challenging semantic parsing tasks requiring generalization to both natural language variation and novel compositions of elements.

Citations (53)

Summary

  • The paper presents CSL (Compositional Structure Learner), a generative model with a quasi-synchronous context-free grammar backbone that induces latent compositional structure from training data.
  • The paper samples recombined examples from CSL to augment the fine-tuning data of a pre-trained T5 model, transferring CSL's compositional bias for improved generalization on semantic parsing tasks.
  • The paper demonstrates that the augmented T5 surpasses even a T5-CSL ensemble, achieving state-of-the-art results on two real-world semantic parsing tasks that require generalization to natural language variation and novel compositions.

The paper "Improving Compositional Generalization with Latent Structure and Data Augmentation" addresses the challenge of out-of-distribution compositional generalization, where traditional unstructured neural networks often fail. To improve this, the authors propose a novel data recombination method leveraging a model they call the Compositional Structure Learner (CSL).

Key Contributions:

  1. Compositional Structure Learner (CSL): CSL is a generative model built on a quasi-synchronous context-free grammar, designed to induce latent compositional structure from the training data. The grammar backbone gives the model an inherent bias toward compositional generalization, mitigating a key limitation of black-box neural models.
  2. Data Recombination: By sampling recombined examples from CSL, the authors augment the fine-tuning data of a pre-trained sequence-to-sequence model, specifically T5. This recombination transfers much of CSL's compositional bias to T5, enhancing its generalization capabilities (a toy sketch of the pipeline follows this list).
  3. Improved Performance: Experiments show that fine-tuning on the recombined data effectively transfers CSL's compositional bias to T5 on diagnostic tasks. Notably, the approach outperforms even a T5-CSL ensemble, establishing new state-of-the-art results on two challenging real-world semantic parsing tasks that require generalization to both natural language variation and novel compositions.
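
To make the recombination step concrete, here is a minimal, self-contained Python sketch of sampling paired (utterance, logical form) examples from a toy synchronous grammar. The grammar, rule contents, and function names are illustrative assumptions only: CSL induces its quasi-synchronous grammar from training data and is a full generative model, whereas this sketch hand-writes a few rules to show how aligned expansion of nonterminals yields well-formed novel compositions.

```python
import random

# Each rule pairs a source-side template with a target-side template.
# Aligned nonterminals (written $NT) are expanded together -- the
# synchronous-grammar property that keeps recombined pairs well formed.
# This toy version assumes every nonterminal appears on both sides.
RULES = {
    "S":   [("what is the $REL of $ENT", "answer($REL($ENT))")],
    "REL": [("capital", "capital"), ("population", "population")],
    "ENT": [("france", "france"),
            ("the $REL of $ENT", "$REL($ENT)")],  # recursion yields novel compositions
}

def expand(symbol, depth=0, max_depth=3):
    """Recursively expand one nonterminal, keeping both sides in sync."""
    options = RULES[symbol]
    if depth >= max_depth:  # at the depth limit, restrict to non-recursive rules
        options = [r for r in options if "$" not in r[0]] or options
    src, tgt = random.choice(options)
    for nt in RULES:
        token = "$" + nt
        while token in src:  # expand each occurrence independently
            sub_src, sub_tgt = expand(nt, depth + 1, max_depth)
            src = src.replace(token, sub_src, 1)
            tgt = tgt.replace(token, sub_tgt, 1)
    return src, tgt

# Sample synthetic pairs and mix them into the original training data,
# mirroring (at toy scale) how CSL samples augment T5's fine-tuning set.
train_pairs = [("what is the capital of france", "answer(capital(france))")]
augmented = train_pairs + [expand("S") for _ in range(100)]
```

Because the `$ENT` rule can recurse, sampling yields utterances like "what is the capital of the population of france" that never occur verbatim in the training pairs; adding such samples to the fine-tuning set is what transfers the grammar's compositional bias to the sequence-to-sequence model.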
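The augmentation step itself then reduces to ordinary fine-tuning on the mixed data. Below is a hedged sketch using the Hugging Face transformers library; the `t5-base` checkpoint, learning rate, single gradient step, and inline data are placeholder choices, not the paper's actual training configuration.

```python
from torch.optim import AdamW
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
optimizer = AdamW(model.parameters(), lr=1e-4)

# Stand-in for the 'augmented' list built in the previous sketch:
# original pairs plus grammar-sampled pairs, treated identically.
augmented = [
    ("what is the capital of france", "answer(capital(france))"),
    ("what is the capital of the population of france",
     "answer(capital(population(france)))"),
]

inputs = tokenizer([s for s, _ in augmented], padding=True, return_tensors="pt")
labels = tokenizer([t for _, t in augmented], padding=True, return_tensors="pt").input_ids
labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss

loss = model(**inputs, labels=labels).loss  # standard seq2seq cross-entropy
loss.backward()
optimizer.step()
```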

Implications:

The approach outlined by the authors represents a significant advance for natural language processing tasks that demand generalization to unseen compositions. By pairing latent structure induction with targeted data augmentation, the model better handles natural language variation and achieves superior performance on real-world benchmarks. The work both addresses shortcomings of unstructured neural architectures and offers a scalable recipe for improving compositional generalization across semantic parsing tasks.
