Directed Acyclic Transformer Pre-training for High-quality Non-autoregressive Text Generation

Published 24 Apr 2023 in cs.CL | (2304.11791v1)

Abstract: Non-AutoRegressive (NAR) text generation models have drawn much attention because of their significantly faster decoding speed and good generation quality in machine translation. However, in a wider range of text generation tasks, existing NAR models lack proper pre-training, making them still far behind the pre-trained autoregressive models. In this paper, we propose Pre-trained Directed Acyclic Transformer (PreDAT) and a novel pre-training task to promote prediction consistency in NAR generation. Experiments on five text generation tasks show that our PreDAT remarkably outperforms existing pre-trained NAR models (+4.2 scores on average) and even achieves better results than pre-trained autoregressive baselines in n-gram-based metrics, along with 17 times speedup in throughput. Further analysis shows that PreDAT benefits from the unbiased prediction order that alleviates the error accumulation problem in autoregressive generation, which provides new insights into the advantages of NAR generation.

Authors (3)

Summary

The paper proposes PreDAT, a pre-trained Directed Acyclic Transformer that addresses the lack of proper pre-training for NAR models and narrows their quality gap with pre-trained autoregressive models.
It introduces the Double-Source Text Infilling (DSTI) pre-training task to promote prediction consistency in NAR generation.
Empirically, PreDAT outperforms existing pre-trained NAR models by 4.2 points on average, beats pre-trained autoregressive baselines on n-gram-based metrics, and delivers a 17x throughput speedup.

This paper presents a new approach to Non-Autoregressive (NAR) text generation: the Pre-trained Directed Acyclic Transformer (PreDAT). Its primary aim is to address the lack of proper pre-training for NAR models, which have lagged behind pre-trained autoregressive (AR) models on the broader range of text generation tasks beyond machine translation. The paper introduces a new pre-training task and reports substantial improvements over existing models.

PreDAT builds on the Directed Acyclic Transformer (DAT), whose decoder treats its hidden states as the vertices of a directed acyclic graph. Each vertex predicts a token distribution, edges point only forward, and an output sentence corresponds to one path through the graph, so different paths can capture different plausible outputs. Because all tokens are predicted in parallel rather than strictly left to right, the model does not depend on a fixed prediction order; the authors argue that this unbiased order reduces error accumulation. PreDAT is pre-trained with the Double-Source Text Infilling (DSTI) task, which is designed to improve prediction consistency and thereby mitigate the multi-modality problem common to NAR models.
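To make the path formulation concrete, here is a minimal sketch of how a DAT-style model can assign a probability to a target sentence by summing over every forward path through the graph with dynamic programming. It assumes a hypothetical toy layout that is not taken from the PreDAT code: `token_probs[u][w]` is the probability that vertex `u` emits token `w`, `transitions[u][v]` is the probability of jumping from vertex `u` to a later vertex `v`, every path starts at the first vertex and ends at the last, and probabilities stay in plain (not log) space.

```python
from typing import Sequence


def dag_sequence_prob(
    token_probs: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    target: Sequence[int],
) -> float:
    """Hypothetical sketch (not the PreDAT API): P(target) summed over all DAG paths.

    token_probs[u][w]: probability that vertex u emits token w.
    transitions[u][v]: probability of moving from vertex u to vertex v (only v > u is used).
    Every path starts at vertex 0 and ends at the last vertex.
    """
    num_vertices = len(token_probs)
    length = len(target)
    if length == 0 or length > num_vertices:
        return 0.0  # no path with this many vertices exists

    # f[u]: total probability of emitting target[:i + 1] with the i-th token at vertex u.
    f = [0.0] * num_vertices
    f[0] = token_probs[0][target[0]]  # the first token is always emitted by vertex 0
    for i in range(1, length):
        g = [0.0] * num_vertices
        for v in range(1, num_vertices):
            # Sum over every earlier vertex u that could have emitted the previous token.
            reach = sum(f[u] * transitions[u][v] for u in range(v))
            g[v] = reach * token_probs[v][target[i]]
        f = g
    return f[num_vertices - 1]  # only paths ending at the last vertex count


# Toy graph with 3 vertices and a 2-token vocabulary.
token_probs = [[0.9, 0.1], [0.3, 0.7], [0.5, 0.5]]
transitions = [[0.0, 0.6, 0.4], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
print(round(dag_sequence_prob(token_probs, transitions, [0, 1, 0]), 3))  # 0.189 (path 0->1->2)
print(round(dag_sequence_prob(token_probs, transitions, [0, 0]), 3))     # 0.18 (path 0->2)
```

Because the graph can skip vertices, the same set of vertex predictions supports outputs of different lengths. A practical implementation would work in log space over batched tensors; the nested loops here (roughly target length times vertices squared) are only meant to expose the recurrence.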

The quantitative results across five text generation tasks are strong. PreDAT outperforms existing pre-trained NAR models by an average of 4.2 points on n-gram-based metrics and also surpasses pre-trained AR baselines on these metrics, by an average of 0.7 points, while achieving a 17x increase in throughput. These findings indicate that PreDAT speeds up generation while matching or exceeding AR quality on n-gram-based metrics; the authors' analysis attributes this to its unbiased prediction order, which alleviates the error accumulation that affects left-to-right autoregressive generation.
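The throughput advantage of DAT-style models comes from the decoder computing all vertex token distributions and transitions in a single parallel pass; turning them into text then only requires a cheap search for a path through the graph. The sketch below shows lookahead decoding, one strategy described for DAT, in which the next vertex is chosen by its transition probability times its best token probability. It assumes a hypothetical toy layout (`token_probs[u][w]` for the probability that vertex `u` emits token `w`, `transitions[u][v]` for the probability of jumping from `u` to a later vertex `v`) and is not claimed to be the decoding configuration behind PreDAT's reported results.

```python
from typing import List, Sequence


def lookahead_decode(
    token_probs: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Hypothetical sketch of DAT-style lookahead decoding (not the PreDAT API).

    token_probs[u][w]: probability that vertex u emits token w.
    transitions[u][v]: probability of moving from vertex u to a later vertex v.
    """
    num_vertices = len(token_probs)

    def best_token(u: int) -> int:
        # Most likely token at vertex u (ties go to the lowest token id).
        return max(range(len(token_probs[u])), key=lambda w: token_probs[u][w])

    u = 0  # every path starts at the first vertex
    output = [best_token(u)]
    while u < num_vertices - 1:
        current = u
        # Choose the next vertex jointly with its token: score each later vertex by
        # the transition probability times the best token probability at that vertex.
        u = max(
            range(current + 1, num_vertices),
            key=lambda v: transitions[current][v] * max(token_probs[v]),
        )
        output.append(best_token(u))
    return output  # decoding stops once the path reaches the last vertex


token_probs = [[0.9, 0.1], [0.3, 0.7], [0.2, 0.8]]
print(lookahead_decode(token_probs, [[0.0, 0.6, 0.4], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]))  # [0, 1, 1]
print(lookahead_decode(token_probs, [[0.0, 0.1, 0.9], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]))  # [0, 1]
```

The second call skips vertex 1, which shows how the output length is decided by the path rather than fixed in advance. Every step here is a small argmax over precomputed scores, with no further neural network calls, which is why this post-processing adds little to the cost of the single decoder pass.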

Methodologically, DSTI trains PreDAT to predict several sentence fragments simultaneously, which encourages bidirectional dependencies among the generated tokens. In practical terms, PreDAT could be well suited to applications that demand text generation that is both fast and of high quality.

The work also suggests a shift in how pre-training for NAR models can be approached, with potential applications in real-time text generation where latency and throughput are critical. In addition, PreDAT's ability to reduce error accumulation and improve relevance to the input opens new avenues for exploration in machine translation and other text-intensive AI applications.

Future research directions inspired by this work could explore further enhancements to DSTI and the integration of more complex alignment-based objectives. There is also potential in expanding the framework to incorporate more nuanced understanding and prediction tasks that could benefit domains like dialogue systems and content creation where textual coherence and accuracy are paramount.

In summary, this paper is an important step for NAR text generation: it introduces a pre-training framework that narrows the gap between speed and quality and provides a practical foundation for subsequent work on text generation methods.