BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

Published 29 Oct 2019 in cs.CL, cs.LG, and stat.ML | (1910.13461v1)

Abstract: We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. It uses a standard Transformer-based neural machine translation architecture which, despite its simplicity, can be seen as generalizing BERT (due to the bidirectional encoder), GPT (with the left-to-right decoder), and many other more recent pretraining schemes. We evaluate a number of noising approaches, finding the best performance by both randomly shuffling the order of the original sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token. BART is particularly effective when fine-tuned for text generation but also works well for comprehension tasks. It matches the performance of RoBERTa with comparable training resources on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains of up to 6 ROUGE. BART also provides a 1.1 BLEU increase over a back-translation system for machine translation, with only target language pretraining. We also report ablation experiments that replicate other pretraining schemes within the BART framework, to better measure which factors most influence end-task performance.
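
The two noising steps named in the abstract, sentence shuffling and span in-filling with a single mask token, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the mask symbol `<mask>`, the function names, the period-based sentence split, the per-position masking probability `mask_prob`, and the uniform span length up to `max_span` are all assumptions chosen for illustration.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol, assumed for this sketch


def shuffle_sentences(sentences, rng):
    """Return a copy of the sentences in random order."""
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled


def infill_spans(tokens, rng, mask_prob=0.3, max_span=3):
    """Replace randomly chosen token spans with ONE mask token each.

    mask_prob is the chance of starting a span at each position and
    max_span bounds the uniformly drawn span length; both are
    illustrative assumptions, not values taken from the abstract.
    """
    out = []
    i = 0
    while i < len(tokens):
        if rng.random() < mask_prob:
            span = rng.randint(1, max_span)
            out.append(MASK)  # the whole span collapses to a single mask
            i += span
        else:
            out.append(tokens[i])
            i += 1
    return out


def corrupt(document, seed=0):
    """Shuffle sentences, then apply span in-filling to each one."""
    rng = random.Random(seed)
    # Naive sentence split on periods, assumed for simplicity.
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    noisy = shuffle_sentences(sentences, rng)
    return [infill_spans(s.split(), rng) for s in noisy]
```

A model pretrained in this style would receive the output of `corrupt(document)` as input and be trained to reconstruct the original `document`; because each span collapses to one mask, the model must also infer how many tokens are missing.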

Authors (8)

Citations (9,782)

Summary

The paper introduces BART, a denoising autoencoder for pretraining sequence-to-sequence models: text is corrupted with a noising function and the model learns to reconstruct the original.
It combines a bidirectional encoder (as in BERT) with a left-to-right decoder (as in GPT) in a standard Transformer-based architecture.
The approach matches RoBERTa on GLUE and SQuAD with comparable training resources and reports new state-of-the-art results on abstractive dialogue, question answering, and summarization, with gains of up to 6 ROUGE.

Overview of the ACL 2019 Formatting Instructions

The paper "Instructions for ACL 2019 Proceedings" serves as a comprehensive guide for authors preparing submissions for the Association for Computational Linguistics (ACL) 2019 conference. This document outlines the precise formatting requirements for both review submissions and final camera-ready versions of papers.

Introduction and Purpose

The instructions aim to maintain uniformity and readability across all submissions, ensuring that all papers conform to a standard format. This is crucial for both initial submissions, which undergo a double-blind peer review process, and for final camera-ready versions, which will be compiled into the conference proceedings.

Document Structure

The paper is divided into several sections, each addressing specific formatting and submission guidelines:

General Instructions: Specifies manuscript structure, page layout, and format details. It emphasizes the need for a two-column layout, appropriate margins, and other layout particulars.
Submission Requirements: Authors are required to submit their papers in PDF format. The document highlights the importance of embedding all necessary fonts within the PDF to ensure correct rendering of the document across different platforms.
Ruler and References: The paper details the use of a printed ruler for the review copy to aid reviewers in their feedback. It also provides extensive guidance on citation styles and formatting, compatible with BibTeX.

Detailed Formatting Guidelines

The document offers meticulous details on various aspects of paper formatting:

Page Layout: Exact dimensions for margins, column widths, and spacing are provided to ensure uniformity.
Font Usage: Recommends the use of Adobe’s Times Roman font for consistency, with fallback options noted.
Title and Section Headings: Specifications include font sizes and styles for different sections, ensuring that titles and author information are prominently displayed.

Practical Instructions

The paper includes practical instructions for preparing the manuscript electronically:

PDF Production: Important considerations when producing the PDF version of the manuscript, including font embedding and ensuring print quality.
Graphics and Tables: Guidelines on placing and captioning graphics, tables, and figures to enhance the paper's visual clarity.

Review and Submission Process

The instructions explicitly address the double-blind review process, advising authors to anonymize their submissions by removing all identifying information. Authors are also instructed to leave sufficient space for names and affiliations to be added in the camera-ready versions.

Supplementary Material

The paper encourages the submission of supplementary materials such as code, data, and additional proofs, while emphasizing that these materials should be clearly distinct from the main body of the paper.

Implications and Future Directions

While the main purpose of the paper is to provide formatting guidelines, its implications are significant for maintaining high standards in the documentation and dissemination of research within the ACL community. Adherence to these guidelines ensures that the final proceedings are professional and accessible, facilitating better comprehension and furthering research dissemination.

Future developments in this area may include more automated tools for verifying compliance with formatting requirements, possibly incorporating machine learning to assist in the review process. Additionally, as online and interactive content becomes more prevalent, future iterations of the guidelines may include instructions for multimedia and other non-traditional forms of research output.

Conclusion

The document is crucial for authors aiming to contribute to ACL 2019, providing exhaustive instructions that cover every aspect of the manuscript preparation process. By adhering to these guidelines, authors ensure their submissions meet the high standards expected by the conference, ultimately contributing to the field's collective knowledge and advancement.