Good-Enough Compositional Data Augmentation

Published 21 Apr 2019 in cs.CL | (1904.09545v4)

Abstract: We propose a simple data augmentation protocol aimed at providing a compositional inductive bias in conditional and unconditional sequence models. Under this protocol, synthetic training examples are constructed by taking real training examples and replacing (possibly discontinuous) fragments with other fragments that appear in at least one similar environment. The protocol is model-agnostic and useful for a variety of tasks. Applied to neural sequence-to-sequence models, it reduces error rate by as much as 87% on diagnostic tasks from the SCAN dataset and 16% on a semantic parsing task. Applied to n-gram language models, it reduces perplexity by roughly 1% on small corpora in several languages.
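
The substitution rule in the abstract can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the paper's implementation: fragments here are contiguous token spans only (the abstract allows discontinuous ones), an "environment" is the entire rest of the sentence with a hole where the fragment was, and "similar" is taken to mean identical. The names `fragments_with_environments`, `augment`, and `max_len` are hypothetical.

```python
from collections import defaultdict

def fragments_with_environments(tokens, max_len=2):
    """Yield (fragment, environment) pairs for contiguous spans of up to max_len tokens.

    The environment is the sentence with the span replaced by a None hole.
    Assumption: contiguous spans and whole-sentence environments only.
    """
    n = len(tokens)
    for start in range(n):
        for end in range(start + 1, min(start + max_len, n) + 1):
            if end - start == n:
                continue  # skip the whole sentence; its environment would be empty
            fragment = tuple(tokens[start:end])
            environment = tuple(tokens[:start]) + (None,) + tuple(tokens[end:])
            yield fragment, environment

def augment(sentences, max_len=2):
    """Return synthetic sentences built by swapping fragments that share an environment."""
    envs_of = defaultdict(set)   # fragment -> environments it occurs in
    frags_of = defaultdict(set)  # environment -> fragments that fill it
    for sentence in sentences:
        for frag, env in fragments_with_environments(sentence.split(), max_len):
            envs_of[frag].add(env)
            frags_of[env].add(frag)

    original = set(sentences)
    synthetic = set()
    for frag_a, envs_a in envs_of.items():
        # Fragments that appear in at least one environment of frag_a.
        partners = set()
        for env in envs_a:
            partners |= frags_of[env]
        partners.discard(frag_a)
        for frag_b in partners:
            # Place frag_b into environments where only frag_a has been seen.
            for env in envs_a - envs_of[frag_b]:
                hole = env.index(None)
                candidate = " ".join(env[:hole] + frag_b + env[hole + 1:])
                if candidate not in original:
                    synthetic.add(candidate)
    return synthetic

if __name__ == "__main__":
    data = ["the cat sings", "the wug sings", "the cat dances"]
    print(augment(data))
```

In this toy corpus, "cat" and "wug" both occur in the environment "the _ sings", so "wug" is substituted into the other environment of "cat", giving the single new example "the wug dances".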


Authors (1)

Jacob Andreas

Citations (226)


Summary

The paper outlines the detailed submission guidelines for ACL 2020, emphasizing uniform formatting and structure in manuscripts.
It specifies file formats, page limits, and a double-blind review process to promote fair, unbiased evaluations.
The analysis highlights how standardized instructions enhance readability, reproducibility, and overall research quality.

An Analysis of ACL 2020 Proceedings Preparation Instructions

The document under consideration provides comprehensive guidelines for authors preparing manuscripts for submission to the ACL 2020 conference. The paper offers critical insights into the structural and formatting requirements necessary for submission, aiming to streamline the review and publication process for both authors and the conference organizers.

Overview of Instructions

The paper outlines the standardized formatting procedures that are mandatory for all submissions to the ACL 2020 conference. These cover the prescribed document structure, submission formats, style files, and font usage, among other considerations. Adherence to these specifications ensures consistency across manuscripts, giving reviewers and attendees a uniform reading experience.

Key Specifications

File Format and Length: The instructions specify that submissions must be in PDF format, with all necessary fonts included. Long papers may have up to eight pages of content, with one additional page allowed after acceptance to incorporate reviewer feedback. Short papers may have up to four pages of content, also gaining one additional page after acceptance. References do not count toward these limits and may span any number of pages, which encourages comprehensive citation.
Anonymity and Reviewing: The paper enforces a double-blind review process, which requires the exclusion of author-identifying information in submissions. This measure is designed to eliminate bias during the evaluation stage. Upon acceptance, proper attribution with author details is reinstated for the camera-ready copies.
Formatting Details: Manuscripts must adhere to a two-column layout with specified margin and spacing dimensions. The usage of Adobe's Times Roman or a similar font is recommended to maintain visual consistency. Formatting for section titles, footnotes, and references is explicitly defined to enforce uniformity.
Supplementary Materials and Appendices: The conference encourages the submission of additional materials, such as appendices and supplementary datasets, to support the reproducibility of results. These materials should complement, rather than replace, the core manuscript content, ensuring that essential insights are contained within the main document.
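
The page limits listed under File Format and Length can be expressed as a small check. This is only an illustrative sketch of the numbers stated above (eight or four content pages, plus one after acceptance, references excluded). The names `CONTENT_PAGE_LIMITS`, `max_content_pages`, and `within_limit` are hypothetical and are not part of any official ACL tooling.

```python
# Content-page limits as stated in the summary above; references are not counted.
CONTENT_PAGE_LIMITS = {"long": 8, "short": 4}

def max_content_pages(paper_type, accepted=False):
    """Return the content-page limit, adding one page for accepted papers."""
    base = CONTENT_PAGE_LIMITS[paper_type]
    return base + 1 if accepted else base

def within_limit(paper_type, content_pages, accepted=False):
    """Check a manuscript's content-page count (excluding references) against the limit."""
    return content_pages <= max_content_pages(paper_type, accepted)

if __name__ == "__main__":
    print(within_limit("long", 9))                 # submission stage
    print(within_limit("long", 9, accepted=True))  # camera-ready stage
```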

Implications for Research Publication

The detailed specifications in this paper have significant implications for the publication process at ACL 2020. By enforcing a uniform set of guidelines, the conference aims to enhance the accessibility and readability of submissions. This systematic approach can be seen as an effort to elevate the quality and impact of the shared research, which is essential for advancing the state of knowledge in computational linguistics.

Future Considerations

As the field progresses, these instructions may evolve to incorporate new technology and publication practices. Future iterations could address emerging issues such as open access, data-sharing protocols, and enhanced digital interactivity for conference proceedings. Overall, such guidelines are crucial for maintaining a high standard of scholarly communication, which underpins the collaborative and iterative nature of academic research.
