Semi-Autoregressive Training Improves Mask-Predict Decoding (2001.08785v1)

Published 23 Jan 2020 in cs.CL, cs.LG, and stat.ML

Abstract: The recently proposed mask-predict decoding algorithm has narrowed the performance gap between semi-autoregressive machine translation models and the traditional left-to-right approach. We introduce a new training method for conditional masked language models, SMART, which mimics the semi-autoregressive behavior of mask-predict, producing training examples that contain model predictions as part of their inputs. Models trained with SMART produce higher-quality translations when using mask-predict decoding, effectively closing the remaining performance gap with fully autoregressive models.

Authors (3)

Marjan Ghazvininejad
Omer Levy
Luke Zettlemoyer

Citations (71)
