DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts (2105.03023v2)

Published 7 May 2021 in cs.CL

Abstract: Despite recent advances in natural language generation, it remains challenging to control attributes of generated text. We propose DExperts: Decoding-time Experts, a decoding-time method for controlled text generation that combines a pretrained language model with "expert" LMs and/or "anti-expert" LMs in a product of experts. Intuitively, under the ensemble, tokens only get high probability if they are considered likely by the experts, and unlikely by the anti-experts. We apply DExperts to language detoxification and sentiment-controlled generation, where we outperform existing controllable generation methods on both automatic and human evaluations. Moreover, because DExperts operates only on the output of the pretrained LM, it is effective with (anti-)experts of smaller size, including when operating on GPT-3. Our work highlights the promise of tuning small LMs on text with (un)desirable attributes for efficient decoding-time steering.

Authors (7)

Alisa Liu
Maarten Sap
Ximing Lu
Swabha Swayamdipta
Chandra Bhagavatula
Noah A. Smith
Yejin Choi

Citations (319)


Summary

The paper proposes a novel decoding-time method that integrates a base pretrained language model with fine-tuned expert and anti-expert models to control text attributes.
It demonstrates significant improvements in detoxification and sentiment control while preserving the fluency and diversity of generated outputs.
The approach reduces computational cost by using smaller fine-tuned models to steer the base model at decoding time, rather than retraining the base model itself.

Controlled Text Generation with DExperts

The paper presents a decoding-time strategy for controlling attributes of generated text, DExperts (Decoding-time Experts), which combines a pretrained language model (LM) with "expert" and/or "anti-expert" LMs that can be smaller than the base model. The experts and anti-experts are LMs fine-tuned on text that exhibits desirable and undesirable attributes, respectively. DExperts uses a product-of-experts ensemble at decoding time: each token's probability is adjusted according to both the experts' and the anti-experts' predictions, steering generated text toward desired characteristics such as low toxicity or a target sentiment polarity.
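The product-of-experts combination can be written compactly. The notation below is an assumption introduced for this summary: z_t, z_t^+, and z_t^- denote the next-token logits of the base LM, the expert, and the anti-expert at step t, and \alpha is a non-negative hyperparameter that sets the strength of the steering.

```latex
\tilde{P}(X_t \mid x_{<t}) = \operatorname{softmax}\!\left( z_t + \alpha \left( z_t^{+} - z_t^{-} \right) \right)
\quad\Longleftrightarrow\quad
\tilde{P}(X_t \mid x_{<t}) \;\propto\; P(X_t \mid x_{<t}) \left( \frac{P^{+}(X_t \mid x_{<t})}{P^{-}(X_t \mid x_{<t})} \right)^{\alpha}
```

Setting \alpha = 0 recovers the base LM's distribution, while larger values give more weight to the ratio between expert and anti-expert.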

Methodology

DExperts integrates a pretrained LM with additional, possibly smaller models that bias generation toward (or away from) specified attributes. The key idea is to combine the output logits of these fine-tuned LMs with those of the base LM to alter next-token probabilities during generation. The formulation is simple: at each step it raises the probability of tokens favored by the expert and lowers the probability of tokens favored by the anti-expert. Because the adjustment happens entirely at decoding time, the large base model never needs to be fine-tuned.
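The logit adjustment described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names `softmax` and `dexperts_probs`, the parameter `alpha`, and the three-token toy vocabulary are all assumptions made for the example, and a real system would obtain the three logit vectors from actual language models that share a vocabulary.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dexperts_probs(base_logits, expert_logits, antiexpert_logits, alpha=1.0):
    """Next-token distribution from base logits shifted by alpha * (expert - anti-expert).

    All three logit lists must be aligned over the same vocabulary.
    alpha is a hypothetical name for the steering-strength hyperparameter.
    """
    combined = [
        z + alpha * (z_pos - z_neg)
        for z, z_pos, z_neg in zip(base_logits, expert_logits, antiexpert_logits)
    ]
    return softmax(combined)


# Toy example with a three-token vocabulary (values are made up).
base = [2.0, 1.0, 0.0]    # base LM prefers token 0
expert = [0.0, 1.0, 0.0]  # expert prefers token 1
anti = [1.0, 0.0, 0.0]    # anti-expert prefers token 0
steered = dexperts_probs(base, expert, anti, alpha=1.0)
```

In this toy case the combined logits favor token 1, so the most likely token moves away from the one the anti-expert prefers; with `alpha=0.0` the function returns the base distribution unchanged.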

The experiments focus on two applications: (1) reducing toxicity and (2) sentiment-controlled generation. For each task, the (anti-)experts are first obtained by fine-tuning LMs on data exemplifying the target and non-target attributes (e.g., toxic versus non-toxic text, positive versus negative sentiment).

Key Results

DExperts is reported to outperform existing controllable generation methods on both tasks in automatic and human evaluations, increasing control over the target attribute without sacrificing diversity or fluency. In language detoxification, it remains effective across multiple model sizes and with less reliance on large datasets for training its (anti-)experts. In the sentiment-control experiments, it steers sentiment even in adversarial setups, for example when generating positive continuations of negative prompts, which suggests the method is robust.

In the settings tested, DExperts produces less toxic output than the unmodified pretrained model and than competing control methods such as GeDi and PPLM, while remaining fluent. Its practical appeal lies in efficiency: it avoids the computational cost of retraining or fine-tuning very large LMs.

Implications and Future Directions

The paper highlights DExperts' potential for making large language models safer and more broadly applicable where ethical and social considerations matter. The technique gives researchers and developers with limited resources an accessible way to control text generation. Because attribute control is delegated to smaller LMs and applied at decoding time, the approach stays practical even as the cost of fine-tuning ever-larger LMs grows.

Furthermore, combining multiple experts and anti-experts in a single ensemble opens a promising avenue for controlling several attributes at once, which matters for applications such as content personalization and automatic moderation. Looking forward, integrating DExperts with other approaches, such as reinforcement learning, could compound its advantages. It also serves as a valuable case study in balancing model transparency with control, with attention to ethics in automated language technologies.
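One way such a multi-attribute ensemble might look is sketched below. This is a hypothetical generalization written for illustration, not a formulation taken from the paper: the function `combine_experts`, the per-model weights, and the attribute names in the toy example are all assumptions. Each expert adds its weighted logits to the base logits and each anti-expert subtracts its weighted logits.

```python
import math


def combine_experts(base_logits, experts=(), antiexperts=()):
    """Hypothetical multi-expert ensemble over one shared vocabulary.

    experts and antiexperts are sequences of (weight, logits) pairs.
    Expert logits are added and anti-expert logits are subtracted,
    each scaled by its weight, before a softmax.
    """
    combined = list(base_logits)
    for weight, logits in experts:
        combined = [c + weight * z for c, z in zip(combined, logits)]
    for weight, logits in antiexperts:
        combined = [c - weight * z for c, z in zip(combined, logits)]
    peak = max(combined)
    exps = [math.exp(c - peak) for c in combined]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example: two experts and one anti-expert (values are made up).
base = [1.0, 1.0, 1.0]
polite = [0.0, 2.0, 0.0]
positive = [0.0, 1.0, 1.0]
toxic = [3.0, 0.0, 0.0]
probs = combine_experts(
    base,
    experts=[(1.0, polite), (1.0, positive)],
    antiexperts=[(1.0, toxic)],
)
```

Here the token favored by both experts ends up most likely and the token favored by the anti-expert least likely. How to choose the weights so that attributes do not interfere with one another is left open in this sketch.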

In conclusion, DExperts is a pragmatic advance in the controlled use of language models, suggesting a scalable path not only toward refining them but also toward deploying them responsibly in sensitive and mission-critical domains.

