Controllable Text Generation in the Instruction-Tuning Era (2405.01490v1)

Published 2 May 2024 in cs.CL and cs.AI

Abstract: While most research on controllable text generation has focused on steering base LLMs, the emerging instruction-tuning and prompting paradigm offers an alternate approach to controllability. We compile and release ConGenBench, a testbed of 17 different controllable generation tasks, using a subset of it to benchmark the performance of 9 different baselines and methods on instruction-tuned LLMs. To our surprise, we find that prompting-based approaches outperform controllable text generation methods on most datasets and tasks, highlighting a need for research on controllable text generation with instruction-tuned LLMs specifically. Prompt-based approaches match human performance on most stylistic tasks while lagging on structural tasks, foregrounding a need to study more varied constraints and more challenging stylistic tasks. To facilitate such research, we provide an algorithm that uses only a task dataset and an LLM with in-context capabilities to automatically generate a constraint dataset. This method eliminates the field's dependence on pre-curated constraint datasets, hence vastly expanding the range of constraints that can be studied in the future.


Summary

  • The paper introduces a novel method to generate constraint datasets on-the-fly, reducing reliance on labor-intensive, pre-existing data.
  • Prompting-based approaches outperform dedicated controllable generation methods on most datasets and tasks, matching human performance on most stylistic tasks.
  • The release of ConGenBench, a testbed of 17 diverse controllable generation tasks, provides a common benchmark for evaluating controllable text generation methods.

Exploring Controllability in Instruction-Tuned LLMs

Introduction to Instruction-Tuned LLMs and Controllability Challenges

Recent advances in NLP have produced LLMs that can follow natural-language instructions, commonly called instruction-tuned LLMs. Despite their capabilities, these models often produce outputs that are not quite what the user intended; their "controllability" is a bit like steering a cruise ship with a canoe paddle: doable, but not always precise.

Two main avenues have emerged for enhancing controllability: traditional controllable text generation methods and newer prompting-based methods. As instruction-tuned models become central to applications, understanding and improving their controllability is not just an academic exercise; it determines how faithfully these models can follow user intents and constraints in real-world deployments.

The Surprising Efficacy of Prompting Methods

A significant takeaway from the paper is the superior performance of prompting-based methods over traditional controllable text generation techniques, particularly on stylistic tasks. In the experiments, when asked to generate text that meets specific stylistic guidelines, prompting methods matched human performance on most tasks.

In contrast, these methods seemed to struggle a bit more with structural tasks, highlighting a gap where future work could make a substantial impact.
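To make the comparison concrete, below is a minimal sketch of the kind of prompting-based baseline being contrasted here: the constraint is simply stated in natural language and handed to an instruction-tuned LLM. The `llm_generate` helper is a hypothetical stand-in for whatever model or API is in use; it is not code from the paper.

```python
# Minimal sketch of a prompting-based stylistic-control baseline.
# `llm_generate` is a hypothetical placeholder for any instruction-tuned LLM call
# (an API client, a local model, etc.); it is not part of the paper's code.

def llm_generate(prompt: str) -> str:
    """Send `prompt` to an instruction-tuned LLM and return the generated text."""
    raise NotImplementedError("Connect this to your model or API of choice.")


def controlled_continuation(prefix: str, constraint: str) -> str:
    """State the constraint in natural language and let the model handle it."""
    prompt = (
        f"Continue the following text so that the continuation is {constraint}.\n\n"
        f"Text: {prefix}\n"
        f"Continuation:"
    )
    return llm_generate(prompt)


# Example usage (once a real backend is wired up):
# controlled_continuation("The food at the new cafe", "positive in sentiment")
```

Traditional controllable generation baselines, by contrast, typically modify the decoding procedure or add trained components, which is part of what makes the strong results of this much simpler approach surprising.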

Advancements and Innovations: ConGenBench and Constraint Dataset Generation

One of the most significant contributions of this research is ConGenBench, a comprehensive testbed of 17 controllable generation tasks built from a variety of text generation datasets and constraints. The paper also introduces a method that uses only a task dataset and an LLM with in-context capabilities to automatically generate a constraint dataset from a natural-language description of the constraint. This substantially reduces dependence on pre-curated constraint datasets, which are limited in scope and labor-intensive to produce; a sketch of the idea follows.
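The following is a hedged sketch of that idea, not the paper's released implementation: given texts from a task dataset and a natural-language description of a constraint, an LLM with a few in-context examples labels each text, and the resulting (text, label) pairs form a constraint dataset. All names here (`llm_generate`, `build_constraint_dataset`, the example shots) are illustrative.

```python
# Hedged sketch of LLM-driven constraint-dataset construction (illustrative, not
# the paper's code): label task-dataset texts against a constraint described in
# natural language, using a few in-context examples.

from typing import List, Tuple

def llm_generate(prompt: str) -> str:
    """Placeholder for a call to an instruction-tuned LLM with in-context ability."""
    raise NotImplementedError

# Hand-written in-context examples for the constraint being targeted.
FEW_SHOT = [
    ("The sunrise over the bay was breathtaking.", "yes"),
    ("The service was slow and the room smelled stale.", "no"),
]

def label_with_llm(text: str, constraint: str) -> str:
    """Ask the LLM whether `text` satisfies `constraint`, few-shot style."""
    shots = "\n\n".join(
        f"Text: {t}\nSatisfies the constraint '{constraint}'? {y}" for t, y in FEW_SHOT
    )
    prompt = f"{shots}\n\nText: {text}\nSatisfies the constraint '{constraint}'?"
    return llm_generate(prompt).strip().lower()

def build_constraint_dataset(texts: List[str], constraint: str) -> List[Tuple[str, int]]:
    """Map raw task-dataset texts to (text, binary label) pairs for the constraint."""
    return [
        (t, 1 if label_with_llm(t, constraint).startswith("yes") else 0) for t in texts
    ]

# Example usage, assuming `reviews` is a list of task-dataset texts:
# pairs = build_constraint_dataset(reviews, "positive in sentiment")
```

Labels produced this way could then train a lightweight classifier or serve directly as an evaluation signal, the usual downstream roles of a constraint dataset.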

Practical Implications and Theoretical Insights

The findings from this research have significant implications:

  • For Developers and Practitioners: The ability to generate controllable datasets on-the-fly allows for much greater flexibility and customization in deploying LLMs across varied applications, from automated content generation to personalized assistant systems.
  • For Researchers: The results stress the need for more focused studies on controllable text generation specifically tailored for instruction-tuned models. There is clearly room to elevate the structural task performance to the level of stylistic tasks.

Future Directions in AI and Controllability

Looking ahead, the field is ripe for exploration in several areas:

  • Development of More Robust Prompting Techniques: Improving how these models handle structural constraints through advanced prompting strategies could broaden their applicability.
  • Exploration of New Constraints: With the ability to generate constraint datasets, researchers can explore out-of-the-box constraints that were previously too resource-intensive to consider.
  • Fine-Tuning Instruction-Tuned Models: There might be potential in developing specialized instruction-tuned models optimized from the ground up for controllability.

In conclusion, the research paints a promising picture of the future of instruction-tuned LLMs but underscores the necessity for continuous innovation in controllability strategies. With these insights and tools, AI's ability to understand and execute on human intent could take some exciting new strides.