
Automata-based constraints for language model decoding

(2407.08103)
Published Jul 11, 2024 in cs.CL and cs.FL

Abstract

Language models (LMs) are often expected to generate strings in some formal language; for example, structured data, API calls, or code snippets. Although LMs can be tuned to improve their adherence to formal syntax, this does not guarantee conformance, especially with smaller LMs suitable for large-scale deployment. In addition, tuning requires significant resources, making it impractical for uncommon or task-specific formats. To prevent downstream parsing errors we would ideally constrain the LM to only produce valid output, but this is severely complicated by tokenization, which is typically both ambiguous and misaligned with the formal grammar. We solve these issues through the application of automata theory, deriving an efficient closed-form solution for the regular languages, a broad class of formal languages with many practical applications, including API calls or schema-guided JSON and YAML. We also discuss pragmatic extensions for coping with the issue of high branching factor, and extend our techniques to deterministic context-free languages, which similarly admit an efficient closed-form solution. Previous work on this topic (Willard and Louf, 2023) layers bespoke solutions onto automata, leading to problems with speed, correctness, and extensibility. Instead, we reformulate the entire task in terms of automata so we can leverage well-studied and well-optimized algorithms. Our system compiles constraints ~7,000x faster, is provably correct, and can be extended in a modular fashion.

Figure: A token vocabulary and a detokenizing FST converting token sequences into character sequences.

Overview

  • The paper introduces a novel approach to ensure language model outputs conform to specified formal languages using principles of automata theory, useful for generating structured data, API calls, and code snippets.

  • Key contributions include reformulating detokenization as a finite-state transducer (FST), converting character-accepting finite-state automata (FSA) into token-accepting ones, and adding extensions that handle high branching factors and improve usability.

  • The methodology extends to deterministic context-free languages using push-down automata (PDA), making the approach applicable for real-time, large-scale language model deployments and various practical applications such as JSON generation and Python dataclasses.

Automata-based Constraints for Language Model Decoding

The paper "Automata-based constraints for language model decoding," authored by Terry Koo, Frederick Liu, and Luheng He from Google DeepMind, presents a novel approach to ensuring that the outputs of language models (LMs) conform to specified formal languages using principles of automata theory. This method is particularly advantageous for generating structured data, API calls, and code snippets, which require strict adherence to syntactical constraints.

Overview and Contributions

Language models, particularly smaller ones adapted for large-scale deployment, may occasionally generate outputs that deviate from expected formal language structures. Traditional fine-tuning methods to enforce conformance are resource-intensive and impractical for uncommon or highly specific formats. This paper proposes leveraging finite-state automata (FSA) and finite-state transducers (FST) to apply hard constraints on LMs, ensuring the generation of valid outputs efficiently.

The authors identify three key contributions:

  1. Detokenization as Transduction: Reformulating detokenization as an FST, which maps token sequences back to text.
  2. Adaptation of Regular Expressions to Tokens: Converting FSAs that accept character sequences into FSAs that accept token sequences, enabling token-level constraints on formal languages.
  3. Extensions for Practical Applications: Introducing special capturing groups and FST extensions to handle high branching factors and enhance usability.

Methodology

Finite-state Constraints: The method applies constraints to an LM by masking its decoding logits; a minimal code sketch follows the list below. Each decoding step involves:

  • Building a mask of valid next tokens based on the current state of constraints.
  • Penalizing the logits for invalid tokens to ensure only valid tokens are chosen.
  • Updating the state of constraints accordingly after each token is selected.
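
The loop below is a minimal sketch of this masking step, assuming hypothetical helpers fsa_next_tokens and fsa_step that expose the token-level constraint automaton; it is illustrative rather than the authors' implementation.

```python
import numpy as np

def constrained_decode_step(logits, state, fsa_next_tokens, fsa_step):
    """One constrained decoding step: mask, pick a token, advance the automaton."""
    mask = np.full(logits.shape[-1], -np.inf)   # penalize every token by default
    for token_id in fsa_next_tokens(state):     # tokens with a valid outgoing edge
        mask[token_id] = 0.0                    # leave valid tokens untouched
    token_id = int(np.argmax(logits + mask))    # greedy pick; sampling works the same way
    return token_id, fsa_step(state, token_id)  # advance the constraint state
```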

Addressing Tokenization: A significant challenge in this approach is the tokenization used by popular LMs, which may not align with the formal grammar. The paper addresses this within automata theory by modeling detokenization of the token vocabulary as an FST.

By composing the detokenizing FST (T_V) for a given token vocabulary with the character-level FSA (A), the paper obtains an adapted FSA (A') such that any token sequence accepted by A' detokenizes to a character sequence accepted by A. This decomposition separates the vocabulary-specific component from the grammar-specific component, fostering efficiency and modularity.
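
As a rough illustration of the adaptation step (not the authors' code), the function below walks each token's character string through a character-level DFA to derive token-level edges; this mirrors what composing T_V with A accomplishes. The helper names and dictionary encoding are assumptions made for the sketch.

```python
def adapt_to_tokens(char_dfa_step, dfa_states, vocab):
    """Derive token-level edges from a character-level DFA.

    char_dfa_step(state, ch) returns the next state or None; vocab maps
    token ids to their detokenized strings.
    """
    token_edges = {}                                   # (state, token_id) -> next state
    for state in dfa_states:
        for token_id, token_str in vocab.items():
            cur = state
            for ch in token_str:                       # simulate the token's characters
                cur = char_dfa_step(cur, ch)
                if cur is None:                        # the token falls out of the language
                    break
            if cur is not None:
                token_edges[(state, token_id)] = cur   # keep the valid token-level edge
    return token_edges
```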

Extensions: Special capturing groups (terminal labels) handle cases where matching tokens would otherwise lead to a large number of outbound edges, making operations computationally expensive. By using these extensions, constraints on tokens are applied more efficiently.

Push-down Constraints

The paper extends its automata-theoretic approach to deterministic context-free languages using push-down automata (PDAs). PDAs are essentially FSAs equipped with a stack, capable of handling more complex syntactical structures. Although deterministic PDAs are constrained to deterministic context-free languages, the vast majority of structured data formats fall within this category.

Employing PDAs allows the same methodology used for FSAs to be applied to more complex grammars by creating a correspondence between character-based PDAs and token-based PDAs through FST composition.
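
For intuition only, the fragment below sketches a character-level deterministic PDA that tracks balanced braces and brackets, the kind of nesting that finite-state constraints cannot express; the same token-level adaptation described above would then apply. This is an illustrative stand-in, not the paper's construction.

```python
MATCHING = {("{", "}"), ("[", "]")}

def pda_step(config, ch):
    """Advance a (state, stack) configuration on one character; None means invalid."""
    state, stack = config
    if ch in "{[":
        return state, stack + [ch]        # push the opening symbol
    if ch in "}]":
        if not stack or (stack[-1], ch) not in MATCHING:
            return None                   # unbalanced or mismatched closer
        return state, stack[:-1]          # pop the matching opener
    return state, stack                   # other characters leave the stack alone
```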

Practical Implications and Applications

The approach's primary advantage is its efficiency and simplicity, making it applicable for real-time, large-scale language model deployments. By adopting an automata-theoretic perspective, the method is broadly applicable across different constraints and language models.

JSON Generation: One application described is generating JSON outputs that conform to specified schemas. The set of JSON expressions matching a schema is shown to be a regular language, simplifying the constraint application. Tools are developed to automatically translate JSON schemas into regular expressions, further easing implementation.
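
As a hypothetical illustration of this translation (the schema, regular expression, and whitespace conventions here are assumptions, not the paper's tooling), a schema with two fixed keys can be compiled into a regular expression matching exactly the conforming objects:

```python
import re

# Objects with a string "name" and a non-negative integer "age", in fixed key order.
schema_regex = re.compile(
    r'\{"name":\s*"[^"\\]*",\s*"age":\s*(0|[1-9][0-9]*)\}'
)

assert schema_regex.fullmatch('{"name": "Ada", "age": 36}')
assert not schema_regex.fullmatch('{"name": "Ada", "age": "thirty-six"}')
```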

Python Dataclasses: The method can ensure that generated data conforms to Python dataclass definitions. This has practical implications for generating structured programmatic content that must adhere to specific class schemas.
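
A hypothetical sketch of the same idea for dataclasses (the class, field names, and regular expression are illustrative assumptions): constructor calls for a fixed dataclass form a regular language and can be constrained accordingly.

```python
from dataclasses import dataclass
import re

@dataclass
class Flight:                      # illustrative schema
    origin: str
    destination: str
    passengers: int

# Constrain generations of the form Flight(origin="...", destination="...", passengers=N).
flight_regex = re.compile(
    r'Flight\(origin="[^"\\]*", destination="[^"\\]*", passengers=(0|[1-9][0-9]*)\)'
)

assert flight_regex.fullmatch('Flight(origin="SFO", destination="JFK", passengers=2)')
```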

Speculative Decoding: The technique enhances speculative decoding, wherein a smaller, faster model generates token sequences that a larger model then validates. By constraining the smaller model, the acceptance rate of sampled tokens is increased, significantly improving the overall efficiency of the decoding process.
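
A schematic sketch of how a constraint could be layered onto the drafting step (the drafter, verifier, and automaton helpers are hypothetical names, and the acceptance logic is simplified):

```python
def constrained_draft(drafter, verifier, state, fsa_next_tokens, fsa_step, k=4):
    """Draft k tokens under the constraint, then let the verifier accept a prefix."""
    draft, states = [], [state]
    for _ in range(k):
        allowed = fsa_next_tokens(states[-1])      # tokens valid in the current state
        token = drafter.sample(allowed=allowed)    # the small model only proposes these
        draft.append(token)
        states.append(fsa_step(states[-1], token))
    n_accepted = verifier.verify(draft)            # the larger model keeps a valid prefix
    return draft[:n_accepted], states[n_accepted]
```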

Conclusion and Future Work

The reformulation of detokenization as an FST marks this paper's primary contribution, providing a framework that unifies constraint application under automata theory. The clean and efficient solutions offered for tokenization ambiguities establish this method as a practical tool for structured language generation.

The paper hints at further exploration into PDAs, considering their more compact nature and expressive power for more demanding tasks. Future research may delve into optimizing grammar specifications to avoid non-deterministic PDAs and expanding the utility of context-free constraints for longer, more complex decoding tasks.

This work lays a robust foundation for integrating automata theory with language model decoding, promising more reliable and syntactically precise AI-generated content.
