- The paper presents an approach that uses LLMs to generate synthetic queries closely mirroring real user interactions with voice assistants.
- It compares LLM-based methods with traditional template-based approaches using metrics like negative log-likelihood and reciprocal rank.
- It highlights the complementary benefits of both methods in addressing common and emerging query scenarios to improve VA responsiveness.
Synthetic Query Generation using LLMs for Virtual Assistants
This paper explores integrating LLMs into synthetic query generation for Virtual Assistants (VAs), emphasizing queries that mirror real user interactions and improve the robustness of ASR systems, especially for emerging or infrequent use-cases.
Introduction
Virtual Assistants rely heavily on voice commands: a speech-to-text system converts spoken queries into text for downstream processing. Recognition is challenging, especially for phonetically similar terms, which makes a robust query prior necessary. Synthetic queries, traditionally generated with template-based methods, serve to reinforce these priors but often lack specificity and flexibility. LLMs offer a promising alternative for creating more contextually rich and specific queries that better capture the nuances of user intent.
Methodology
Approach
The paper proposes a method using LLMs for generating queries from textual descriptions sourced from Wikipedia. Entity-specific prompts are crafted for the LLM to simulate potential user queries. The approach is depicted in a structured pipeline (Figure 1).
Figure 1: Proposed pipeline to generate queries for a VA via an LLM.
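As a rough illustration of this pipeline, the sketch below builds an entity-specific prompt from an artist's description and asks an OpenAI chat model for spoken-style queries. The prompt wording, model choice, and the `generate_queries` helper are illustrative assumptions, not the paper's exact setup.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_queries(artist: str, description: str, n_queries: int = 10) -> list[str]:
    """Ask a chat model for synthetic VA-style queries about one artist.

    The prompt wording below is an illustrative stand-in, not the paper's
    exact prompt.
    """
    prompt = (
        "You are a user talking to a voice assistant.\n"
        f"Artist: {artist}\n"
        f"Background: {description}\n"
        f"Write {n_queries} short spoken requests a user might make about this artist "
        "(e.g. playing their music or asking about them), one per line."
    )
    response = client.chat.completions.create(
        model="gpt-4",  # the paper compares several OpenAI model variants
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    text = response.choices[0].message.content
    # Strip list markers the model may prepend to each line
    return [line.strip("-*0123456789. ").strip() for line in text.splitlines() if line.strip()]
```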
Knowledge Base and Prompt Engineering
A comprehensive knowledge base is constructed by linking metadata from popular music platforms to Wikipedia entries via SPARQL queries, providing detailed context for each artist. This context is processed into prompts that guide the LLM in query generation; the prompts are carefully crafted to reflect commands typical of VA interactions, such as playing specific songs or learning more about an artist.
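A minimal sketch of this kind of linking step is shown below, using the public Wikidata SPARQL endpoint via SPARQLWrapper to look up an artist's English Wikipedia page. The endpoint, query shape, and helper name are assumptions for illustration; the paper's actual knowledge-base construction is not reproduced here.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"  # assumed endpoint for illustration

def wikipedia_page_for_artist(artist_name: str) -> str | None:
    """Look up the English Wikipedia article for an artist name via SPARQL."""
    sparql = SPARQLWrapper(WIKIDATA_ENDPOINT)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(f"""
        SELECT ?article WHERE {{
          ?item rdfs:label "{artist_name}"@en .
          ?article schema:about ?item ;
                   schema:isPartOf <https://en.wikipedia.org/> .
        }}
        LIMIT 1
    """)
    bindings = sparql.query().convert()["results"]["bindings"]
    return bindings[0]["article"]["value"] if bindings else None

# Example: fetch the page whose text would feed the prompt for this artist
print(wikipedia_page_for_artist("Adele"))
```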
Experimental Setup
Methods
The paper contrasts traditional template-based query generation with four variants of OpenAI's LLMs. Key evaluation measures include the following (a minimal sketch of both metrics appears after the list):
- Negative Log-Likelihood (NLL): This metric assesses how well the generated queries align with real-world VA query logs.
- Reciprocal Rank (RR): This measures how effectively the generated queries can retrieve the correct artist from a constructed index.
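A minimal sketch of both metrics under simplifying assumptions: an add-one-smoothed unigram model stands in for the paper's query-log model, and a toy ranked list stands in for the constructed retrieval index.

```python
import math
from collections import Counter

def unigram_nll(query: str, log_counts: Counter, total: int, vocab_size: int) -> float:
    """Average negative log-likelihood per token under an add-one-smoothed
    unigram model estimated from query logs (a stand-in for the paper's model)."""
    tokens = query.lower().split()
    nll = 0.0
    for tok in tokens:
        p = (log_counts[tok] + 1) / (total + vocab_size)
        nll -= math.log(p)
    return nll / max(len(tokens), 1)

def reciprocal_rank(ranked_artists: list[str], target_artist: str) -> float:
    """1/rank of the correct artist in the retrieval results; 0 if absent."""
    for rank, artist in enumerate(ranked_artists, start=1):
        if artist == target_artist:
            return 1.0 / rank
    return 0.0

# Toy usage with made-up data
logs = Counter("play some music by adele play a song by adele".split())
total_tokens = sum(logs.values())
print(unigram_nll("play adele", logs, total_tokens, vocab_size=len(logs)))
print(reciprocal_rank(["drake", "adele", "beyonce"], "adele"))  # -> 0.5
```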
Query Generation
Queries are generated for 14,161 music artists using both the entity-name/template baseline and the LLM-based methods, yielding diverse query structures. The LLM-based methods, particularly the GPT-3.5 and GPT-4 series, prove capable of generating semantically rich queries.
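For intuition, a toy version of the template-based side might look like the following; the templates themselves are hypothetical stand-ins for the paper's actual set.

```python
# Hypothetical templates mimicking common VA music commands; the paper's
# actual template set is not reproduced here.
TEMPLATES = [
    "play {artist}",
    "play some songs by {artist}",
    "who is {artist}",
    "play the latest album by {artist}",
]

def template_queries(artist: str) -> list[str]:
    return [t.format(artist=artist) for t in TEMPLATES]

print(template_queries("Adele"))
```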
Results and Discussion
Query Length and Diversity
LLM-generated queries tend to be longer and more complex than those generated by templates (Figure 2), often incorporating specific details from the artist's biography. This verbosity, while potentially advantageous for detailed requests, can introduce retrieval challenges due to increased specificity.


Figure 2: Distribution of generated query lengths across the approaches under consideration.
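The length comparison behind Figure 2 amounts to counting tokens per query for each approach, roughly as in the toy sketch below (the example queries are placeholders, not data from the paper).

```python
import statistics

# Placeholder queries standing in for the generated sets behind Figure 2
queries_by_method = {
    "template": ["play adele", "who is adele"],
    "llm": ["play that soulful ballad adele released on her latest record"],
}

for method, queries in queries_by_method.items():
    lengths = [len(q.split()) for q in queries]
    print(method, "mean tokens per query:", statistics.mean(lengths))
```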
Domain Match and Specificity
The paper finds that LLM-generated queries correlate well with actual user behavior, though they are more specific. While this specificity sometimes reduces retrieval performance, it shows that LLMs can generate queries reflecting nuanced user intents better than the more generic template-based queries.
Complementarity of Methods
The paper concludes that template-based and LLM-based queries together offer a comprehensive solution: templates cover common usage scenarios, while LLMs efficiently address rare or emerging entity-specific scenarios.
Conclusions
Integrating LLMs into query generation for VAs shows strengths complementary to traditional template-based methods. Future work should focus on optimizing this combination to dynamically adjust query generation in response to evolving user behavior and entity prominence. Additionally, refining prompt strategies and further fine-tuning LLMs may enhance the specificity and relevance of generated queries, potentially improving the overall VA-user interaction experience.