Emergent Mind

Abstract

LLMs can be used to generate text data for training and evaluating other models. However, creating high-quality datasets with LLMs can be challenging. In this work, we explore human-AI partnerships to facilitate high diversity and accuracy in LLM-based text data generation. We first examine two approaches to diversify text generation: 1) logit suppression, which discourages tokens that have already been generated frequently, and 2) temperature sampling, which flattens the token sampling probability distribution. We found that these diversification approaches can increase data diversity, but often at the cost of data accuracy (i.e., text and labels being appropriate for the target domain). To address this issue, we examined two human interventions: 1) label replacement (LR), correcting misaligned labels, and 2) out-of-scope filtering (OOSF), removing instances that are outside the user's domain of interest or to which no considered label applies. In oracle studies, we found that LR increases the absolute accuracy of models trained on diversified datasets by 14.4%. Moreover, some models trained on data generated with LR interventions outperformed LLM-based few-shot classification. In contrast, OOSF was not effective in increasing model accuracy, implying the need for future work on human-in-the-loop text data generation.

Figure: Impact of OOSF on model accuracy, label accuracy, diversity, and similarity across tasks and sampling temperatures.

Overview

  • The paper discusses the role of LLMs in generating diverse and accurate text data for training AI models.

  • It examines the limitations of logit suppression and temperature sampling in producing diversified text without sacrificing accuracy.

  • The study introduces human intervention techniques (label replacement and out-of-scope filtering) to enhance the quality of LLM-generated datasets.

  • Synergy between AI and human efforts in dataset generation is emphasized, improving both diversification and label alignment.

  • The research underscores the importance of human-AI collaboration in advancing AI, suggesting further investigation into refining these collaborative processes.

Introduction

The use of LLMs to generate text data has become a significant development in AI, particularly for training and evaluating models. Although LLMs like GPT-3 have altered the landscape of data generation, creating datasets that are both diverse and accurate remains challenging. The study by Chung, Kamar, and Amershi provides insight into this problem by examining the effectiveness of human-AI partnerships in generating text data that is diverse and accurate.

Diversified Text Data Generation

The methods examined for diversifying text generation are logit suppression and temperature sampling. Logit suppression lowers the sampling probability of tokens that have already been generated frequently, pushing the model toward new variation in the text it produces. Temperature sampling, on the other hand, flattens the token probability distribution at higher temperatures, making less likely choices more probable. While both methods increased the diversity of the generated text, they often compromised the data's accuracy, leading to a misalignment between generated text and target labels.
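The paper does not provide code for these techniques, but both operate on the model's next-token logits and can be illustrated with a minimal, self-contained sketch. The function below is an assumption-laden toy (a tiny vocabulary of raw logits, a hypothetical `penalty` strength for suppression), not the study's actual implementation:

```python
import math
import random

def sample_token(logits, temperature=1.0, suppression=None, penalty=2.0):
    """Sample one token id from raw logits.

    temperature: >1 flattens the distribution (more diverse picks),
                 <1 sharpens it (more repetitive picks).
    suppression: optional {token_id: count} of how often each token has
                 already been generated; frequent tokens get their logits
                 lowered (logit suppression).
    """
    adjusted = list(logits)
    if suppression:
        for tok, count in suppression.items():
            adjusted[tok] -= penalty * count  # penalize frequent tokens

    # Temperature-scaled softmax (numerically stabilized).
    scaled = [l / temperature for l in adjusted]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Draw from the resulting categorical distribution.
    r = random.random()
    acc = 0.0
    for tok, p in enumerate(probs):
        acc += p
        if r < acc:
            return tok
    return len(probs) - 1
```

With a dominant logit such as `[5.0, 1.0, 1.0]`, token 0 is sampled almost always; passing `suppression={0: 5}` subtracts `penalty * 5` from its logit and shifts samples to the other tokens, which is the mechanism by which diversity rises while text-label alignment can drift.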

Human Interventions

To address the shortcomings of relying solely on LLMs, the study explores two forms of human intervention: label replacement (LR) and out-of-scope filtering (OOSF). LR has humans correct misaligned labels in the generated dataset, and when combined with diversified datasets it improved absolute model accuracy by 14.4% on average. Notably, models trained on LR-corrected datasets could sometimes surpass the performance of LLM-based few-shot classification. Meanwhile, OOSF, which removes instances that fall outside the target domain or to which no considered label applies, did not reliably improve model accuracy across tasks, suggesting that its utility may depend on the specifics of the dataset and approach used.
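The two interventions can be sketched as simple dataset transformations. In the study these corrections came from human (oracle) judgments, not code; the `Instance` record, the `corrections` mapping, and the function names below are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Instance:
    text: str
    label: Optional[str]  # None: no considered label applies

def label_replacement(data, corrections):
    """LR: replace misaligned labels with human-provided corrections.

    corrections maps instance text to its corrected label; instances
    without a correction keep their original label.
    """
    return [Instance(d.text, corrections.get(d.text, d.label)) for d in data]

def out_of_scope_filter(data, in_scope_labels):
    """OOSF: drop instances outside the target domain or lacking a
    valid label from the considered label set."""
    return [d for d in data if d.label in in_scope_labels]
```

For a sentiment task, LR would fix a generated review whose label disagrees with its text, while OOSF would discard, say, a finance headline that slipped into the generated pool.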

Synergy of Human-AI Approaches

Combining LLM-generated datasets with human oversight supports the idea that while AI can vastly speed up data generation, human expertise remains indispensable for ensuring quality. The primary contribution of this research is a methodology that pairs AI capabilities with targeted human effort in data generation, improving diversification while preserving label alignment. Through interventions like LR and OOSF, the methodology shows how machine efficiency and human judgment can be combined to build robust training datasets for AI models.

Closing Thoughts

In summary, the research is a testament to the evolving landscape of AI, where human-AI collaboration is not just beneficial but necessary for advancing AI utility. It also opens up avenues for future inquiry into how such collaborations can be refined, how biases in LLM-generated data can be addressed, and how the balance between data diversity and accuracy can be optimized for model training. The findings of this study will be of interest to model builders who leverage LLMs for data generation, emphasizing the judicious blend of automation and expert intervention in the process of creating quality datasets.
