Prompting-based Synthetic Data Generation for Few-Shot Question Answering (2405.09335v1)

Published 15 May 2024 in cs.CL

Abstract: Although LMs have boosted the performance of Question Answering, they still need plenty of data. Data annotation, in contrast, is a time-consuming process. This especially applies to Question Answering, where possibly large documents have to be parsed and annotated with questions and their corresponding answers. Furthermore, Question Answering models often only work well for the domain they were trained on. Since annotation is costly, we argue that domain-agnostic knowledge from LMs, such as linguistic understanding, is sufficient to create a well-curated dataset. With this motivation, we show that using LLMs can improve Question Answering performance on various datasets in the few-shot setting compared to state-of-the-art approaches. For this, we perform data generation leveraging the Prompting framework, suggesting that LLMs contain valuable task-agnostic knowledge that can be used beyond the common pre-training/fine-tuning scheme. As a result, we consistently outperform previous approaches on few-shot Question Answering.


Summary

  • The paper presents a two-step pipeline using answer sampling with NER and prompt-based question generation to create synthetic training data for few-shot QA.
  • It leverages T5 v1.1 and filtering mechanisms to ensure the quality and relevance of generated questions, outperforming previous few-shot QA approaches.
  • Experiments on benchmarks such as SQuAD and TextbookQA show that data generated from as few as 128 labeled samples can match human-annotated quality, reducing annotation effort.

Prompting-Based Synthetic Data Generation for Few-Shot Question Answering

Introduction

The paper "Prompting-based Synthetic Data Generation for Few-Shot Question Answering" (2405.09335) addresses the challenge of enhancing Question Answering (QA) performance in scenarios with limited labeled data. It leverages LLMs to generate synthetic domain-specific data, thus reducing the need for extensive data annotation. The paper focuses on extractive QA, wherein the answer is located as a span within a given context.

The authors propose a method that utilizes prompt-based data generation, arguing that pre-trained LLMs contain valuable task-agnostic and domain-agnostic knowledge that can be harnessed to improve few-shot QA models. This approach is particularly beneficial in low-resource settings, where the annotation process is resource-intensive (Figure 1).

Figure 1: Comparison of a) common approaches, e.g., Prompting, for MRQA and b) our approach, which adds synthetic task- and domain-specific data without the need for additional labeled data.

Methodology

The proposed methodology comprises two primary steps: Answer Sampling and Question Generation.

Answer Sampling

The paper utilizes Named Entity Recognition (NER) to sample potential answer spans from the context. This technique is efficient because it requires neither extensive domain-specific knowledge nor labeled data, making it applicable across diverse domains.
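To make this step concrete, the following is a minimal sketch of NER-based answer sampling. The spaCy pipeline, the model name, and the candidate limit are illustrative assumptions; the paper only specifies that an off-the-shelf NER system proposes candidate answer spans.

```python
# Hedged sketch: sample candidate answer spans with an off-the-shelf NER model.
# spaCy and "en_core_web_sm" are assumptions, not the paper's exact setup.
import random
import spacy

nlp = spacy.load("en_core_web_sm")

def sample_answer_candidates(context: str, k: int = 5) -> list:
    """Return up to k entity spans (text plus character offsets) from the context."""
    doc = nlp(context)
    candidates = [
        {"text": ent.text, "start": ent.start_char, "end": ent.end_char}
        for ent in doc.ents
    ]
    random.shuffle(candidates)  # a random subset keeps the sampled answers diverse
    return candidates[:k]

context = "Marie Curie received the Nobel Prize in Physics in 1903."
print(sample_answer_candidates(context))
```

Each sampled span then serves as the target answer for the question generation step described next.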

Question Generation

The second step formulates prompts that direct the LLM to generate questions conditioned on the sampled answers and the context. The authors use the encoder-decoder model T5 v1.1, whose architecture conditions the output on the entire input sequence rather than only on preceding tokens. A template places both the context and the sampled answer in the prompt, and the model predicts the corresponding question.
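As an illustration of this interface, here is a hedged sketch of template-based question generation with a text-to-text model. The checkpoint name, the exact template wording, and the decoding parameters are assumptions; in the paper the model is additionally adapted with soft prompt tokens rather than used off the shelf.

```python
# Hedged sketch: prompt a T5 v1.1 checkpoint to generate a question conditioned
# on the context and a sampled answer. Template wording and decoding settings
# are illustrative, not the paper's exact configuration.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/t5-v1_1-base"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def generate_question(context: str, answer: str) -> str:
    # Both the context and the sampled answer appear in the input; the model
    # is asked to predict the question.
    prompt = f"context: {context} answer: {answer} question:"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    output_ids = model.generate(
        **inputs, max_new_tokens=48, do_sample=True, top_p=0.9
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

question = generate_question(
    "Marie Curie received the Nobel Prize in Physics in 1903.", "1903"
)
print(question)
```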

The generation process incorporates soft prompt tokens, initialized from pre-trained word embeddings, and a filtering mechanism that ensures the quality and relevance of the generated questions. Rule-based filtering discards nonsensical outputs, while consistency filtering keeps a generated question only if a pre-trained MRQA model, given that question and the context, predicts an answer that matches the sampled one (Figure 2).

Figure 2: An example of our data generation pipeline: We first sample answer candidates (using NER) and then prompt a PLM to generate a question conditioned on context and answer (1). The generated question-answer pair is then used with the initial context to train an MRQA model (2). Afterwards, we perform additional training on labeled data if available.
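The two filters can be sketched as follows. The specific rules (minimum length, trailing question mark) and the extractive QA checkpoint used for the consistency check are assumptions; the paper specifies only rule-based filtering plus consistency filtering against a pre-trained MRQA model.

```python
# Hedged sketch of rule-based and consistency filtering for generated
# question-answer pairs. The rules and the QA checkpoint are illustrative.
from transformers import pipeline

qa_model = pipeline("question-answering", model="deepset/roberta-base-squad2")

def rule_based_ok(question: str) -> bool:
    # Discard obviously malformed generations.
    return len(question.split()) >= 3 and question.strip().endswith("?")

def consistency_ok(question: str, context: str, answer: str) -> bool:
    # Keep the pair only if the QA model, given the generated question,
    # recovers (roughly) the originally sampled answer span.
    prediction = qa_model(question=question, context=context)
    return prediction["answer"].strip().lower() == answer.strip().lower()

def keep_pair(question: str, context: str, answer: str) -> bool:
    return rule_based_ok(question) and consistency_ok(question, context, answer)
```

Pairs that survive both filters are added, together with their contexts, to the synthetic training set for the MRQA model.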

Experimental Setup and Results

The experimental setup evaluates the approach on the few-shot MRQA benchmark, covering multiple datasets including SQuAD. The methodology demonstrates significant performance benefits, outperforming existing state-of-the-art approaches. Notably, models trained on the prompting-generated synthetic data achieve high F1 scores, in certain cases, such as TextbookQA, even surpassing the full-data setting (Figure 3).

Figure 3: MRQA performance (F1) as a function of dataset size for the best-performing approaches, averaged over all datasets in the few-shot MRQA benchmark.
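For reference, the F1 metric reported here is the standard token-level F1 of SQuAD/MRQA-style evaluation. The sketch below uses simplified normalization (lowercasing and whitespace tokenization) rather than the official evaluation script.

```python
# Minimal sketch of token-level F1 for extractive QA (SQuAD/MRQA style),
# with simplified answer normalization.
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("in 1903", "1903"))  # 0.67: partial credit for token overlap
```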

Analysis

A user study evaluated the quality of generated question-answer pairs, finding that data generated with 128 samples was comparable in quality to human-annotated data. This highlights the potential to reduce annotation effort without sacrificing data quality (Figure 4).

Figure 4: For the NewsQA dataset, 100 question-answer pairs were quality-assessed by humans in each setting (data generated using 16 and 128 samples, as well as labeled (gold) data).

Conclusion

The paper presents a compelling case for utilizing LLMs in data generation for QA tasks, demonstrating robust improvements in few-shot scenarios. The approach successfully bridges the performance gap between extensive labeled datasets and few-shot models, suggesting a paradigm shift towards using pre-trained LLMs for generating high-quality synthetic data.

Future Directions

Future research can explore leveraging LLMs for both question and answer generation, addressing the complexity of extractive QA. Additionally, the integration of feedback mechanisms and in-context learning approaches could further enhance model performance in low-resource settings.

Ethical Considerations

The research was conducted with ethical considerations, especially pertaining to the user study, ensuring participant consent, privacy, and fair compensation through a standardized platform.

This approach presents a viable path forward in reducing dependence on costly manual annotation across diverse application domains in question answering systems.
