Generation-driven Contrastive Self-training for Zero-shot Text Classification with Instruction-following LLM (2304.11872v2)

Published 24 Apr 2023 in cs.CL and cs.AI

Abstract: The remarkable performance of LLMs in zero-shot language understanding has garnered significant attention. However, employing LLMs for large-scale inference or domain-specific fine-tuning requires immense computational resources due to their substantial model size. To overcome these limitations, we introduce a novel method, namely GenCo, which leverages the strong generative power of LLMs to assist in training a smaller and more adaptable language model. In our method, an LLM plays two important roles in the self-training loop of the smaller model. Firstly, the LLM is used to augment each input instance with a variety of possible continuations, enriching its semantic context for better understanding. Secondly, it helps craft additional high-quality training pairs by rewriting input texts conditioned on predicted labels. This ensures the generated texts are highly relevant to the predicted labels, alleviating prediction errors during pseudo-labeling while reducing the dependency on large volumes of unlabeled text. In our experiments, GenCo outperforms previous state-of-the-art methods when only limited ($<5\%$ of original) in-domain text data is available. Notably, our approach surpasses the performance of Alpaca-7B with human prompts, highlighting the potential of leveraging LLMs for self-training.
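
The abstract describes two ways the instruction-following LLM assists the smaller model's self-training loop: augmenting each input with generated continuations before pseudo-labeling, and rewriting inputs conditioned on the predicted label to create extra training pairs. The sketch below is a minimal, hedged reconstruction of that loop, not the authors' implementation: the helper names, prompt templates, confidence threshold, and the `generate`/`classify` callables are all assumptions, and the paper's contrastive fine-tuning of the smaller model is not shown.

```python
"""Illustrative sketch of a GenCo-style self-training round (assumptions labeled)."""
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# `generate(prompt, n)` stands in for an instruction-following LLM
# (e.g. an Alpaca-style model); `classify(texts)` stands in for the smaller
# model being self-trained and returns (label_index, confidence) pairs.
Generator = Callable[[str, int], List[str]]
Classifier = Callable[[Sequence[str]], List[Tuple[int, float]]]


@dataclass
class LabeledExample:
    text: str
    label: int


def augment_with_continuations(text: str, generate: Generator, n: int = 2) -> str:
    """Use 1 (assumed prompt): enrich an input with LLM-written continuations."""
    continuations = generate(f"Continue the following text:\n{text}", n)
    return " ".join([text] + continuations)


def rewrite_conditioned_on_label(text: str, label_name: str, generate: Generator) -> str:
    """Use 2 (assumed prompt): rewrite the input so it clearly matches the pseudo-label."""
    prompt = f"Rewrite the following text so that it is clearly about {label_name}:\n{text}"
    return generate(prompt, 1)[0]


def genco_self_training_round(
    unlabeled_texts: Sequence[str],
    label_names: Sequence[str],
    generate: Generator,
    classify: Classifier,
    confidence_threshold: float = 0.8,
) -> List[LabeledExample]:
    """One pseudo-labeling round: augment, predict, keep confident predictions,
    and add an LLM-rewritten copy of each kept example as an extra training pair."""
    augmented = [augment_with_continuations(t, generate) for t in unlabeled_texts]
    predictions = classify(augmented)

    training_pairs: List[LabeledExample] = []
    for text, (label, confidence) in zip(unlabeled_texts, predictions):
        if confidence < confidence_threshold:
            continue  # discard low-confidence pseudo-labels
        training_pairs.append(LabeledExample(text, label))
        rewritten = rewrite_conditioned_on_label(text, label_names[label], generate)
        training_pairs.append(LabeledExample(rewritten, label))
    return training_pairs


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end; a real setup would plug in
    # an instruction-following LLM and a fine-tunable small classifier here.
    dummy_generate: Generator = lambda prompt, n: [f"(generated continuation {i})" for i in range(n)]
    dummy_classify: Classifier = lambda texts: [(0, 0.9) for _ in texts]

    pairs = genco_self_training_round(
        ["the team won the championship"], ["sports", "politics"],
        dummy_generate, dummy_classify,
    )
    for p in pairs:
        print(p.label, "|", p.text)
```

In a real run, the surviving pairs would be used to fine-tune the smaller classifier (the paper's title indicates a contrastive objective) before the next round, so pseudo-label quality and model accuracy improve together.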
