Is GPT-3 a Good Data Annotator? (2212.10450v2)
Abstract: Data annotation is the process of labeling data so that it can be used to train machine learning models. High-quality annotations are crucial, as they allow a model to learn the relationship between the input data and the desired output. GPT-3, a large language model developed by OpenAI, has demonstrated impressive zero- and few-shot performance on a wide range of NLP tasks. It is therefore natural to ask whether it can be used to effectively annotate data for NLP tasks. In this paper, we evaluate the performance of GPT-3 as a data annotator by comparing it with traditional data annotation methods and analyzing its output on a range of tasks. Through this analysis, we aim to provide insight into the potential of GPT-3 as a general-purpose data annotator for NLP.
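To make the annotation setup concrete, below is a minimal sketch of how a GPT-3-style model might be prompted to label unlabeled text. The prompt wording, label set, and model name are illustrative assumptions rather than the paper's exact configuration, and the snippet assumes the legacy openai Python SDK (version < 1.0) with an API key in the OPENAI_API_KEY environment variable.

```python
# Hypothetical sketch: using GPT-3 as a zero-shot sentiment annotator.
# Prompt, labels, and model choice are assumptions for illustration only.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

LABELS = ["positive", "negative"]  # hypothetical label set for a binary task


def annotate(sentence: str) -> str:
    """Ask the model to choose one label for an unlabeled sentence."""
    prompt = (
        "Label the sentiment of the sentence as 'positive' or 'negative'.\n"
        f"Sentence: {sentence}\n"
        "Label:"
    )
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=5,
        temperature=0.0,  # deterministic output is preferable for annotation
    )
    label = response["choices"][0]["text"].strip().lower()
    # Fall back to a default label if the model returns something unexpected.
    return label if label in LABELS else LABELS[0]


if __name__ == "__main__":
    print(annotate("The movie was a delightful surprise."))
```

Labels produced this way can then be compared against human annotations or used to train a smaller task-specific model, which is the kind of comparison the paper evaluates.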