Papers
Topics
Authors
Recent
Search
2000 character limit reached

ChatGPT: Beginning of an End of Manual Linguistic Data Annotation? Use Case of Automatic Genre Identification

Published 7 Mar 2023 in cs.CL and cs.LG | (2303.03953v2)

Abstract: ChatGPT has shown strong capabilities in natural language generation tasks, which naturally leads researchers to explore where its abilities end. In this paper, we examine whether ChatGPT can be used for zero-shot text classification, more specifically, automatic genre identification. We compare ChatGPT with a multilingual XLM-RoBERTa LLM that was fine-tuned on datasets, manually annotated with genres. The models are compared on test sets in two languages: English and Slovenian. Results show that ChatGPT outperforms the fine-tuned model when applied to the dataset which was not seen before by either of the models. Even when applied on Slovenian language as an under-resourced language, ChatGPT's performance is no worse than when applied to English. However, if the model is fully prompted in Slovenian, the performance drops significantly, showing the current limitations of ChatGPT usage on smaller languages. The presented results lead us to questioning whether this is the beginning of an end of laborious manual annotation campaigns even for smaller languages, such as Slovenian.

Citations (80)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.