LIIR at SemEval-2020 Task 12: A Cross-Lingual Augmentation Approach for Multilingual Offensive Language Identification (2005.03695v2)
Abstract: This paper presents our system entitled `LIIR' for SemEval-2020 Task 12 on Multilingual Offensive Language Identification in Social Media (OffensEval 2). We have participated in sub-task A for English, Danish, Greek, Arabic, and Turkish languages. We adapt and fine-tune the BERT and Multilingual Bert models made available by Google AI for English and non-English languages respectively. For the English language, we use a combination of two fine-tuned BERT models. For other languages we propose a cross-lingual augmentation approach in order to enrich training data and we use Multilingual BERT to obtain sentence representations. LIIR achieved rank 14/38, 18/47, 24/86, 24/54, and 25/40 in Greek, Turkish, English, Arabic, and Danish languages, respectively.
Collections
Sign up for free to add this paper to one or more collections.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.