AlbMoRe: A Corpus of Movie Reviews for Sentiment Analysis in Albanian (2306.08526v1)
Published 14 Jun 2023 in cs.CL, cs.AI, and cs.LG
Abstract: Lack of available resources such as text corpora for low-resource languages seriously hinders research on natural language processing and computational linguistics. This paper presents AlbMoRe, a corpus of 800 sentiment annotated movie reviews in Albanian. Each text is labeled as positive or negative and can be used for sentiment analysis research. Preliminary results based on traditional machine learning classifiers trained with the AlbMoRe samples are also reported. They can serve as comparison baselines for future research experiments.