BERT Embeddings for Automatic Readability Assessment (2106.07935v2)

Published 15 Jun 2021 in cs.CL

Abstract: Automatic readability assessment (ARA) is the task of evaluating how easy or difficult a text document is for a target audience. One of the many open problems in the field is making models trained for this task effective even for low-resource languages. In this study, we propose an alternative way of using the information-rich embeddings of BERT models: combining them with handcrafted linguistic features in a single method for readability assessment. Results show that the proposed method outperforms classical approaches to readability assessment on English and Filipino datasets, with an increase of up to 12.4% in F1 score. We also show that the general information encoded in BERT embeddings can serve as a substitute feature set for low-resource languages such as Filipino, which have limited semantic and syntactic NLP tools for explicitly extracting feature values for the task.
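One way to picture a combined method like this is early fusion: a document-level BERT embedding is concatenated with a vector of handcrafted features, and the joined vector is passed to a classifier. The sketch below is illustrative only and is not the paper's implementation. It assumes mean pooling over per-token vectors and uses three surface features (words per sentence, characters per word, type-token ratio) chosen purely as stand-ins; the paper's semantic and syntactic features would require a parser or similar tools. In practice the token vectors would come from a BERT model (for example, the last hidden layer of a Hugging Face `transformers` model). The function names and the `use_handcrafted` flag are hypothetical.

```python
import re
from typing import List, Sequence


def handcrafted_features(text: str) -> List[float]:
    """Three illustrative surface features (not the paper's exact feature set)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text.lower())
    if not sentences or not words:
        return [0.0, 0.0, 0.0]
    avg_sentence_len = len(words) / len(sentences)          # words per sentence
    avg_word_len = sum(len(w) for w in words) / len(words)  # characters per word
    type_token_ratio = len(set(words)) / len(words)         # lexical diversity
    return [avg_sentence_len, avg_word_len, type_token_ratio]


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-token vectors into one document vector (assumed pooling strategy)."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]


def combined_features(
    text: str,
    token_vectors: Sequence[Sequence[float]],
    use_handcrafted: bool = True,
) -> List[float]:
    """Concatenate handcrafted features with a pooled embedding.

    With use_handcrafted=False only the embedding is returned, mimicking the
    setting where the embedding substitutes for features that cannot be
    extracted (e.g. no parser is available for the language).
    """
    embedding = mean_pool(token_vectors)
    if not use_handcrafted:
        return embedding
    return handcrafted_features(text) + embedding


if __name__ == "__main__":
    # Toy 2-dimensional "token vectors" stand in for real BERT outputs.
    toy_vectors = [[1.0, 2.0], [3.0, 4.0]]
    print(combined_features("The cat sat. The dog ran fast.", toy_vectors))
```

Concatenation keeps both information sources visible to the downstream classifier; because the handcrafted values and embedding dimensions sit on different scales, a real pipeline would typically standardize the combined vector before training.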

