Emergent Mind

Abstract

Lexical Substitution discovers appropriate substitutes for a given target word in a context sentence. However, the task fails to consider substitutes that are of equal or higher proficiency than the target, an aspect that could be beneficial for language learners looking to improve their writing. To bridge this gap, we propose a new task, language proficiency-oriented lexical substitution. We also introduce ProLex, a novel benchmark designed to assess systems' ability to generate not only appropriate substitutes but also substitutes that demonstrate better language proficiency. Besides the benchmark, we propose models that can automatically perform the new task. We show that our best model, a Llama2-13B model fine-tuned with task-specific synthetic data, outperforms ChatGPT by an average of 3.2% in F-score and achieves comparable results with GPT-4 on ProLex.

Overview

  • The paper introduces ProLex, a new benchmark for proficiency-oriented lexical substitution, aimed at helping English second-language (L2) learners diversify their vocabulary.

  • ProLex uses a dataset based on the TOEFL-11 essay corpus to ensure relevance to the actual usage patterns of L2 learners and includes human-annotated data.

  • The best model, a Llama2-13B fine-tuned on task-specific synthetic data, outperformed ChatGPT by an average of 3.2% in F-score and achieved results comparable to GPT-4 on ProLex.

  • GPT-4 performed strongly on the task in both zero-shot and in-context learning settings.

  • ProLex aims to enhance computational English language learning tools by expanding learners' vocabulary and aiding in writing skills, with plans for further development and refinement.

Introduction

Among automatic English learning tools, grammar correction systems have received considerable attention, but enhancing vocabulary diversity through apt lexical choices is just as important. Researchers have identified a challenge for English second-language (L2) learners: they tend to rely on a limited vocabulary, which holds back their expressive writing. Existing lexical substitution systems help learners find appropriate alternatives for a word in a given context and so promote vocabulary expansion, but prior work largely disregards proficiency level when substituting target words.

ProLex Benchmark

To address this gap, the paper presents ProLex, a benchmark for evaluating language proficiency-oriented lexical substitution. It goes beyond the current paradigm, which prioritizes contextual suitability alone, by also asking whether a substitute demonstrates equal or higher proficiency than the target word. ProLex selects its target words by their frequency in the TOEFL-11 essay corpus, which represents typical L2 English learner usage patterns, so the benchmark aligns with the vocabulary that L2 learners actually use. A salient feature of ProLex is its human-annotated dataset: human experts evaluate candidate substitutes generated by GPT-4, following a comprehensive annotation scheme that covers semantic integrity, collocation accuracy, lexical variation, and grammatical correctness.
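The proficiency constraint can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the numeric ranks in `TOY_LEVELS`, the helper name `filter_by_proficiency`, and the example words are hypothetical and are not taken from ProLex or its annotation scheme. The sketch only shows the idea of keeping substitutes whose proficiency level is equal to or higher than the target's.

```python
# Hypothetical proficiency ranks (higher = more advanced).
# These values are invented for illustration, not ProLex data.
TOY_LEVELS = {
    "good": 1,
    "nice": 1,
    "beneficial": 3,
    "advantageous": 4,
}


def filter_by_proficiency(target, candidates, levels):
    """Keep candidates whose level is known and at least the target's level."""
    target_level = levels[target]
    return [
        word
        for word in candidates
        if word in levels and levels[word] >= target_level
    ]


if __name__ == "__main__":
    kept = filter_by_proficiency(
        "beneficial", ["good", "advantageous", "unlisted"], TOY_LEVELS
    )
    print(kept)
```

A real system would still need to check contextual fit, collocation, and grammar separately; this filter addresses only the proficiency dimension.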

Methodology and Model Performance

To perform the new task automatically, the authors developed models and evaluated them on ProLex. The best of these is a Llama2-13B model fine-tuned with synthetic data tailored to the task, which outperformed ChatGPT by an average of 3.2% in F-score and achieved results comparable to GPT-4. GPT-4's strong performance in zero-shot and in-context learning settings further illustrates that LLMs can handle semantically complex tasks such as proficiency-oriented lexical substitution.
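Since results are reported in F-score, a short sketch of how such a score can be computed for one benchmark instance may help. This is a generic set-based precision/recall/F computation, assuming a system's predicted substitutes are compared against a set of annotator-accepted substitutes. The function name, the lowercasing step, and the example words are assumptions; the summary does not specify the exact metric variant ProLex uses.

```python
def precision_recall_f(predicted, gold, beta=1.0):
    """Set-based precision, recall, and F-beta for one target word.

    predicted: substitutes proposed by a system
    gold: substitutes accepted by annotators (illustrative assumption)
    """
    pred_set = {word.lower() for word in predicted}
    gold_set = {word.lower() for word in gold}
    if not pred_set or not gold_set:
        return 0.0, 0.0, 0.0

    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    if true_positives == 0:
        return precision, recall, 0.0

    beta_sq = beta * beta
    f_score = (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
    return precision, recall, f_score


if __name__ == "__main__":
    p, r, f = precision_recall_f(
        ["beneficial", "nice", "advantageous"],
        ["beneficial", "advantageous", "helpful", "valuable"],
    )
    print(f"precision={p:.3f} recall={r:.3f} f={f:.3f}")
```

Averaging the per-instance scores over the benchmark would give a single system-level number, though the aggregation actually used in the paper may differ.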

Conclusions and Prospects

In summary, ProLex paves the way for advances in computational English language learning, particularly in broadening vocabulary and improving writing among L2 learners. The benchmark makes it possible to evaluate whether systems recommend substitutes that are both appropriate and more proficient, which supports learners' progress. The authors plan further development and refinement of the benchmark.
