Hybrid Human-LLM Corpus Construction and LLM Evaluation for Rare Linguistic Phenomena (2403.06965v1)

Published 11 Mar 2024 in cs.CL

Abstract: Argument Structure Constructions (ASCs) are one of the most well-studied construction groups, providing a unique opportunity to demonstrate the usefulness of Construction Grammar (CxG). For example, the caused-motion construction (CMC, "She sneezed the foam off her cappuccino") demonstrates that constructions must carry meaning, otherwise the fact that "sneeze" in this context causes movement cannot be explained. We form the hypothesis that this remains challenging even for state-of-the-art LLMs, for which we devise a test based on substituting the verb with a prototypical motion verb. To be able to perform this test at statistically significant scale, in the absence of adequate CxG corpora, we develop a novel pipeline of NLP-assisted collection of linguistically annotated text. We show how dependency parsing and GPT-3.5 can be used to significantly reduce annotation cost and thus enable the annotation of rare phenomena at scale. We then evaluate GPT, Gemini, Llama2 and Mistral models for their understanding of the CMC using the newly collected corpus. We find that all models struggle with understanding the motion component that the CMC adds to a sentence.

Authors

Leonie Weissweiler
Abdullatif Köksal
Hinrich Schütze

Summary

The paper presents a novel hybrid human-LLM pipeline that combines dependency parsing, GPT-3.5 classification, and manual verification to efficiently annotate rare linguistic phenomena.
It evaluates multiple LLMs, including GPT, Gemini, Llama2, and Mistral models, on their ability to interpret the Caused Motion Construction and finds that all of them struggle with the motion component the construction adds.
The findings underscore an efficient corpus construction method and highlight the need for improved semantic understanding in LLMs to process complex linguistic structures.

Hybrid Human-LLM Corpus Construction and Evaluation for Understanding Rare Linguistic Phenomena

Introduction

Rare linguistic phenomena often elude the grasp of LLMs, which poses a challenge both for computational linguistics and for building AI systems with a more nuanced understanding of language. The paper presents a method for constructing a corpus centered on a rare linguistic structure, the Caused Motion Construction (CMC), and for evaluating how well various LLMs comprehend that structure. Using a hybrid pipeline that combines human linguistic expertise, NLP tools, and GPT-3.5, it proposes a cost-efficient approach to annotating rare linguistic phenomena at scale and assesses current state-of-the-art LLMs on a grammatical construction whose interpretation requires deeper semantic understanding.

Data Collection Methodology

The paper introduces a novel pipeline for data collection that significantly reduces the annotation burden typically associated with rare linguistic phenomena. This is particularly relevant for the CMC, in which the construction itself contributes a meaning of motion: in "She sneezed the foam off her cappuccino", the conventionally intransitive verb "sneeze" takes a direct object and is understood to cause that object to move.

Key to this methodology is an initial filtering phase that uses dependency parsing to identify potential CMC instances, followed by a refinement phase in which GPT-3.5 classifies these candidates using an instruction prompt tailored to the CMC. This two-stage filtering substantially increases the density of CMC instances among the sentences left to annotate. The final dataset comprises both manually verified instances and a larger, semi-automatically annotated corpus. The prompt is designed to balance the accuracy and the cost of the LLM-assisted classification.
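The two filtering stages can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `Token` structure, the "verb with a direct object and a prepositional phrase" heuristic, and the `llm_says_cmc` callable (a stand-in for a GPT-3.5 classification call) are all hypothetical. The dependency labels are supplied by hand here rather than produced by a real parser.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class Token:
    text: str
    dep: str   # dependency label, e.g. "nsubj", "dobj", "prep" (assumed label set)
    head: int  # index of this token's head within the sentence


def is_cmc_candidate(sentence: List[Token]) -> bool:
    """Stage 1 (assumed heuristic): some head governs both a direct
    object and a prepositional phrase."""
    deps_by_head: Dict[int, Set[str]] = {}
    for tok in sentence:
        deps_by_head.setdefault(tok.head, set()).add(tok.dep)
    return any({"dobj", "prep"} <= deps for deps in deps_by_head.values())


def select_for_annotation(
    sentences: List[List[Token]],
    llm_says_cmc: Callable[[str], bool],
) -> List[str]:
    """Stage 2: pass parser candidates to an LLM classifier and keep the
    sentences it accepts; these would then go to human verification."""
    selected = []
    for sent in sentences:
        if not is_cmc_candidate(sent):
            continue  # cheap syntactic filter runs first, so fewer LLM calls
        text = " ".join(tok.text for tok in sent)
        if llm_says_cmc(text):
            selected.append(text)
    return selected


# Hand-annotated parses, for illustration only.
cmc_sentence = [
    Token("She", "nsubj", 1), Token("sneezed", "ROOT", 1),
    Token("the", "det", 3), Token("foam", "dobj", 1),
    Token("off", "prep", 1), Token("her", "poss", 6),
    Token("cappuccino", "pobj", 4),
]
plain_sentence = [
    Token("She", "nsubj", 1), Token("sneezed", "ROOT", 1),
    Token("loudly", "advmod", 1),
]
```

Running the syntactic filter before the LLM call reflects the cost argument in the text: only sentences that pass the parser check incur a classification request.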

Evaluation of LLMs

The evaluation asks whether various LLMs, including GPT, Gemini, Llama2, and Mistral models, can accurately interpret the CMC, which lies at the intersection of syntax and semantics. The evaluation setup presents each model with sentences containing the CMC and asks whether the direct object in the sentence is moving, something a reader must infer to interpret a CMC instance correctly. The reported accuracy rates reveal substantial gaps in the models' understanding, highlighting a critical area for future work on model training and development.
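As a rough sketch of how such an accuracy figure could be computed, the snippet below asks a model a yes/no question about each CMC sentence and counts "yes" answers as correct. The prompt wording, the `ask_model` callable, and the second example sentence are assumptions made for illustration; the paper's actual prompts and scoring may differ.

```python
from typing import Callable, List, Tuple


def build_question(sentence: str, direct_object: str) -> str:
    # Hypothetical prompt wording, not taken from the paper.
    return (
        f'In the sentence "{sentence}", is {direct_object} moving? '
        "Answer yes or no."
    )


def motion_accuracy(
    items: List[Tuple[str, str]],
    ask_model: Callable[[str], str],
) -> float:
    """Fraction of CMC sentences for which the model says the direct
    object is moving (the expected answer for every CMC instance)."""
    if not items:
        return 0.0
    correct = 0
    for sentence, direct_object in items:
        answer = ask_model(build_question(sentence, direct_object))
        if answer.strip().lower().startswith("yes"):
            correct += 1
    return correct / len(items)


# Illustrative items: (sentence, direct object). The second is made up.
items = [
    ("She sneezed the foam off her cappuccino", "the foam"),
    ("He laughed the actor off the stage", "the actor"),
]


def stub_model(prompt: str) -> str:
    # Stand-in for an LLM call: recognizes motion only for "foam".
    return "Yes" if "foam" in prompt else "No"


accuracy = motion_accuracy(items, stub_model)  # 0.5 with this stub
```

Passing the model in as a callable keeps the scoring logic independent of any particular API, so the same function could be reused across the model families being compared.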

Contributions and Future Work

This paper makes several significant contributions:

A hybrid human-LLM pipeline for the cost-efficient collection of rare linguistic phenomena.
The release of a uniquely compiled corpus, both manually and semi-automatically annotated, centered on the CMC.
An insightful evaluation of several state-of-the-art LLMs on their understanding of the CMC, providing a clear indicator of where these models stand in terms of interpreting complex linguistic constructions.

The methodology and findings underscore the inherent challenges and potential pathways for advancing the understanding capabilities of LLMs. Future research directions include extending this hybrid annotation and evaluation framework to other rare linguistic phenomena and exploring advancements in LLM architectures and training paradigms to enhance their grasp of complex linguistic constructions.

Concluding Remarks

This paper sheds light on how contemporary LLMs handle rare linguistic phenomena and where their understanding falls short. Its methodology and findings give the computational linguistics community a concrete direction for future research on improving the semantic comprehension of LLMs.
