Surveying the Dead Minds: Historical-Psychological Text Analysis with Contextualized Construct Representation (CCR) for Classical Chinese (2403.00509v1)

Published 1 Mar 2024 in cs.CL, cs.AI, and cs.CY

Abstract: In this work, we develop a pipeline for historical-psychological text analysis in classical Chinese. Humans have produced texts in various languages for thousands of years; however, most of the computational literature is focused on contemporary languages and corpora. The emerging field of historical psychology relies on computational techniques to extract aspects of psychology from historical corpora using new methods developed in NLP. The present pipeline, called Contextualized Construct Representations (CCR), combines expert knowledge in psychometrics (i.e., psychological surveys) with text representations generated via transformer-based LLMs to measure psychological constructs such as traditionalism, norm strength, and collectivism in classical Chinese corpora. Considering the scarcity of available data, we propose an indirect supervised contrastive learning approach and build the first Chinese historical psychology corpus (C-HI-PSY) to fine-tune pre-trained models. We evaluate the pipeline to demonstrate its superior performance compared with other approaches. The CCR method outperforms word-embedding-based approaches across all of our tasks and exceeds prompting with GPT-4 in most tasks. Finally, we benchmark the pipeline against objective, external data to further verify its validity.

Summary

The paper presents the novel CCR pipeline that integrates psychometrics with Transformer-based models to analyze psychological constructs in classical Chinese texts.
It employs indirect supervised contrastive learning and the creation of the C-HI-PSY corpus to address linguistic challenges.
Evaluation shows CCR outperforms word-embedding methods across all tasks and GPT-4 prompting in most tasks when measuring psychological constructs in historical texts.

Advancing Historical Psychological Text Analysis with Contextualized Construct Representation in Classical Chinese

Introduction to the Study

The paper introduces a computational pipeline called Contextualized Construct Representation (CCR), tailored for historical-psychological text analysis in classical Chinese. The motivation is that historical corpora record the psychology of past populations, yet they remain underexplored because of the historical and linguistic complexity of such texts. The CCR pipeline combines psychometrics (psychological surveys) with text representations from Transformer-based language models to measure psychological constructs such as traditionalism and collectivism in classical Chinese texts. To address this gap, the authors fine-tune pre-trained models on a new Chinese historical psychology corpus, opening new avenues for the quantitative study of history through the lens of psychological constructs.

Methodological Insights

The CCR pipeline involves several steps aimed at capturing psychological constructs from historical texts. A notable feature is its use of expert knowledge in psychometrics together with indirect supervised contrastive learning for model fine-tuning. The creation of the C-HI-PSY corpus, the first of its kind, along with the cross-lingual questionnaire conversion pipeline, is a methodological innovation designed to address the linguistic challenges of working with classical Chinese texts. Fine-tuning pre-trained Transformer models on this corpus is intended to improve the quality of psychological construct representations in historical texts.
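As a rough illustration of how questionnaire items and text representations can be combined, the sketch below scores a passage by its average similarity to the items of one construct. It makes several assumptions that the summary above does not confirm: that similarity is measured with cosine similarity, that a construct score is the mean over items, and that embeddings are plain lists of floats. The function names `cosine` and `ccr_score` and the 3-dimensional toy vectors are hypothetical; in practice the vectors would come from a fine-tuned Transformer encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ccr_score(text_vec, item_vecs):
    """Mean cosine similarity between one text embedding and all
    questionnaire-item embeddings of a construct (assumed scoring rule)."""
    sims = [cosine(text_vec, item) for item in item_vecs]
    return sum(sims) / len(sims)


# Toy stand-ins for embeddings of two questionnaire items (hypothetical values).
traditionalism_items = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0]]

# Toy stand-ins for embeddings of two passages (hypothetical values).
passage_a = [1.0, 0.0, 0.0]  # points in the same direction as the first item
passage_b = [0.0, 0.0, 1.0]  # orthogonal to both items

score_a = ccr_score(passage_a, traditionalism_items)
score_b = ccr_score(passage_b, traditionalism_items)
print(score_a, score_b)
```

Under these assumptions, a passage whose embedding lies close to the questionnaire items receives a higher construct score than one that is unrelated to them; the scoring rule the paper actually uses may differ in its details.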

Evaluation and Results

The CCR method outperformed existing word-embedding-based approaches across all tasks and, interestingly, outperformed prompting with GPT-4 in most tasks. These findings support the efficacy of the CCR pipeline and suggest that contextualized models are well suited to representing psychological constructs in historical texts. Furthermore, validating CCR against historically verified attitudes towards reforms provided external evidence of the pipeline's practical utility and its potential to contribute to historical psychology through text analysis.

Benchmarking and Implications for Historical Psychology

The benchmarking of CCR using the dataset on officials' attitudes toward reform in the 11th century provides a concrete example of how the pipeline can be applied to real-world historical texts to extract psychological insights. The significant correlations found between the constructs of traditionalism and authority and the officials' attitudes underscore the method's relevance and effectiveness. Such an application not only illustrates the potential of CCR in historical-psychological research but also offers a new lens through which historical events and figures can be analyzed psychologically.
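To make the idea of benchmarking against external data concrete, the following sketch computes a Pearson correlation between construct scores and an external attitude measure. It assumes that Pearson's r is the statistic of interest, which the summary above does not specify. All numbers are invented purely for illustration and are not results from the paper, and the sign of the toy correlation says nothing about the direction the paper reports; the names `pearson_r`, `traditionalism_scores`, and `reform_support` are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented toy data: one construct score and one attitude rating per official.
traditionalism_scores = [0.10, 0.20, 0.30, 0.40]
reform_support = [4.0, 3.0, 2.0, 1.0]

r = pearson_r(traditionalism_scores, reform_support)
print(round(r, 3))
```

In a real analysis one would also report a significance test and the sample size alongside the coefficient, since a correlation computed over a small number of officials can be unstable.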

Conclusion and Future Directions

The paper makes a compelling case for using advanced NLP techniques to explore psychological constructs in historical corpora. The development and validation of the CCR pipeline mark a significant advance at the intersection of historical psychology and computational linguistics. Looking ahead, the paper paves the way for further research into other languages and periods, expanding our understanding of historical psychology across cultures and timeframes. Moreover, addressing the limitations noted in the paper, specifically the noise introduced by the indirect supervised contrastive learning approach, could further refine the CCR pipeline and extend its applicability to a broader range of historical texts.

In conclusion, this research opens new frontiers in the computational analysis of historical texts, offering valuable tools and methodologies for historians, psychologists, and computational linguists interested in exploring the psychological dimensions of historical narratives. The future development of this field could significantly enrich our understanding of the human past, bridging the gap between historical events and psychological analysis.