Instruction-tuning Aligns LLMs to the Human Brain (2312.00575v2)
Abstract: Instruction-tuning is a widely adopted finetuning method that enables LLMs to generate output that more closely resembles human responses. However, no studies have shown that instruction-tuning actually teaches LLMs to process language in a manner similar to humans. We investigate the effect of instruction-tuning on aligning LLM and human language processing mechanisms in two ways: (1) brain alignment, the similarity of LLM internal representations to neural activity in the human language system, and (2) behavioral alignment, the similarity of LLM and human behavior on a reading task. We assess 25 vanilla and instruction-tuned LLMs on three datasets involving humans reading naturalistic stories and sentences, and find that instruction-tuning generally enhances brain alignment (~6%) but has no comparable effect on behavioral alignment. To identify the factors underlying this improvement in brain alignment, we compute correlations between brain alignment and various LLM properties, such as model size, problem-solving ability, and world knowledge. Notably, we find strong positive correlations between brain alignment and both model size (r = 0.95) and performance on tasks requiring world knowledge (r = 0.81). Our results demonstrate that instruction-tuning LLMs improves both world knowledge representations and brain alignment, suggesting that the mechanisms that encode world knowledge in LLMs also improve their representational alignment to the human brain.
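To make the brain-alignment metric concrete: a common way to quantify the similarity between LLM internal representations and neural recordings (used in related encoding-model work, e.g. Schrimpf et al., 2021) is to fit a cross-validated linear regression from hidden states to fMRI responses and score held-out predictions with Pearson correlation. The sketch below is a minimal illustration under that assumption, using synthetic data and standard scikit-learn components; the array shapes, ridge penalty, and fold count are illustrative and not the authors' exact pipeline.

```python
# Minimal sketch of a linear-encoding "brain alignment" score:
# predict fMRI voxel responses from LLM hidden states with ridge
# regression, then score held-out predictions with Pearson r.
# Synthetic data; all sizes and hyperparameters are illustrative only.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n_stimuli, n_features, n_voxels = 200, 768, 50      # stimuli x hidden dim x voxels
llm_states = rng.normal(size=(n_stimuli, n_features))  # stand-in for layer activations
fmri = rng.normal(size=(n_stimuli, n_voxels))           # stand-in for recorded responses

scores = np.zeros(n_voxels)
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(llm_states):
    model = Ridge(alpha=1.0).fit(llm_states[train_idx], fmri[train_idx])
    pred = model.predict(llm_states[test_idx])
    # Pearson r between predicted and observed responses, per voxel,
    # averaged over folds.
    for v in range(n_voxels):
        scores[v] += np.corrcoef(pred[:, v], fmri[test_idx, v])[0, 1] / kf.get_n_splits()

print(f"mean brain-alignment score (Pearson r): {scores.mean():.3f}")
```

With random data the score hovers around zero; with real LLM activations and fMRI responses, higher mean correlations indicate stronger brain alignment, which is the quantity the abstract reports improving by roughly 6% after instruction-tuning.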