Instruction-tuning Aligns LLMs to the Human Brain (2312.00575v2)
Abstract: Instruction-tuning is a widely adopted finetuning method that enables LLMs to generate output that more closely resembles human responses. However, no studies have shown that instruction-tuning actually teaches LLMs to process language in a manner similar to humans. We investigate the effect of instruction-tuning on aligning LLM and human language processing mechanisms in two ways: (1) brain alignment, the similarity of LLM internal representations to neural activity in the human language system, and (2) behavioral alignment, the similarity of LLM and human behavior on a reading task. We assess 25 vanilla and instruction-tuned LLMs on three datasets involving humans reading naturalistic stories and sentences, and find that instruction-tuning generally enhances brain alignment (~6%) but has no comparable effect on behavioral alignment. To identify the factors underlying this improvement in brain alignment, we compute correlations between brain alignment and various LLM properties, such as model size, problem-solving ability, and world knowledge. Notably, we find strong positive correlations between brain alignment and both model size (r = 0.95) and performance on tasks requiring world knowledge (r = 0.81). Our results demonstrate that instruction-tuning LLMs improves both world knowledge representations and brain alignment, suggesting that the mechanisms that encode world knowledge in LLMs also improve their representational alignment to the human brain.
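To make the brain-alignment metric concrete: a common way to quantify the similarity between LLM internal representations and neural recordings (used in related encoding-model work, e.g. Schrimpf et al., 2021) is to fit a cross-validated linear regression from hidden states to fMRI responses and score held-out predictions with Pearson correlation. The sketch below is a minimal illustration under that assumption, using synthetic data and standard scikit-learn components; the array shapes, ridge penalty, and fold count are illustrative and not the authors' exact pipeline.

```python
# Minimal sketch of a linear-encoding "brain alignment" score:
# predict fMRI voxel responses from LLM hidden states with ridge
# regression, then score held-out predictions with Pearson r.
# Synthetic data; all sizes and hyperparameters are illustrative only.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n_stimuli, n_features, n_voxels = 200, 768, 50      # stimuli x hidden dim x voxels
llm_states = rng.normal(size=(n_stimuli, n_features))  # stand-in for layer activations
fmri = rng.normal(size=(n_stimuli, n_voxels))           # stand-in for recorded responses

scores = np.zeros(n_voxels)
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(llm_states):
    model = Ridge(alpha=1.0).fit(llm_states[train_idx], fmri[train_idx])
    pred = model.predict(llm_states[test_idx])
    # Pearson r between predicted and observed responses, per voxel,
    # averaged over folds.
    for v in range(n_voxels):
        scores[v] += np.corrcoef(pred[:, v], fmri[test_idx, v])[0, 1] / kf.get_n_splits()

print(f"mean brain-alignment score (Pearson r): {scores.mean():.3f}")
```

With random data the score hovers around zero; with real LLM activations and fMRI responses, higher mean correlations indicate stronger brain alignment, which is the quantity the abstract reports improving by roughly 6% after instruction-tuning.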