Can Large Language Models Learn Independent Causal Mechanisms? (2402.02636v2)

Published 4 Feb 2024 in cs.CL, cs.AI, cs.IT, cs.LG, and math.IT

Abstract: Despite impressive performance on language modelling and complex reasoning tasks, Large Language Models (LLMs) fall short on the same tasks in uncommon settings or under distribution shifts, exhibiting a lack of generalisation ability. By contrast, systems such as causal models, which learn abstract variables and causal relationships, can demonstrate increased robustness against changes in the distribution. One reason for this success is the existence and use of Independent Causal Mechanisms (ICMs) representing high-level concepts that only sparsely interact. In this work, we apply two concepts from causality to learn ICMs within LLMs. We develop a new LLM architecture composed of multiple sparsely interacting language modelling modules. We show that such causal constraints can improve out-of-distribution performance on abstract and causal reasoning tasks. We also investigate the level of independence and domain specialisation and show that LLMs rely on pre-trained, partially domain-invariant mechanisms that are resilient to fine-tuning.

Authors (5)

Gaël Gendron
Bao Trung Nguyen
Alex Yuxuan Peng
Michael Witbrock
Gillian Dobbie
