Large language models surpass human experts in predicting neuroscience results

(2403.03230)
Published Mar 4, 2024 in q-bio.NC and cs.AI

Abstract

Scientific discoveries often hinge on synthesizing decades of research, a task that potentially outstrips human information processing capacities. LLMs offer a solution. LLMs trained on the vast scientific literature could potentially integrate noisy yet interrelated findings to forecast novel results better than human experts. To evaluate this possibility, we created BrainBench, a forward-looking benchmark for predicting neuroscience results. We find that LLMs surpass experts in predicting experimental outcomes. BrainGPT, an LLM we tuned on the neuroscience literature, performed better yet. Like human experts, when LLMs were confident in their predictions, they were more likely to be correct, which presages a future where humans and LLMs team together to make discoveries. Our approach is not neuroscience-specific and is transferable to other knowledge-intensive endeavors.

LLMs beat human experts on BrainBench, and base models outperform their conversational (chat-tuned) counterparts.

Overview

  • The paper evaluates whether LLMs can outperform human experts at predicting neuroscience research outcomes, using a novel benchmark, BrainBench.

  • BrainBench distinguishes itself by focusing on forward-looking prediction, testing how well human neuroscientists and LLMs anticipate the results of neuroscience experiments.

  • LLMs demonstrated superior predictive accuracy over human experts, with an average accuracy of 81.4% compared to 63.4% for humans across various neuroscience subfields.

  • The study suggests a future where LLMs and human expertise combine to accelerate scientific discovery, highlighting the potential for LLMs specialized through Low-Rank Adaptation (LoRA) to continually update and adapt to new scientific knowledge.

LLMs Outperform Human Experts in Predicting Neuroscience Results

Introduction

Recent advances in artificial intelligence, particularly LLMs built on the transformer architecture, make it possible to process vast amounts of scientific text at unprecedented scale. The paper leverages this capability to address a pointed question: can LLMs surpass human experts in predicting the outcomes of scientific experiments? It focuses on neuroscience and introduces a newly devised forward-looking benchmark, BrainBench.

BrainBench: A Novel Forward-looking Benchmark

The crux of the paper's methodology is BrainBench, a benchmark that evaluates how well human experts and LLMs predict the outcomes of neuroscience studies. Unlike traditional backward-looking question-and-answer formats, BrainBench tests forward-looking prediction. Each test item presents two versions of a neuroscience abstract: the published original and an altered copy whose results have been substantively changed; participants, both LLMs and neuroscientists, must identify the original.
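
This summary does not spell out how LLM choices are scored. A standard way to run such a two-alternative forced choice with a causal language model, and a plausible reading of the benchmark's setup, is to compute the model's perplexity on each abstract version and score it as choosing the less surprising one. A minimal sketch, assuming the Hugging Face transformers API and using gpt2 as a stand-in model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in model for illustration; the paper evaluated a range of LLMs.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # Perplexity = exp(mean token-level cross-entropy under the model).
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()

def choose(original: str, altered: str) -> str:
    # Two-alternative forced choice: pick the version the model
    # finds less surprising (lower perplexity).
    return "original" if perplexity(original) < perplexity(altered) else "altered"
```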

Results Overview

The results show that LLMs, on average, outperform human experts across the board on BrainBench: prediction accuracy averaged 81.4% for LLMs versus 63.4% for human experts. The paper also breaks performance down by neuroscience subfield, participant type, and model size, and LLMs lead consistently across all of these factors.
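
The abstract further notes that, like human experts, LLMs were more often correct when they were confident. One plausible way to check this kind of calibration, assuming confidence is operationalized as the perplexity gap between the two abstract versions (an illustrative proxy, not necessarily the paper's metric), is to bin items by confidence and compare accuracy across bins:

```python
import numpy as np

def accuracy_by_confidence(confidence, correct, n_bins=5):
    # Sort items by confidence, split into equal-sized bins, and
    # report accuracy per bin; a calibrated predictor shows accuracy
    # rising from the lowest- to the highest-confidence bin.
    order = np.argsort(confidence)
    return [float(np.mean(np.asarray(correct)[idx]))
            for idx in np.array_split(order, n_bins)]

# Hypothetical usage, building on the scoring sketch above:
# confidence[i] = abs(ppl_altered[i] - ppl_original[i])
# correct[i]    = the lower-perplexity version was the published original
```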

Implications and Future Directions

The findings signal a shift in how scientific predictions and hypotheses could be generated. The human-LLM collaboration framework the paper suggests could accelerate discovery across scientific fields: augmenting human expertise with LLMs' integrative processing of the literature could surface novel insights faster than either alone. The paper's exploration of fine-tuning via Low-Rank Adaptation (LoRA), the technique behind the neuroscience-specialized BrainGPT, also demonstrates an efficient path toward continually updating general-purpose models with new scientific knowledge.
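
The summary gives no training details for BrainGPT. Purely as an illustration of the LoRA technique it names, below is a minimal setup with the Hugging Face peft library; the base model, rank, and target modules are assumptions, not the paper's configuration:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Stand-in base model; BrainGPT was tuned from a general-purpose LLM.
base = AutoModelForCausalLM.from_pretrained("gpt2")

# LoRA freezes the base weights and trains small low-rank update
# matrices injected into selected layers, so only a tiny fraction of
# parameters changes, which keeps retraining cheap as new papers arrive.
config = LoraConfig(
    r=16,                       # rank of the low-rank updates
    lora_alpha=32,              # scaling applied to the updates
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of base weights
```

Training then proceeds as ordinary causal-language-model fine-tuning on domain text, in this case the neuroscience literature.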

Conclusion

By showing that LLMs can outperform human experts in predicting the results of neuroscience experiments, the paper sets a precedent for the broader application of LLMs in scientific research. It offers a sober, evidence-based assessment of LLMs' capabilities, grounded in strong numerical results rather than sensationalism. The future it points to is one where the synergy between human intellect and artificial intelligence drives a new phase of scientific discovery. As research evolves, the methodologies and insights from this study will serve as reference points for developing and deploying AI in scientific research pipelines.
