Interesting Scientific Idea Generation using Knowledge Graphs and LLMs: Evaluations with 100 Research Group Leaders (2405.17044v3)
Abstract: The rapid growth of scientific literature makes it challenging for researchers to identify novel and impactful ideas, especially across disciplines. Modern AI systems offer new approaches, potentially inspiring ideas not conceived by humans alone. But how compelling are these AI-generated ideas, and how can we improve their quality? Here, we introduce SciMuse, which uses 58 million research papers and a large language model (LLM) to generate research ideas. We conduct a large-scale evaluation in which over 100 research group leaders, from the natural sciences to the humanities, ranked more than 4,400 personalized ideas based on their interest. This data allows us to predict research interest using (1) supervised neural networks trained on human evaluations, and (2) unsupervised zero-shot ranking with LLMs. Our results demonstrate how future systems can help generate compelling research ideas and foster unforeseen interdisciplinary collaborations.