Uni-SMART: Universal Science Multimodal Analysis and Research Transformer (2403.10301v2)
Abstract: In scientific research and its application, scientific literature analysis is crucial as it allows researchers to build on the work of others. However, the fast growth of scientific knowledge has led to a massive increase in scholarly articles, making in-depth literature analysis increasingly challenging and time-consuming. The emergence of LLMs has offered a new way to address this challenge. Known for their strong abilities in summarizing texts, LLMs are seen as a potential tool to improve the analysis of scientific literature. However, existing LLMs have their own limits. Scientific literature often includes a wide range of multimodal elements, such as tables, charts, and molecules, which are hard for text-focused LLMs to understand and analyze. This issue points to the urgent need for new solutions that can fully understand and analyze multimodal content in scientific literature. To answer this demand, we present Uni-SMART (Universal Science Multimodal Analysis and Research Transformer), an innovative model designed for in-depth understanding of multimodal scientific literature. Through rigorous quantitative evaluation across several domains, Uni-SMART demonstrates superior performance over text-focused LLMs. Furthermore, our exploration extends to practical applications, including patent infringement detection and nuanced analysis of charts. These applications highlight not only Uni-SMART's adaptability but also its potential to revolutionize how we interact with scientific literature.