LitLLM: A Toolkit for Scientific Literature Review (2402.01788v2)
Abstract: Conducting literature reviews for scientific papers is essential for understanding research, its limitations, and building on existing work. It is a tedious task, which makes an automatic literature review generator appealing. Unfortunately, many existing works that generate such reviews using LLMs have significant limitations: they tend to hallucinate (generate non-factual information) and ignore the latest research they have not been trained on. To address these limitations, we propose a toolkit built on Retrieval-Augmented Generation (RAG) principles combined with specialized prompting and instruction techniques for LLMs. Our system first initiates a web search to retrieve relevant papers by summarizing the user-provided abstract into keywords using an off-the-shelf LLM. Authors can refine the search by supplementing it with relevant papers or keywords, contributing to a tailored retrieval process. Second, the system re-ranks the retrieved papers based on the user-provided abstract. Finally, the related work section is generated from the re-ranked results and the abstract. Compared to traditional methods, our toolkit substantially reduces the time and effort required for a literature review, establishing it as an efficient alternative. Our project page, including the demo and toolkit, can be accessed here: https://litLLM.github.io
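The abstract describes a three-stage pipeline (keyword summarization and retrieval, LLM re-ranking, and grounded related-work generation). Below is a minimal sketch of that flow, assuming an OpenAI-style chat API and the Semantic Scholar Graph search endpoint as the retrieval backend; the model name, prompts, and helper functions are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of the pipeline described in the abstract:
# 1) summarize the abstract into a search query, retrieve candidates,
# 2) re-rank them against the abstract with an LLM (zero-shot listwise),
# 3) generate the related-work section grounded in the top results (RAG).
import re
import requests
from openai import OpenAI

client = OpenAI()   # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-4"     # placeholder; any capable instruction-following LLM


def llm(prompt: str) -> str:
    """Single-turn call to an off-the-shelf LLM."""
    resp = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content.strip()


def abstract_to_keywords(abstract: str) -> str:
    # Step 1a: summarize the user-provided abstract into a search query.
    return llm("Summarize this abstract into a short keyword query "
               f"for a scholarly search engine:\n\n{abstract}")


def search_papers(query: str, limit: int = 20) -> list[dict]:
    # Step 1b: retrieve candidate papers (Semantic Scholar shown here
    # as one possible search backend).
    r = requests.get(
        "https://api.semanticscholar.org/graph/v1/paper/search",
        params={"query": query, "limit": limit, "fields": "title,abstract"},
        timeout=30,
    )
    r.raise_for_status()
    return [p for p in r.json().get("data", []) if p.get("abstract")]


def rerank(abstract: str, papers: list[dict], top_k: int = 5) -> list[dict]:
    # Step 2: ask the LLM to order candidates by relevance to the abstract.
    listing = "\n".join(f"[{i}] {p['title']}" for i, p in enumerate(papers))
    order = llm("Rank these papers by relevance to the abstract below. "
                "Return the indices, most relevant first, comma-separated.\n\n"
                f"Abstract:\n{abstract}\n\nPapers:\n{listing}")
    idx = [int(t) for t in re.findall(r"\d+", order)]
    ranked = [papers[i] for i in idx if i < len(papers)]
    return (ranked or papers)[:top_k]


def generate_related_work(abstract: str, papers: list[dict]) -> str:
    # Step 3: generate the related-work section grounded in the
    # re-ranked paper abstracts.
    context = "\n\n".join(
        f"[{i + 1}] {p['title']}\n{p['abstract']}" for i, p in enumerate(papers)
    )
    return llm("Write a related-work section for the following abstract, "
               "citing the numbered papers as [1], [2], ...\n\n"
               f"Abstract:\n{abstract}\n\nPapers:\n{context}")


if __name__ == "__main__":
    user_abstract = "..."  # the user's paper abstract goes here
    keywords = abstract_to_keywords(user_abstract)
    candidates = search_papers(keywords)
    top_papers = rerank(user_abstract, candidates)
    print(generate_related_work(user_abstract, top_papers))
```

The re-ranking step mirrors the zero-shot listwise approach: rather than scoring papers independently, the whole candidate list is shown to the LLM at once and it returns an ordering, which is then truncated to the papers actually passed to generation.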