Leveraging Large Language Models for Relevance Judgments in Legal Case Retrieval (2403.18405v2)
Abstract: Determining which legal cases are relevant to a given query involves navigating lengthy texts and applying nuanced legal reasoning. Traditionally, this task has demanded significant time and domain expertise to identify key legal facts and reach sound juridical conclusions. In addition, existing datasets of legal case similarity often lack interpretability, making it difficult to understand the rationale behind relevance judgments. With the growing capabilities of large language models (LLMs), researchers have begun investigating their potential in this domain. Nonetheless, how to employ a general-purpose LLM for reliable relevance judgments in legal case retrieval remains largely unexplored. To address this gap, we propose a novel few-shot approach in which LLMs assist in generating expert-aligned, interpretable relevance judgments. The proposed approach decomposes the judgment process into several stages, mimicking the workflow of human annotators and allowing expert reasoning to be incorporated flexibly to improve the accuracy of relevance judgments. Importantly, it also ensures interpretable data labeling, providing transparency and clarity in the relevance assessment process. By comparing relevance judgments made by LLMs with those made by human experts, we empirically demonstrate that the proposed approach can yield reliable and valid relevance assessments. Furthermore, we demonstrate that, with minimal expert supervision, our approach enables an LLM to acquire case analysis expertise and subsequently transfer this ability to a smaller model via annotation-based knowledge distillation.
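The abstract's comparison between LLM and human relevance judgments can be illustrated with a chance-corrected agreement score. The following is a minimal sketch, not the paper's evaluation code: it assumes Cohen's kappa as the agreement measure and a four-level relevance scale (0 to 3), neither of which the abstract specifies, and the function name `cohen_kappa` and the label lists are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both annotators labeled independently,
    # each following their own label distribution.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label; agreement is trivial.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels on an assumed 0-3 relevance scale (illustrative only).
human = [3, 2, 0, 1, 3, 0, 2, 1]
llm = [3, 2, 0, 1, 2, 0, 2, 0]
kappa = cohen_kappa(human, llm)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when relevance labels are unevenly distributed across candidate cases.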