Can GPT-4 Support Analysis of Textual Data in Tasks Requiring Highly Specialized Domain Expertise? (2306.13906v1)
Abstract: We evaluated the capability of a generative pre-trained transformer (GPT-4) to analyze textual data in tasks that require highly specialized domain expertise. Specifically, we focused on the task of analyzing court opinions to interpret legal concepts. We found that GPT-4, prompted with annotation guidelines, performs on par with well-trained law student annotators. We observed that, at the cost of a relatively minor decrease in performance, GPT-4 can perform batch predictions, leading to significant cost reductions. However, chain-of-thought prompting did not noticeably improve performance on this task. Further, we demonstrated how GPT-4's predictions can be analyzed to identify and mitigate deficiencies in annotation guidelines, and thereby improve the model's performance. Finally, we observed that the model is quite brittle: small formatting-related changes in the prompt had a high impact on the predictions. These findings can be leveraged by researchers and practitioners who engage in semantic/pragmatic annotation of texts in tasks requiring highly specialized domain expertise.
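The cost argument for batch prediction can be illustrated with a minimal sketch. It assumes, as an illustration rather than a description of the authors' setup, that most of the saving comes from sending the shared annotation guidelines once per batch instead of once per item. The function names, the prompt wording, and the `<id>: <label>` answer format are all hypothetical. No model API is called, and whitespace-separated word counts stand in as a crude proxy for tokens.

```python
from typing import Dict, List


def build_batch_prompt(guidelines: str, sentences: List[str]) -> str:
    """Hypothetical helper: pack several items into one prompt so the guidelines appear once."""
    lines = [
        guidelines.strip(),
        "",
        "Label each sentence below. Answer one line per sentence as '<id>: <label>'.",
        "",
    ]
    for i, sentence in enumerate(sentences, start=1):
        lines.append(f"{i}. {sentence}")
    return "\n".join(lines)


def parse_batch_response(response: str, n_items: int) -> Dict[int, str]:
    """Hypothetical parser for '<id>: <label>' lines; skipped or out-of-range ids are dropped."""
    labels: Dict[int, str] = {}
    for line in response.splitlines():
        head, sep, tail = line.partition(":")
        if not sep:
            continue  # no colon, so not an answer line
        head = head.strip().rstrip(".")
        if head.isdigit() and 1 <= int(head) <= n_items:
            labels[int(head)] = tail.strip()
    return labels


def prompt_words(guidelines: str, sentences: List[str], batch_size: int) -> int:
    """Rough cost proxy: total words sent across all requests for a given batch size."""
    total = 0
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        total += len(build_batch_prompt(guidelines, batch).split())
    return total
```

With long guidelines, `prompt_words(guidelines, sentences, 20)` is far smaller than `prompt_words(guidelines, sentences, 1)`, because the guideline text is repeated per request rather than per item. The parser tolerates missing or extra lines, since one risk of batching is that the model skips or misnumbers items, which may be part of why batching can cost some accuracy.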