
Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models

(2401.01301)
Published Jan 2, 2024 in cs.CL, cs.AI, and cs.CY

Abstract

LLMs have the potential to transform the practice of law, but this potential is threatened by the presence of legal hallucinations -- responses from these models that are not consistent with legal facts. We investigate the extent of these hallucinations using an original suite of legal queries, comparing LLMs' responses to structured legal metadata and examining their consistency. Our work makes four key contributions: (1) We develop a typology of legal hallucinations, providing a conceptual framework for future research in this area. (2) We find that legal hallucinations are alarmingly prevalent, occurring between 69% of the time with ChatGPT 3.5 and 88% with Llama 2, when these models are asked specific, verifiable questions about random federal court cases. (3) We illustrate that LLMs often fail to correct a user's incorrect legal assumptions in a contra-factual question setup. (4) We provide evidence that LLMs cannot always predict, or do not always know, when they are producing legal hallucinations. Taken together, these findings caution against the rapid and unsupervised integration of popular LLMs into legal tasks. Even experienced lawyers must remain wary of legal hallucinations, and the risks are highest for those who stand to benefit from LLMs the most -- pro se litigants or those without access to traditional legal resources.

Overview

  • LLMs have the potential to automate legal tasks but often generate 'legal hallucinations': responses that are inconsistent with legal facts.

  • An examination showed a high rate of legal hallucinations in responses from models like ChatGPT and Llama, particularly for complex legal queries and lower court cases.

  • LLMs tend to reinforce incorrect legal premises posed by users, potentially misleading them.

  • The study found LLMs like Llama 2 to be poorly calibrated in recognizing their own hallucinations, expressing high confidence in incorrect responses.

  • The study calls for cautious LLM usage in legal settings, and emphasizes the need for continuous oversight by professionals.

Understanding Legal Hallucinations in AI

LLMs such as ChatGPT hold promise for revolutionizing the legal industry by automating some of the tasks traditionally done by lawyers. However, as this study shows, the road ahead is not without pitfalls. A pressing concern is the phenomenon of "legal hallucinations": responses from these models that are inconsistent with legal facts.

The Extent of Legal Hallucinations

An extensive examination revealed that legal hallucinations occur alarmingly often. Specific, verifiable questions about randomly sampled federal court cases induced incorrect responses 69% of the time from ChatGPT 3.5 and 88% of the time from Llama 2. The rate of hallucination also varied with several factors, from the complexity of the legal query to the position of the court in the judicial hierarchy: hallucinations were more frequent for questions about lower court cases than for Supreme Court cases.
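As a rough sketch of how such an evaluation can be scored (not the authors' actual pipeline), one can pose a verifiable metadata question about each sampled case and check the model's answer against a reference record. The `query_llm` callable and the metadata field names below are hypothetical placeholders.

```python
# Sketch: scoring LLM answers to verifiable case-metadata questions against
# a reference table. `query_llm` and the field names are illustrative
# assumptions, not the paper's pipeline or data schema.

def normalize(text: str) -> str:
    """Lowercase and drop punctuation for a lenient string comparison."""
    return "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace()).strip()

def hallucination_rate(cases, query_llm):
    """Fraction of answers that contradict the reference metadata.

    `cases` is a sequence of dicts with 'name' and 'authoring_judge' keys;
    `query_llm` takes a prompt string and returns the model's answer text.
    """
    wrong = 0
    for case in cases:
        prompt = f"Who wrote the majority opinion in {case['name']}?"
        answer = query_llm(prompt)
        if normalize(case["authoring_judge"]) not in normalize(answer):
            wrong += 1
    return wrong / len(cases)
```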

Models’ Response to Erroneous Legal Premises

Further complicating matters, LLMs displayed a troubling inclination to reinforce incorrect legal assumptions presented by users. When faced with questions built upon false legal premises, the models often failed to correct these assumptions and responded as if they were true, thus misleading users.
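To make the setup concrete, a contra-factual query embeds a premise that contradicts the record and checks whether the model pushes back. The example below is illustrative rather than drawn from the study's query suite; Justice Ginsburg in fact joined the majority in Obergefell v. Hodges (2015), so asking why she dissented plants a false premise.

```python
# Sketch: a contra-factual (false-premise) query and a crude check of whether
# the model corrects the premise. Illustrative only; the heuristic phrases
# are assumptions, not the study's grading procedure.

def contrafactual_prompt(case_name: str, justice: str) -> str:
    """Ask why a justice dissented in a case they actually joined."""
    return f"Why did Justice {justice} dissent in {case_name}?"

# Ginsburg was in the Obergefell majority, so a well-behaved model should
# refuse the premise rather than explain a dissent that never happened.
prompt = contrafactual_prompt("Obergefell v. Hodges", "Ruth Bader Ginsburg")

def accepts_false_premise(answer: str) -> bool:
    """True if the answer appears to run with the premise instead of correcting it."""
    corrections = ("did not dissent", "joined the majority", "was in the majority")
    return not any(phrase in answer.lower() for phrase in corrections)
```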

Predicting Hallucinations

Another layer to this challenge is the LLMs' ability to predict or be aware of their own hallucinations. In ideal circumstances, LLMs would be calibrated to recognize and convey when they are likely issuing a non-factual response. However, the study found that models, particularly Llama 2, were poorly calibrated, often expressing undue confidence in their hallucinated responses.
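Calibration here means that a model's expressed confidence tracks how often it is actually right. One standard way to summarize miscalibration is expected calibration error (ECE); the sketch below assumes per-question confidence scores and correctness labels have already been collected, and is not the paper's exact measurement procedure.

```python
# Sketch: expected calibration error (ECE) over equal-width confidence bins.
# Assumes `confidences` in [0, 1] and boolean `correct` flags are already
# available; how they are elicited from the model is a separate question.

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin-size-weighted average of |accuracy - mean confidence| per bin."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        in_bin = [i for i, c in enumerate(confidences)
                  if lo <= c < hi or (b == n_bins - 1 and c == 1.0)]
        if not in_bin:
            continue
        avg_conf = sum(confidences[i] for i in in_bin) / len(in_bin)
        accuracy = sum(correct[i] for i in in_bin) / len(in_bin)
        ece += (len(in_bin) / n) * abs(accuracy - avg_conf)
    return ece
```

A perfectly calibrated model yields an ECE of zero; the study reports that models, Llama 2 in particular, sit far from this ideal, voicing high confidence in hallucinated answers.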

Implications for Legal Practice

The implications are significant. While the use of LLMs in legal settings presents opportunities for making legal advice more accessible, these technologies are not yet reliable enough to be used unsupervised, especially by those less versed in legal procedures. The research thus calls for cautious adoption of LLMs in the legal domain and emphasizes that even skilled attorneys need to remain vigilant while using these tools.

Future Directions for Research and Use

The study's findings underscore that combating legal hallucinations in LLMs is not only an empirical challenge but also a normative one. Developers must decide which contradictions to minimize—those of the training corpus, the user's inputs, or the external facts—and communicate these decisions clearly.

As a way forward, developers must make informed choices about how their models reconcile these inherent conflicts. Users, whether legal professionals or not, should be aware of these dynamics and deploy LLMs with a critical eye, verifying the accuracy of generated legal text rather than taking the model's confidence at face value. Until these challenges are addressed, the full potential of LLMs for augmenting legal research and democratizing access to justice remains unrealized.
