
LEGAL-BERT: The Muppets straight out of Law School

(2010.02559)
Published Oct 6, 2020 in cs.CL

Abstract

BERT has achieved impressive performance in several NLP tasks. However, there has been limited investigation on its adaptation guidelines in specialised domains. Here we focus on the legal domain, where we explore several approaches for applying BERT models to downstream legal tasks, evaluating on multiple datasets. Our findings indicate that the previous guidelines for pre-training and fine-tuning, often blindly followed, do not always generalize well in the legal domain. Thus we propose a systematic investigation of the available strategies when applying BERT in specialised domains. These are: (a) use the original BERT out of the box, (b) adapt BERT by additional pre-training on domain-specific corpora, and (c) pre-train BERT from scratch on domain-specific corpora. We also propose a broader hyper-parameter search space when fine-tuning for downstream tasks and we release LEGAL-BERT, a family of BERT models intended to assist legal NLP research, computational law, and legal technology applications.

Overview

  • The paper introduces LEGAL-BERT, a set of BERT models optimized for the legal domain, showcasing the importance of domain-specific pre-training.

  • It explores three adaptation strategies for BERT in the legal field: using the original model, further pre-training on legal corpora, and pre-training from scratch on legal data.

  • Empirical results indicate that the specialized pre-training strategies substantially improve performance on legal NLP tasks compared with the standard, off-the-shelf BERT model.

  • The release of LEGAL-BERT aims to enhance legal NLP research and applications, offering insights for future domain-specific optimization.

Adapting BERT for the Legal Domain: An Examination of Domain-Specific Pre-training

Introduction to LEGAL-BERT

The paper presents a comprehensive evaluation of how BERT, a pre-eminent model for NLP tasks, can be adapted to the legal domain through specialized pre-training strategies. By assessing several approaches for applying BERT models to downstream legal tasks across multiple datasets, the authors highlight significant findings on domain adaptation and ultimately introduce LEGAL-BERT, a suite of BERT models tailored to legal NLP research and applications.

Overview of Strategies and Methodologies

The research delineates three main strategies for adapting BERT to specialized domains such as the legal field (a minimal loading sketch follows the list):

  1. BERT Out of the Box: Utilizing the original BERT model without any additional pre-training.
  2. Further Pre-trained BERT (LEGAL-BERT-FP): Extending the pre-training of BERT with domain-specific corpora.
  3. BERT Pre-trained from Scratch (LEGAL-BERT-SC): Initiating pre-training afresh on domain-specific corpora.
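
As a rough illustration (not code from the paper), the three strategies map onto familiar Hugging Face transformers calls. The nlpaueb checkpoint name below is the one the authors published for the released models and is assumed to still be available on the Hub; strategy (b) is shown only as a comment because it is a training recipe rather than a single call.

```python
# Minimal sketch of the three adaptation strategies with the transformers library.
from transformers import AutoModel, AutoTokenizer

# (a) BERT out of the box: the generic checkpoint, no legal-domain pre-training.
generic = AutoModel.from_pretrained("bert-base-uncased")

# (b) LEGAL-BERT-FP: conceptually, resume masked-language-model pre-training of
#     bert-base-uncased on legal corpora, then fine-tune on the downstream task.

# (c) LEGAL-BERT-SC: pre-trained from scratch on legal corpora with a
#     legal-domain vocabulary (the released LEGAL-BERT checkpoint, assumed here).
legal = AutoModel.from_pretrained("nlpaueb/legal-bert-base-uncased")
tok = AutoTokenizer.from_pretrained("nlpaueb/legal-bert-base-uncased")

# A domain-specific vocabulary splits legal terms into fewer sub-word pieces, e.g.:
print(tok.tokenize("The lessee shall indemnify the lessor."))
```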

A further angle explored in the paper is a broadening of the hyperparameter search space for fine-tuning on downstream tasks, beyond the settings recommended in the original BERT work. The paper also investigates smaller, lightweight BERT variants to examine whether they remain competitive in the specialized domain while being more efficient.
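
The sketch below shows what such a broadening can look like in practice: a wider grid over learning rate, batch size, epochs, and dropout. The specific values are illustrative only, not the exact grid reported in the paper.

```python
# Illustrative fine-tuning grids: the commonly cited BERT defaults versus a
# broader search of the kind the paper advocates (values are indicative only).
from itertools import product

default_grid = {
    "learning_rate": [2e-5, 3e-5, 5e-5],
    "batch_size": [16, 32],
    "epochs": [2, 3, 4],
}

broader_grid = {
    "learning_rate": [1e-5, 2e-5, 3e-5, 5e-5],  # include lower learning rates
    "batch_size": [8, 16, 32],
    "epochs": [2, 3, 4, 8, 16],                 # allow longer fine-tuning
    "dropout": [0.1, 0.2],                      # vary regularisation as well
}

def configurations(grid):
    """Yield every hyper-parameter combination in the grid as a dict."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))

# In practice: fine-tune once per configuration and keep the best dev-set model.
print(sum(1 for _ in configurations(default_grid)), "runs in the default grid")
print(sum(1 for _ in configurations(broader_grid)), "runs in the broader grid")
```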

Key Findings and Contributions

  • The study's empirical results reveal that both specialized pre-training strategies (FP and SC) significantly outperform the baseline of using BERT directly out of the box across multiple legal datasets. This underscores the necessity of domain-specific adaptation for maximizing performance in specialized fields.
  • An expanded hyperparameter search space leads to substantially improved outcomes, suggesting that the default tuning guidelines may not be optimal for domain-specific applications. This insight prompts a reassessment of fine-tuning practices in specialized NLP tasks.
  • Smaller BERT-based models can compete with larger ones in specific domains, which is particularly reassuring for applications constrained by computational resources. LEGAL-BERT-SMALL emerges as an efficient alternative that retains effectiveness while reducing computational demands (see the sketch after this list).
  • The release of LEGAL-BERT models marks a significant contribution to the field of legal NLP, providing researchers and practitioners with robust, domain-adapted tools for a variety of legal applications, from textual analysis to legal judgment prediction.
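
One way to see the efficiency argument concretely is to compare parameter counts of the small and base checkpoints. The snippet below assumes the released nlpaueb/legal-bert-small-uncased and nlpaueb/legal-bert-base-uncased PyTorch weights are available on the Hub.

```python
# Compare the footprint of the small and base LEGAL-BERT checkpoints (assumed names).
from transformers import AutoModel

small = AutoModel.from_pretrained("nlpaueb/legal-bert-small-uncased")
base = AutoModel.from_pretrained("nlpaueb/legal-bert-base-uncased")

small_params = sum(p.numel() for p in small.parameters())
base_params = sum(p.numel() for p in base.parameters())
print(f"LEGAL-BERT-SMALL: {small_params / 1e6:.0f}M parameters")
print(f"LEGAL-BERT-BASE:  {base_params / 1e6:.0f}M parameters")
# Fewer layers and a smaller hidden size make the small model roughly a third of
# the base model's size, which is what makes it attractive for constrained setups.
```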

Implications and Future Directions

This study reinforces the view that one-size-fits-all pre-training does not transfer equally well across domains, particularly in areas with specialized lexicons and stylistic conventions such as law. The LEGAL-BERT models not only set new performance benchmarks for legal NLP tasks but also pave the way for further exploration of domain-specific model optimization. Future research avenues could include finer-grained adaptation to legal sub-domains or the development of even more resource-efficient models capable of running on edge devices.

The comprehensive analysis and methodologies outlined in this paper offer critical insights for advancing the application of transformers in domain-specific settings. By leveraging LEGAL-BERT for both theoretical exploration and practical applications in legal technology, the NLP community can significantly enhance the accuracy and efficiency of AI-powered legal analysis and decision-making systems.
