Fine-tune BERT for Extractive Summarization

(1903.10318)
Published Mar 25, 2019 in cs.CL

Abstract

BERT, a pre-trained Transformer model, has achieved ground-breaking performance on multiple NLP tasks. In this paper, we describe BERTSUM, a simple variant of BERT, for extractive summarization. Our system is the state of the art on the CNN/Dailymail dataset, outperforming the previous best-performed system by 1.65 on ROUGE-L. The codes to reproduce our results are available at https://github.com/nlpyang/BertSum

Overview

  • The paper introduces BERTSUM, a variant of BERT adapted for extractive summarization, and analyzes its effectiveness.

  • BERTSUM adds sentence-level [CLS] tokens and interval segment embeddings to handle extractive summarization tasks.

  • Experiments on the CNN/Dailymail and New York Times datasets show BERTSUM outperforms other models, especially with Transformer layers.

  • Structural improvements to BERT and post-processing steps like Trigram Blocking are key to enhancing summarization performance.

Introduction

The paper investigates the viability of fine-tuning BERT, a pre-trained Transformer model, for the task of extractive summarization. Extractive summarization is the process of producing a concise version of a document by selecting and concatenating its most salient sentences. The paper introduces BERTSUM, a variant of BERT tailored to this task, and evaluates it against other state-of-the-art systems using ROUGE metrics.

Methodology

BERT, by design, is adept at modeling rich contextual information thanks to its pre-training on sizable corpora. However, its architecture requires modification for extractive summarization: BERT produces token-level representations, and its segment embeddings distinguish only two input spans, whereas extractive summarization needs a representation for each of many sentences. The authors resolve this by inserting a [CLS] token before every sentence and using interval segment embeddings that alternate between sentences, so that each [CLS] output can serve as that sentence's representation. The paper then outlines summarization layers built on top of BERT's outputs, designed to capture document-level features, focusing on three variants: a simple classifier, an inter-sentence Transformer, and an LSTM-based approach.
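The following is a minimal sketch of this input construction and of the simplest summarization layer (a sigmoid classifier over the [CLS] outputs), assuming the Hugging Face transformers API; the variable names and the single-document, non-batched setup are illustrative assumptions rather than the authors' released code.

```python
# Sketch of BERTSUM-style input construction, assuming the Hugging Face
# `transformers` API; names and example sentences are illustrative only.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

sentences = [
    "The cat sat on the mat.",
    "It was a sunny day.",
    "The dog barked loudly.",
]

# Insert [CLS] before and [SEP] after every sentence, and assign alternating
# interval segment ids (0, 1, 0, ...) so sentences can be told apart.
input_ids, segment_ids, cls_positions = [], [], []
for i, sent in enumerate(sentences):
    tokens = ([tokenizer.cls_token_id]
              + tokenizer.encode(sent, add_special_tokens=False)
              + [tokenizer.sep_token_id])
    cls_positions.append(len(input_ids))       # where this sentence's [CLS] sits
    input_ids.extend(tokens)
    segment_ids.extend([i % 2] * len(tokens))  # interval segment embeddings

input_ids = torch.tensor([input_ids])
segment_ids = torch.tensor([segment_ids])

with torch.no_grad():
    hidden = bert(input_ids=input_ids, token_type_ids=segment_ids).last_hidden_state

# One vector per sentence: the BERT output at each [CLS] position.
sent_vectors = hidden[0, cls_positions]        # shape: (num_sentences, hidden_size)

# Simplest summarization layer: a sigmoid classifier scoring each sentence.
scorer = torch.nn.Linear(bert.config.hidden_size, 1)
scores = torch.sigmoid(scorer(sent_vectors)).squeeze(-1)
print(scores)
```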

Experiments and Results

Experiments were conducted on two well-known datasets: CNN/Dailymail and the New York Times Annotated Corpus. Notably, BERTSUM with Transformer summarization layers significantly outperformed previously established models, raising the best reported ROUGE-L score by 1.65 points, and two Transformer layers atop BERT proved particularly effective. Simplicity also paid off: LSTM-based summarization layers did not contribute substantially to overall performance.
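As a hedged illustration of the best-performing configuration, the sketch below stacks two Transformer encoder layers over the per-sentence [CLS] vectors before scoring each sentence; the head count, feed-forward size, and dropout are illustrative assumptions, not values taken from the paper.

```python
# Sketch of an inter-sentence Transformer summarization layer (two layers
# over the sentence vectors); hyperparameters here are placeholders.
import torch
import torch.nn as nn

hidden_size, num_layers = 768, 2   # two Transformer layers worked best

encoder_layer = nn.TransformerEncoderLayer(d_model=hidden_size, nhead=8,
                                           dim_feedforward=2048, dropout=0.1)
inter_sentence_encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
scorer = nn.Linear(hidden_size, 1)

# `sent_vectors`: (num_sentences, hidden_size) [CLS] outputs from BERT,
# as in the earlier sketch; random values stand in here.
sent_vectors = torch.randn(5, hidden_size)

# nn.TransformerEncoder expects (sequence_length, batch, d_model) by default.
contextual = inter_sentence_encoder(sent_vectors.unsqueeze(1)).squeeze(1)
scores = torch.sigmoid(scorer(contextual)).squeeze(-1)   # one score per sentence
print(scores)
```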

Additions such as interval segment embeddings provided performance boosts, while Trigram Blocking further refined the output by reducing redundancy in the generated summaries. This indicates that both the structural refinements to BERT and the post-processing steps are instrumental in advancing extractive summarization.
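A minimal sketch of Trigram Blocking follows, under the assumption of a greedy, score-ordered selection with simple whitespace tokenization (both assumptions for illustration): a candidate sentence is skipped if any of its trigrams already appears in the summary selected so far.

```python
# Sketch of Trigram Blocking during sentence selection: greedily pick
# high-scoring sentences, skipping any that repeat an already-used trigram.
def trigrams(text):
    tokens = text.lower().split()
    return {tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)}

def select_summary(sentences, scores, max_sentences=3):
    selected, seen_trigrams = [], set()
    # Consider sentences from highest to lowest predicted score.
    for idx in sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True):
        cand_trigrams = trigrams(sentences[idx])
        if cand_trigrams & seen_trigrams:
            continue                      # overlapping trigram: block this sentence
        selected.append(idx)
        seen_trigrams |= cand_trigrams
        if len(selected) == max_sentences:
            break
    return [sentences[i] for i in sorted(selected)]   # keep document order
```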

Conclusion

The research demonstrates that BERT, with appropriate modifications and fine-tuning, sets a new benchmark in extractive summarization. The findings underscore the impact of leveraging a powerful pre-trained model like BERT, and enhancing it with targeted alterations tailored to the specific needs of the summarization endeavor. The success of BERTSUM with Transformer layers, in particular, affirms the value of this strategy, setting the stage for future research to explore and expand upon these modifications. The implementation details and code made available by the authors ensure that the research community can replicate and build upon these promising results.
