
Span-based Joint Entity and Relation Extraction with Transformer Pre-training (1909.07755v4)

Published 17 Sep 2019 in cs.CL and cs.LG

Abstract: We introduce SpERT, an attention model for span-based joint entity and relation extraction. Our key contribution is a light-weight reasoning on BERT embeddings, which features entity recognition and filtering, as well as relation classification with a localized, marker-free context representation. The model is trained using strong within-sentence negative samples, which are efficiently extracted in a single BERT pass. These aspects facilitate a search over all spans in the sentence. In ablation studies, we demonstrate the benefits of pre-training, strong negative sampling and localized context. Our model outperforms prior work by up to 2.6% F1 score on several datasets for joint entity and relation extraction.

Citations (358)

Summary

  • The paper introduces a span-based joint model leveraging BERT embeddings to efficiently extract entities and relations with localized context.
  • It integrates span classification with a filtering mechanism and relation classification based on single-pass BERT for overlapping entity detection.
  • Experimental results show significant F1 score improvements on CoNLL04, SciERC, and ADE datasets, highlighting the impact of strong negative sampling.

SpERT: Span-based Joint Entity and Relation Extraction with Transformer Pre-training

The paper introduces SpERT, a span-based joint entity and relation extraction model built on BERT. SpERT performs lightweight reasoning on top of BERT embeddings: entity recognition and filtering, followed by relation classification with a localized, marker-free context representation. The model is trained with strong within-sentence negative samples, which are extracted in a single BERT pass. The authors demonstrate that SpERT outperforms existing models on several datasets for joint entity and relation extraction.

Model Architecture and Implementation

SpERT adopts a span-based approach in which each token subsequence represents a potential entity, and relations can exist between any span pair. The model performs a full search over all possible spans, enabling the identification of overlapping entities. The architecture consists of three main components (a code sketch of them follows the list):

  1. Span Classification: Each span is classified into one of the predefined entity types or a none class, utilizing a span representation that combines BERT embeddings, a width embedding, and a classifier token representing the overall sentence context. The span representation $\mathbf{e}(s)$ is computed as:

    $$\mathbf{e}(s) = f(\mathbf{e}_i, \mathbf{e}_{i+1}, \ldots, \mathbf{e}_{i+k}) \circ \mathbf{w}_{k+1}$$

    where $f$ is a fusion function (max-pooling), $\mathbf{e}_i$ are BERT embeddings, $\mathbf{w}_{k+1}$ is a width embedding, and $\circ$ denotes concatenation. The final input to the span classifier is:

    $$\mathbf{x}^s = \mathbf{e}(s) \;\circ\; \mathbf{c}$$

    where $\mathbf{c}$ is the classifier token. This input is fed into a softmax classifier:

    $$\hat{\mathbf{y}}^s = \text{softmax}\big( W^s \cdot \mathbf{x}^s + \mathbf{b}^s \big)$$

    where $W^s$ and $\mathbf{b}^s$ are learned parameters.

  2. Span Filtering: Spans classified as none are filtered out, resulting in a set of spans that are considered potential entities.
  3. Relation Classification: Each entity pair is processed to determine if a predefined relation exists between them. The input consists of the entity representations and a localized context representation, which is obtained by max-pooling the BERT embeddings of the tokens between the two entities. The localized context $\mathbf{c}(s_1, s_2)$ is specific to the entity pair $(s_1, s_2)$. The inputs to the relation classifier are:

    $$\begin{aligned} \mathbf{x}_1^r &= \mathbf{e}(s_1) \,\circ\, \mathbf{c}(s_1, s_2) \,\circ\, \mathbf{e}(s_2), \\ \mathbf{x}_2^r &= \mathbf{e}(s_2) \,\circ\, \mathbf{c}(s_1, s_2) \,\circ\, \mathbf{e}(s_1). \end{aligned}$$

    Both $\mathbf{x}_1^r$ and $\mathbf{x}_2^r$ are passed through a single-layer classifier:

    $$\hat{\mathbf{y}}^r_{1/2} = \sigma\big( W^r \cdot \mathbf{x}^r_{1/2} + \mathbf{b}^r \big)$$

    where $\sigma$ is the sigmoid function and $W^r$ and $\mathbf{b}^r$ are learned parameters.
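The following is a minimal sketch of how these three components fit together, assuming PyTorch and a HuggingFace `transformers` BERT encoder. The class and argument names (`SpertSketch`, `max_span_width`, the none class at index 0, a batch of one sentence) are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal SpERT-style sketch: span classification, span filtering, and relation
# classification over a single BERT pass. Assumes: PyTorch, HuggingFace transformers,
# "none" entity class at index 0, spans given as inclusive (start, end) token indices.
import torch
import torch.nn as nn
from transformers import BertModel

class SpertSketch(nn.Module):
    def __init__(self, bert_name="bert-base-cased", num_entity_types=4,
                 num_relation_types=5, width_dim=25, max_span_width=10):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        # Learned embedding for the span width k+1, used in e(s).
        self.width_embedding = nn.Embedding(max_span_width + 1, width_dim)
        # Span classifier over [e(s) ; c]: entity types plus the "none" class.
        self.span_classifier = nn.Linear(hidden + width_dim + hidden, num_entity_types + 1)
        # Relation classifier over [e(s1) ; c(s1,s2) ; e(s2)], one sigmoid per relation type.
        self.rel_classifier = nn.Linear(2 * (hidden + width_dim) + hidden, num_relation_types)

    def span_repr(self, token_emb, start, end):
        """e(s): max-pool the span's BERT embeddings and concatenate the width embedding."""
        pooled = token_emb[start:end + 1].max(dim=0).values
        width = self.width_embedding(torch.tensor(end - start + 1, device=pooled.device))
        return torch.cat([pooled, width], dim=-1)

    def forward(self, input_ids, attention_mask, spans):
        # Single BERT pass per sentence; every span and pair reuses these embeddings.
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        token_emb = out.last_hidden_state[0]   # (seq_len, hidden)
        cls = token_emb[0]                     # classifier token c

        # 1) Span classification: softmax over entity types + none.
        span_reprs, span_logits = [], []
        for (start, end) in spans:
            e_s = self.span_repr(token_emb, start, end)
            span_reprs.append(e_s)
            span_logits.append(self.span_classifier(torch.cat([e_s, cls], dim=-1)))
        span_logits = torch.stack(span_logits)

        # 2) Span filtering: keep spans not predicted as "none" (assumed index 0).
        keep = (span_logits.argmax(dim=-1) != 0).nonzero(as_tuple=True)[0].tolist()

        # 3) Relation classification for every ordered pair of surviving spans.
        rel_scores = {}
        for i in keep:
            for j in keep:
                if i == j:
                    continue
                s1, s2 = spans[i], spans[j]
                # Localized context c(s1,s2): max-pool tokens strictly between the spans,
                # zero vector if the spans are adjacent or overlapping.
                lo, hi = min(s1[1], s2[1]) + 1, max(s1[0], s2[0])
                ctx = token_emb[lo:hi].max(dim=0).values if lo < hi else torch.zeros_like(cls)
                x = torch.cat([span_reprs[i], ctx, span_reprs[j]], dim=-1)
                rel_scores[(i, j)] = torch.sigmoid(self.rel_classifier(x))
        return span_logits, rel_scores
```

Because all span and pair representations are read off the token embeddings of one BERT pass, the exhaustive span search stays cheap: only the small linear heads are evaluated per span and per pair.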

Training Methodology

The model is trained using a joint loss function that combines the span classifier's cross-entropy loss and the relation classifier's binary cross-entropy loss. The training process involves:

  • Utilizing labeled entities as positive samples and a fixed number of random non-entity spans as negative samples for the span classifier.
  • Using ground truth relations as positive samples and drawing negative samples from entity pairs not labeled with any relation for the relation classifier.

Training examples are sampled per sentence, and each sentence is processed only once through BERT, which substantially speeds up training.
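As a concrete illustration of this sampling scheme, the sketch below builds the per-sentence training samples. The helper name `build_training_samples` and the default counts of 100 negative spans and 100 negative relation pairs are assumptions chosen for illustration, not values taken from the authors' configuration.

```python
# Sketch of per-sentence sample construction for the joint objective (assumed helper).
import random
from itertools import combinations

def build_training_samples(num_tokens, gold_entities, gold_relations,
                           max_span_width=10, num_neg_spans=100, num_neg_rels=100):
    """gold_entities: list of (start, end, type); gold_relations: list of
    (head_idx, tail_idx, type), where head_idx/tail_idx index into gold_entities."""
    # Positive span samples: the labeled entities themselves.
    pos_spans = list(gold_entities)
    gold_bounds = {(s, e) for (s, e, _) in gold_entities}

    # Strong negative span samples: random non-entity spans from the same sentence.
    candidates = [(i, j) for i in range(num_tokens)
                  for j in range(i, min(i + max_span_width, num_tokens))
                  if (i, j) not in gold_bounds]
    neg_spans = [(s, e, "none") for (s, e) in
                 random.sample(candidates, min(num_neg_spans, len(candidates)))]

    # Positive relation samples: the labeled relations between gold entities.
    pos_rels = list(gold_relations)
    labeled = {(h, t) for (h, t, _) in gold_relations}

    # Negative relation samples: gold entity pairs not labeled with any relation.
    unlabeled = [(h, t) for h, t in combinations(range(len(gold_entities)), 2)
                 if (h, t) not in labeled and (t, h) not in labeled]
    neg_rels = [(h, t, None) for (h, t) in
                random.sample(unlabeled, min(num_neg_rels, len(unlabeled)))]

    # All of these samples are scored from the same single BERT pass over the sentence.
    return pos_spans + neg_spans, pos_rels + neg_rels
```

The joint loss is then the sum of the span classifier's cross-entropy over the span samples and the relation classifier's binary cross-entropy over the relation samples.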

Experimental Results and Analysis

SpERT was evaluated on the CoNLL04, SciERC, and ADE datasets, consistently outperforming state-of-the-art models in both entity and relation extraction.

Dataset    Metric   SpERT   Prior Work   Improvement
CoNLL04    F1       71.47   68.9         +2.6%
SciERC     F1       50.84   48.4         +2.4%
ADE        F1       79.24   77.29        +2.0%

Ablation studies demonstrated the importance of pre-training, strong negative sampling, and localized context. Using a localized context representation significantly outperformed using the full sentence context, particularly for longer sentences. The number of negative samples also had a significant impact on performance.

Conclusion

SpERT presents a span-based approach for joint entity and relation extraction that leverages BERT. The model's performance benefits from strong negative sampling, span filtering, and localized context representation. The results suggest that span-based approaches are competitive with BILOU-based models and may be more promising for future research due to their ability to identify overlapping entities. Future work may focus on more elaborate context representations and incorporating syntactic features.
