A Global Context Mechanism for Sequence Labeling (2305.19928v5)

Published 31 May 2023 in cs.CL

Abstract: Global sentence information is crucial for sequence labeling tasks, where each word in a sentence must be assigned a label. While BiLSTM models are widely used, they often fail to capture sufficient global context for inner words. Previous work has proposed various RNN variants to integrate global sentence information into word representations. However, these approaches suffer from three key limitations: (1) they are slower in both inference and training than the original BiLSTM, (2) they cannot effectively supplement global information for transformer-based models, and (3) they incur a high time cost when the customized RNNs are reimplemented and integrated into existing architectures. In this study, we introduce a simple yet effective mechanism that addresses these limitations. Our approach efficiently supplements global sentence information for both BiLSTM and transformer-based models, with minimal degradation in inference and training speed, and is easily pluggable into current architectures. We demonstrate significant improvements in F1 scores across seven popular benchmarks, including Named Entity Recognition (NER) tasks such as CoNLL2003, WNUT2017, and the Chinese NER task Weibo, as well as End-to-End Aspect-Based Sentiment Analysis (E2E-ABSA) benchmarks such as Laptop14, Restaurant14, Restaurant15, and Restaurant16. Without any extra strategies, we achieve the third-highest score on the Weibo NER benchmark. Compared to CRF, one of the most popular frameworks for sequence labeling, our mechanism achieves competitive F1 scores while offering superior inference and training speed. Code is available at: https://github.com/conglei2XU/Global-Context-Mechanism
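The abstract describes the mechanism only at a high level: a lightweight, pluggable layer that injects a global sentence representation into each token state. As a rough illustration of what such a layer can look like, here is a minimal PyTorch sketch that gates each token representation against a mean-pooled sentence vector; the class name `GlobalContextGate`, the mean-pooling choice, and the gating formula are assumptions for illustration, not the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn


class GlobalContextGate(nn.Module):
    """Illustrative global-context layer: fuses each token state with a
    pooled sentence vector via a learned sigmoid gate. A sketch only,
    not the paper's exact mechanism."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        # Gate is computed from the concatenation of the token state
        # and the global sentence vector.
        self.gate = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, token_states: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, hidden_dim) from a BiLSTM or transformer
        # mask:         (batch, seq_len), 1 for real tokens, 0 for padding
        mask = mask.unsqueeze(-1).float()
        # Masked mean pooling gives one global vector per sentence.
        global_vec = (token_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
        global_vec = global_vec.unsqueeze(1).expand_as(token_states)
        # A per-dimension gate decides how much global context each token absorbs.
        g = torch.sigmoid(self.gate(torch.cat([token_states, global_vec], dim=-1)))
        return g * token_states + (1.0 - g) * global_vec
```

Because the output keeps the encoder's hidden size, a layer of this shape can be dropped between any token encoder and the per-token classification head, which is consistent with the "easily pluggable" property the abstract claims.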
