Empirical Analysis for Unsupervised Universal Dependency Parse Tree Aggregation (2403.19183v2)
Abstract: Dependency parsing is an essential task in NLP, and the quality of dependency parsers is crucial for many downstream tasks. Parser quality often varies with the domain and language involved, so mitigating this variability is essential for stable performance. In various NLP tasks, post-processing aggregation methods have been shown to mitigate varying quality. However, such methods have not been sufficiently studied for dependency parsing. In an extensive empirical study, we compare different unsupervised post-processing aggregation methods to identify the most suitable method for aggregating dependency tree structures.
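To make "unsupervised post-processing aggregation" concrete, here is a minimal sketch of the simplest such baseline: per-token majority voting over the head indices predicted by several parsers. This is an illustrative assumption, not one of the methods the paper compares. The function names `majority_vote_heads` and `is_valid_tree` are hypothetical, and each parse is assumed to be a list of head indices for tokens 1..n, with 0 denoting ROOT. The example shows why tree structure matters: three individually valid trees can vote their way into a cycle with no root.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote_heads(parses: Sequence[Sequence[int]]) -> List[int]:
    """Hypothetical baseline: pick each token's head by majority vote.

    Each parse lists head indices for tokens 1..n (0 = artificial ROOT).
    Ties go to the head proposed by the earliest-listed parser.
    """
    if not parses:
        raise ValueError("need at least one parse")
    n = len(parses[0])
    if any(len(p) != n for p in parses):
        raise ValueError("all parses must cover the same tokens")
    voted = []
    for i in range(n):
        counts = Counter(p[i] for p in parses)
        best = max(counts.values())
        # Earliest-listed parser wins among the top-voted heads.
        voted.append(next(p[i] for p in parses if counts[p[i]] == best))
    return voted


def is_valid_tree(heads: Sequence[int]) -> bool:
    """Check for exactly one ROOT dependent and no cycles."""
    n = len(heads)
    if sum(1 for h in heads if h == 0) != 1:
        return False
    for start in range(1, n + 1):
        seen = set()
        node = start
        while node != 0:
            # Revisiting a node means a cycle; out-of-range heads are invalid.
            if node in seen or not (0 <= heads[node - 1] <= n):
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


# Three valid trees over 3 tokens whose per-token majority forms a cycle.
trees = [[2, 3, 0], [0, 3, 1], [2, 0, 1]]
voted = majority_vote_heads(trees)  # [2, 3, 1]: 1->2->3->1, no ROOT
print(voted, is_valid_tree(voted))
```

Because independent per-token votes can yield cycles or zero or multiple roots, tree-aware aggregation (for example, choosing a maximum spanning tree over vote-weighted arcs) is needed to guarantee a well-formed output.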