Event GDR: Event-Centric Generative Document Retrieval (2405.06886v1)
Abstract: Generative document retrieval, an emerging paradigm in information retrieval, learns to build connections between documents and identifiers within a single model, and has garnered significant attention. However, two challenges remain: (1) inner-content correlation is neglected during document representation; (2) identifier construction lacks explicit semantic structure. Events, by contrast, have rich relations and a well-defined taxonomy, which can help address both challenges. Inspired by this, we propose Event GDR, an event-centric generative document retrieval model that integrates event knowledge into this task. Specifically, we use a multi-agent exchange-then-reflection method for event knowledge extraction. For document representation, we model each document with its events and relations to ensure comprehensiveness and inner-content correlation. For identifier construction, we map the events to a well-defined event taxonomy to construct identifiers with explicit semantic structure. Our method achieves significant improvement over the baselines on two datasets, and we hope it provides insights for future research.
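The abstract does not spell out the identifier format, so the following is only a toy illustration of the general idea of mapping a document's events onto a taxonomy to obtain a structured identifier. The taxonomy contents, the `build_identifier` helper, the "most frequent event type" rule, and the trailing document index are all assumptions made for this sketch, not the authors' method.

```python
from collections import Counter

# Hypothetical two-level event taxonomy: event type -> path from the root.
# These types and paths are illustrative assumptions, not the paper's taxonomy.
TOY_TAXONOMY = {
    "Attack": ("Conflict", "Attack"),
    "Protest": ("Conflict", "Protest"),
    "Hire": ("Personnel", "Hire"),
    "Resign": ("Personnel", "Resign"),
}


def build_identifier(event_types, doc_index):
    """Build a structured identifier from a document's extracted event types.

    Assumed rule: the most frequent known event type selects the taxonomy
    path, and doc_index disambiguates documents that share that path.
    """
    known = [t for t in event_types if t in TOY_TAXONOMY]
    if not known:
        # No event maps to the taxonomy: fall back to a placeholder path.
        return ("Unknown", "Unknown", str(doc_index))
    dominant, _count = Counter(known).most_common(1)[0]
    return TOY_TAXONOMY[dominant] + (str(doc_index),)


# Example: a document whose extracted events are mostly attacks.
identifier = build_identifier(["Attack", "Hire", "Attack"], doc_index=7)
print("/".join(identifier))
```

Under this assumed scheme, documents about similar events share an identifier prefix, which is one way an identifier could carry explicit semantic structure.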
Authors: Yong Guan, Dingxiao Liu, Jinchen Ma, Hao Peng, Xiaozhi Wang, Lei Hou, Ru Li