Large Language Models as Zero-Shot Keyphrase Extractors: A Preliminary Empirical Study (2312.15156v2)
Abstract: Zero-shot keyphrase extraction aims to build a keyphrase extractor without training on human-annotated data, which is challenging because little human intervention is involved. Challenging but worthwhile, the zero-shot setting substantially reduces the time and effort that data labeling requires. Recent pre-trained large language models (e.g., ChatGPT and ChatGLM) show promising performance in zero-shot settings, inspiring us to explore prompt-based methods. In this paper, we ask whether strong keyphrase extraction models can be constructed by directly prompting the large language model ChatGPT. The experimental results show that, compared to existing state-of-the-art unsupervised and supervised models, ChatGPT still has considerable room for improvement on the keyphrase extraction task.
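To make the prompt-based setup concrete, below is a minimal sketch of zero-shot keyphrase extraction by directly prompting ChatGPT. It assumes the OpenAI Python client (v1.x); the prompt wording, the `extract_keyphrases` helper, and the choice of `gpt-3.5-turbo` are illustrative assumptions, not necessarily the exact configuration used in the paper.

```python
# Sketch: zero-shot keyphrase extraction by prompting ChatGPT.
# Assumes the OpenAI Python client (v1.x); prompt wording is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def extract_keyphrases(document: str, top_k: int = 10) -> list[str]:
    prompt = (
        f"Extract the {top_k} most important keyphrases from the document below. "
        "Return them as a comma-separated list, ordered by importance.\n\n"
        f"Document:\n{document}"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output for easier evaluation
    )
    text = response.choices[0].message.content or ""
    # Split the comma-separated list into individual keyphrases.
    return [kp.strip() for kp in text.split(",") if kp.strip()]


if __name__ == "__main__":
    doc = (
        "Zero-shot keyphrase extraction builds a keyphrase extractor "
        "without training on human-annotated data."
    )
    print(extract_keyphrases(doc, top_k=5))
```

Predicted keyphrases returned this way can then be compared against gold annotations with standard exact-match F1@k, as in existing unsupervised and supervised baselines.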
Authors: Mingyang Song, Xuelian Geng, Songfang Yao, Shilong Lu, Yi Feng, Liping Jing