
LLMs as Meta-Reviewers' Assistants: A Case Study (2402.15589v2)

Published 23 Feb 2024 in cs.CL, cs.AI, cs.LG, and cs.NE

Abstract: One of the most important yet onerous tasks in the academic peer-reviewing process is composing meta-reviews, which involves assimilating diverse opinions from multiple expert peers, formulating one's own judgment as a senior expert, and then summarizing all these perspectives into a concise holistic overview to make an overall recommendation. This process is time-consuming and can be compromised by human factors such as fatigue, inconsistency, and overlooked details. Given the latest major developments in LLMs, it is compelling to rigorously study whether LLMs can help meta-reviewers perform this important task better. In this paper, we perform a case study with three popular LLMs, namely GPT-3.5, LLaMA2, and PaLM2, to assist meta-reviewers in better comprehending multiple experts' perspectives by generating a controlled multi-perspective summary (MPS) of their opinions. To achieve this, we prompt the three LLMs with different types/levels of prompts based on the recently proposed TELeR taxonomy. Finally, we perform a detailed qualitative study of the MPSs generated by the LLMs and report our findings.
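
To make the prompting setup concrete, below is a minimal sketch of how multi-review input and prompts of increasing detail might be assembled before being sent to each model's API. This is an illustration, not the paper's exact prompts: the example reviews, the `build_prompt` helper, and the level wording are assumptions that only loosely follow the TELeR idea of directives growing more detailed at higher levels.

```python
# Hypothetical sketch: building prompts of increasing TELeR-style detail
# for multi-perspective summarization (MPS) of peer reviews.
# The review texts, level definitions, and wording below are illustrative
# assumptions, not the prompts used in the paper.

REVIEWS = [
    "Reviewer 1: The method is novel but the evaluation is limited...",
    "Reviewer 2: Strong baselines are missing; the writing is clear...",
    "Reviewer 3: Good motivation, but reproducibility details are thin...",
]

def build_prompt(reviews: list[str], level: int) -> str:
    """Build a single-turn prompt whose specificity grows with `level`."""
    joined = "\n\n".join(reviews)
    if level == 1:
        # Level 1: a bare, high-level directive with no sub-task details.
        task = "Summarize the following peer reviews."
    elif level == 2:
        # Level 2: the directive plus an explicit multi-perspective goal.
        task = ("Write a multi-perspective summary of the following peer "
                "reviews, covering each reviewer's distinct opinion.")
    else:
        # Level 3+: directive, goal, and itemized output requirements.
        task = ("Write a multi-perspective summary of the following peer "
                "reviews. Requirements:\n"
                "1. Cover agreements and disagreements among reviewers.\n"
                "2. Preserve each reviewer's distinct stance.\n"
                "3. Keep the summary under 200 words.")
    return f"{task}\n\nReviews:\n{joined}"

if __name__ == "__main__":
    for lvl in (1, 2, 3):
        print(f"--- Level {lvl} prompt ---")
        print(build_prompt(REVIEWS, lvl))
        print()
```

In the paper's setup, each such prompt would be sent to GPT-3.5, LLaMA2, and PaLM2, and the resulting MPSs compared qualitatively across prompt levels.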

References (22)
  1. Revisiting automatic evaluation of extractive summarization task: Can we do better than ROUGE? In Findings of the Association for Computational Linguistics: ACL 2022, pages 1547–1560. Association for Computational Linguistics.
  2. Mousumi Akter and Shubhra Kanti Karmaker Santu. 2023a. FaNS: A facet-based narrative similarity metric. CoRR, abs/2309.04823.
  3. Mousumi Akter and Shubhra Kanti Karmaker Santu. 2023b. Redundancy aware multi-reference based gainwise evaluation of extractive summarization. CoRR, abs/2308.02270.
  4. Learning to generate overlap summaries through noisy synthetic data. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022), pages 11765–11777. Association for Computational Linguistics.
  5. SEM-F1: An automatic way for semantic evaluation of multi-narrative overlap summaries at scale. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022), pages 780–792. Association for Computational Linguistics.
  6. Semantic overlap summarization among multiple alternative narratives: An exploratory study. In Proceedings of the 29th International Conference on Computational Linguistics (COLING 2022), pages 6195–6207. International Committee on Computational Linguistics.
  7. Tom B. Brown et al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33.
  8. Aakanksha Chowdhery et al. 2023. PaLM: Scaling language modeling with pathways. Journal of Machine Learning Research, 24(240):1–113.
  9. Nan Du et al. 2022. GLaM: Efficient scaling of language models with mixture-of-experts. In International Conference on Machine Learning.
  10. Michael Fire and Carlos Guestrin. 2019. Over-optimization of academic publishing metrics: Observing Goodhart's law in action. GigaScience, 8.
  11. Xu Han et al. 2022. PTR: Prompt tuning with rules for text classification. AI Open, 3:182–192.
  12. Pengfei Liu et al. 2023. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys, 55(9):1–35.
  13. Long Ouyang et al. 2022. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744.
  14. Shubhra Kanti Karmaker Santu and Dongji Feng. 2023. TELeR: A general taxonomy of LLM prompts for benchmarking complex tasks.
  15. SOFSAT: Towards a setlike operator based framework for semantic analysis of text. SIGKDD Explorations, 20(2):21–30.
  16. Exploring universal sentence encoders for zero-shot text classification. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (AACL-IJCNLP 2022), Volume 2: Short Papers, pages 135–147. Association for Computational Linguistics.
  17. Zero-shot multi-label topic inference with sentence encoders and LLMs. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), pages 16218–16233. Association for Computational Linguistics.
  18. Teven Le Scao et al. 2022. BLOOM: A 176B-parameter open-access multilingual language model. arXiv preprint arXiv:2211.05100.
  19. Romal Thoppilan et al. 2022. LaMDA: Language models for dialog applications. arXiv preprint arXiv:2201.08239.
  20. Hugo Touvron et al. 2023. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
  21. Hugo Touvron et al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
  22. Xianjun Yang et al. 2023. Exploring the limits of ChatGPT for query or aspect-based text summarization. arXiv preprint arXiv:2302.08081.