A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications (1804.09635v1)

Published 25 Apr 2018 in cs.CL and cs.AI

Abstract: Peer reviewing is a central component in the scientific publishing process. We present the first public dataset of scientific peer reviews available for research purposes (PeerRead v1) providing an opportunity to study this important artifact. The dataset consists of 14.7K paper drafts and the corresponding accept/reject decisions in top-tier venues including ACL, NIPS and ICLR. The dataset also includes 10.7K textual peer reviews written by experts for a subset of the papers. We describe the data collection process and report interesting observed phenomena in the peer reviews. We also propose two novel NLP tasks based on this dataset and provide simple baseline models. In the first task, we show that simple models can predict whether a paper is accepted with up to 21% error reduction compared to the majority baseline. In the second task, we predict the numerical scores of review aspects and show that simple models can outperform the mean baseline for aspects with high variance such as 'originality' and 'impact'.

Citations (179)

Summary

The paper introduces the PeerRead dataset containing 14.7K paper drafts and 10.7K expert reviews to facilitate quantitative analysis of the peer review process.
The study's analysis of the collected reviews reveals correlations between aspect scores, such as clarity and originality, and the overall acceptance recommendation.
The authors demonstrate NLP applications by building models for acceptance prediction and aspect score estimation; the acceptance models reduce error by up to 21% relative to the majority baseline.

Insights into Peer Reviewing and NLP Applications

The paper "A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications" by Kang et al. contributes to the intersection of scientific peer reviewing and NLP. Its aim is not only to document the peer review process but also to explore applications of review data through a new dataset, PeerRead. The publicly available dataset comprises 14.7K paper drafts with the corresponding accept/reject decisions from top-tier venues, along with 10.7K expert-written textual peer reviews for a subset of those papers, enabling quantitative analysis by the research community.

Dataset Composition and Collection

The paper describes how PeerRead was collected. The authors coordinated with conference management systems so that authors and reviewers could systematically opt in to sharing their papers and reviews. The dataset aggregates submissions and reviews from venues such as ACL, NIPS, and ICLR, and draws on sources such as OpenReview and arXiv, annotating submissions automatically based on whether they were subsequently accepted. The resulting data supports both analysis of peer reviewing practices and new NLP tasks.
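To make the structure concrete, here is a minimal sketch of how a paper record with its decision and optional reviews might be represented in Python. The class and field names (`Review`, `PaperRecord`, `label_is_inferred`, `papers_with_reviews`) are hypothetical and do not reflect PeerRead's actual file format; the point is only that every draft carries an accept/reject label while textual reviews exist for a subset of papers.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Review:
    # Free-text review plus numeric aspect scores, e.g. {"clarity": 4}.
    # Hypothetical representation, not PeerRead's actual schema.
    text: str
    aspect_scores: Dict[str, int] = field(default_factory=dict)


@dataclass
class PaperRecord:
    venue: str
    title: str
    accepted: bool
    # True when the label was derived automatically rather than taken
    # from an official decision (hypothetical flag for illustration).
    label_is_inferred: bool = False
    reviews: List[Review] = field(default_factory=list)


def papers_with_reviews(records: List[PaperRecord]) -> List[PaperRecord]:
    """Return the subset of drafts that come with at least one textual review."""
    return [r for r in records if r.reviews]


# Toy records, not real PeerRead entries.
records = [
    PaperRecord("ACL", "Paper A", accepted=True,
                reviews=[Review("Clear and novel.", {"clarity": 5, "originality": 4})]),
    PaperRecord("arXiv", "Paper B", accepted=False, label_is_inferred=True),
]
print(len(papers_with_reviews(records)))  # 1
```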

Analysis of Peer Reviews

Quantitative analysis of the dataset yields several insights into reviewing behavior. The correlations between aspect scores (such as clarity and originality) and overall recommendation scores indicate which criteria reviewers weigh most. Substance and clarity show higher correlations with the recommendation score, suggesting that these aspects weigh heavily in acceptance decisions. These findings bear on ongoing discussions of bias in peer review and of what reviewers expect from a scientific contribution.
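The correlation analysis can be illustrated with Pearson's coefficient, computed between one aspect score and the overall recommendation across reviews. This sketch assumes Pearson correlation as the measure and uses made-up toy scores rather than PeerRead data; `pearson` is a helper written only for illustration.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sx * sy)


# Toy scores, one entry per review (illustrative values, not PeerRead data).
overall = [2, 3, 4, 5, 3, 4]
aspects = {
    "substance": [2, 3, 4, 5, 2, 4],
    "clarity": [3, 3, 4, 4, 3, 5],
}
for name, scores in aspects.items():
    # Correlate each aspect with the overall recommendation score.
    print(name, round(pearson(scores, overall), 2))
```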

Moreover, comparing reviews that recommend an oral presentation with those that recommend a poster shows that a higher overall recommendation score tends to accompany an oral presentation suggestion, indicating that reviewers reserve oral slots for submissions they judge to be stronger overall.
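A comparison like this amounts to grouping reviews by the suggested presentation format and comparing mean overall scores. The sketch below uses hypothetical (score, format) pairs with toy values; `mean_score_by_format` is an illustrative helper, not part of any PeerRead tooling.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical (overall_recommendation, suggested_format) pairs; toy values only.
reviews: List[Tuple[int, str]] = [
    (5, "oral"), (4, "oral"), (4, "poster"), (3, "poster"), (2, "poster"), (5, "oral"),
]


def mean_score_by_format(rows: List[Tuple[int, str]]) -> Dict[str, float]:
    """Group overall recommendation scores by suggested presentation format."""
    groups: Dict[str, List[int]] = {}
    for score, fmt in rows:
        groups.setdefault(fmt, []).append(score)
    return {fmt: mean(scores) for fmt, scores in groups.items()}


print(mean_score_by_format(reviews))
```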

NLP Tasks Derived from Peer Reviews

The paper proposes two NLP tasks based on the dataset: predicting paper acceptance and predicting review aspect scores. Unlike standard sentiment analysis, both tasks require predicting structured judgments about papers and their reviews. In the acceptance prediction task, classifiers including logistic regression, SVM, and random forests, trained on hand-engineered features, reduce error by up to 21% relative to the majority baseline. Features such as abstract length and recent citations are particularly influential, hinting at reviewer biases that could be analyzed and addressed systematically.
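Because the acceptance result is reported as error reduction against a majority baseline, it helps to see how that figure is computed: it is the share of the baseline's errors that a model eliminates, not a gain in accuracy points. The sketch below uses toy labels and a hypothetical model accuracy; the function names are illustrative.

```python
from collections import Counter
from typing import List


def majority_baseline_accuracy(labels: List[bool]) -> float:
    """Accuracy of always predicting the most frequent label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def relative_error_reduction(baseline_acc: float, model_acc: float) -> float:
    """Fraction of the baseline's error that the model removes."""
    baseline_err = 1.0 - baseline_acc
    model_err = 1.0 - model_acc
    return (baseline_err - model_err) / baseline_err


# Toy example: 6 of 10 papers rejected, so the majority baseline scores 0.6.
labels = [False] * 6 + [True] * 4
base = majority_baseline_accuracy(labels)
print(base)                                           # 0.6
print(round(relative_error_reduction(base, 0.7), 2))  # 0.25
```

With a baseline accuracy of 0.6 and a model accuracy of 0.7, accuracy rises by only 10 points, yet the error falls from 0.4 to 0.3, a 25% relative reduction.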

The aspect score prediction task uses neural models that condition on the text of a paper's reviews to predict numerical aspect scores; these models outperform the mean baseline on aspects with high variance, such as originality and impact. The task points toward better-calibrated reviewing tools, given the training data available in PeerRead.
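One way to see why the mean baseline is hard to beat on low-variance aspects: predicting the mean for every review yields an RMSE equal to the population standard deviation of the scores, so when reviewers rarely deviate from the mean, little error remains for a model to remove. This sketch assumes RMSE as the error metric and uses made-up scores; for simplicity the mean is computed over the same scores being evaluated, whereas a real setup would take it from training data.

```python
import math
from statistics import mean, pstdev
from typing import List


def rmse(preds: List[float], gold: List[float]) -> float:
    """Root-mean-square error between predictions and gold scores."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(preds, gold)) / len(gold))


def mean_baseline_rmse(gold: List[float]) -> float:
    """RMSE of predicting the mean score for every review."""
    mu = mean(gold)
    return rmse([mu] * len(gold), gold)


# Toy aspect scores (not PeerRead data): one low-variance, one high-variance aspect.
low_var = [3, 3, 4, 3, 3, 4]
high_var = [1, 5, 2, 5, 1, 4]
for name, gold in [("low variance", low_var), ("high variance", high_var)]:
    # The mean baseline's RMSE equals the population standard deviation.
    print(name, round(mean_baseline_rmse(gold), 3), round(pstdev(gold), 3))
```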

Implications and Future Directions

While PeerRead opens avenues for improving the peer review process and supports analysis of existing practices, its potential is far from exhausted. Integrating its findings into automated review systems could streamline submission evaluation and reduce reviewer workload. Furthermore, studying demographic biases in reviews and generating synthetic reviews conditioned on sentiment could help make scholarly communication more equitable.

PeerRead thus provides a structured resource for ongoing research on scholarly communication, with implications for refining peer review practices and advancing NLP methods. Future work could apply deep learning models and richer text analysis to examine the many factors that shape scientific peer review.
