PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding (2206.02096v2)

Published 5 Jun 2022 in cs.LG

Abstract: We are now witnessing significant progress of deep learning methods in a variety of tasks (or datasets) of proteins. However, there is a lack of a standard benchmark to evaluate the performance of different methods, which hinders the progress of deep learning in this field. In this paper, we propose such a benchmark called PEER, a comprehensive and multi-task benchmark for Protein sEquence undERstanding. PEER provides a set of diverse protein understanding tasks including protein function prediction, protein localization prediction, protein structure prediction, protein-protein interaction prediction, and protein-ligand interaction prediction. We evaluate different types of sequence-based methods for each task including traditional feature engineering approaches, different sequence encoding methods as well as large-scale pre-trained protein language models. In addition, we also investigate the performance of these methods under the multi-task learning setting. Experimental results show that large-scale pre-trained protein language models achieve the best performance for most individual tasks, and jointly training multiple tasks further boosts the performance. The datasets and source codes of this benchmark are all available at https://github.com/DeepGraphLearning/PEER_Benchmark

Citations (75)

Summary

The paper presents the PEER benchmark that systematically evaluates protein sequence understanding across five key biological tasks.
The paper compares traditional methods with modern deep learning models, showing that pre-trained models like ESM-1b consistently outperform older techniques.
The paper demonstrates that multi-task learning and auxiliary tasks enhance model generalization, potentially accelerating advancements in protein engineering and drug discovery.

Comprehensive Evaluation of Protein Sequence Understanding: The PEER Benchmark

The paper "PEER: A Comprehensive and Multi-task Benchmark for Protein Sequence Understanding" presents a framework for dealing with the heterogeneity of protein-related machine learning tasks. Protein sequence data has grown rapidly, with repositories like UniProt containing upwards of 200 million entries, yet the absence of standardized evaluation criteria makes it hard to compare the many computational approaches that have been proposed. The PEER benchmark addresses this gap by systematically evaluating and comparing different methods across a range of protein sequence analysis tasks.

Summary of the PEER Benchmark

PEER covers tasks in five categories central to protein sequence understanding:

Protein Function Prediction: fluorescence, stability, activity, and solubility prediction.
Protein Localization Prediction: subcellular localization and binary localization.
Protein Structure Prediction: contact prediction, fold classification, and secondary structure prediction.
Protein-Protein Interaction Prediction: yeast and human PPI prediction, plus PPI affinity prediction.
Protein-Ligand Interaction Prediction: affinity prediction on datasets such as PDBbind and BindingDB.
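
As a rough illustration, the taxonomy above can be written down as a small lookup table. This is only a sketch of the list in this summary: the identifiers (`PEER_TASKS`, `iter_tasks`, and the task keys) are hypothetical and are not the names used in the PEER codebase.

```python
# Task taxonomy as listed in the summary above.
# All identifiers are illustrative assumptions, not names from the PEER repository.
PEER_TASKS = {
    "function": ["fluorescence", "stability", "activity", "solubility"],
    "localization": ["subcellular_localization", "binary_localization"],
    "structure": ["contact", "fold", "secondary_structure"],
    "protein_protein_interaction": ["yeast_ppi", "human_ppi", "ppi_affinity"],
    "protein_ligand_interaction": ["pdbbind_affinity", "bindingdb_affinity"],
}


def iter_tasks(taxonomy):
    """Yield (category, task) pairs in the order they were declared."""
    for category, tasks in taxonomy.items():
        for task in tasks:
            yield category, task


ALL_TASKS = list(iter_tasks(PEER_TASKS))
```

Iterating over `ALL_TASKS` gives one entry per task in the list above, which is convenient for looping an evaluation script over the whole benchmark.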

The benchmark evaluates a spectrum of sequence-based methods, from traditional feature engineering approaches such as DDE and Moran correlation to deep sequence encoders such as bidirectional LSTMs, CNNs, and Transformers. It also covers large-scale pre-trained protein language models, specifically ProtBert and ESM-1b, which achieve the strongest results in the benchmark evaluations.
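
To make the feature engineering side concrete, here is a minimal sketch of a DDE (dipeptide deviation from expected mean) descriptor, following the commonly used definition: the observed frequency of each dipeptide is standardized against a theoretical mean and variance derived from codon counts in the standard genetic code. This is an assumption about the general technique, not the PEER implementation; the function name `dde_features` and the choice to skip non-standard residues are illustrative.

```python
import math
from itertools import product

# Number of sense codons encoding each amino acid in the standard genetic code.
CODON_COUNTS = {
    "A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3, "K": 2, "L": 6,
    "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6, "T": 4, "V": 4, "W": 1, "Y": 2,
}
AMINO_ACIDS = sorted(CODON_COUNTS)
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 entries
TOTAL_CODONS = sum(CODON_COUNTS.values())  # 61 sense codons


def dde_features(sequence):
    """Return a 400-dimensional DDE vector for a protein sequence (hypothetical helper)."""
    n_pairs = len(sequence) - 1
    if n_pairs < 1:
        raise ValueError("sequence must contain at least two residues")

    # Count overlapping dipeptides; pairs with non-standard residues are skipped (an assumption).
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(n_pairs):
        pair = sequence[i:i + 2]
        if pair in counts:
            counts[pair] += 1

    features = []
    for pair in DIPEPTIDES:
        observed = counts[pair] / n_pairs  # dipeptide composition
        # Theoretical mean: product of the codon fractions of the two residues.
        mean = (CODON_COUNTS[pair[0]] / TOTAL_CODONS) * (CODON_COUNTS[pair[1]] / TOTAL_CODONS)
        variance = mean * (1 - mean) / n_pairs  # theoretical variance
        features.append((observed - mean) / math.sqrt(variance))
    return features


example = dde_features("MKTAYIAKQR")
```

The resulting fixed-length vector can be fed to any standard classifier or regressor, which is what distinguishes this family of methods from the learned sequence encoders.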

Experimental Outcomes

The paper's experimental results reveal several trends. First, pre-trained protein language models outperform the other model types on most individual tasks, underscoring the value of extensive pre-training on large sequence corpora. ESM-1b in particular achieves the top result on a majority of the tasks, whether it is used as a feature extractor or fine-tuned. Second, multi-task learning (MTL) shows promise for improving performance. Adding an auxiliary task, especially contact prediction, improves the generalization of larger models such as ESM-1b and suggests that knowledge can be shared across related tasks.
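
The multi-task idea can be sketched as a weighted sum of a main-task loss and an auxiliary-task loss computed on top of a shared encoder. The snippet below is a minimal, framework-free illustration under assumptions: the summary does not state the exact loss formulation, so the weight `alpha`, the helper names, and the pairing of a regression main task with contact prediction as the auxiliary task are all hypothetical.

```python
import math


def mse(preds, targets):
    """Mean squared error, e.g. for a regression main task."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Binary cross-entropy, e.g. for per-residue-pair contact prediction."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


def multitask_loss(main_loss, auxiliary_loss, alpha=1.0):
    """Combine the two losses; alpha (an assumed hyperparameter) weights the auxiliary task."""
    return main_loss + alpha * auxiliary_loss


# Toy predictions from two task heads that would share one encoder.
main = mse([0.8, 0.1], [1.0, 0.0])
aux = binary_cross_entropy([0.9, 0.2], [1, 0])
total = multitask_loss(main, aux, alpha=0.5)
```

Setting `alpha` to zero recovers single-task training, so the same code path can serve as the baseline when measuring whether the auxiliary task helps.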

Implications and Future Directions

The PEER benchmark is a step toward standardized model evaluation in protein sequence understanding. Its tasks include both classification and regression problems and span protein function, localization, structure, and interactions. Results on PEER can guide model selection for a specific task, which may in turn accelerate progress in protein-related research areas such as drug discovery and protein engineering.

Looking forward, the benchmark could be extended to structure-based approaches, potentially integrating 3D structural data and multiple sequence alignments. Adding further tasks, such as those involving more intricate biological annotations or novel protein functions, is expected to increase its utility. Collaboration with the broader scientific community remains crucial to refining PEER and fostering new methods for protein understanding.

In conclusion, PEER provides a common framework for comparing machine learning methods on protein sequence analysis and is positioned to support further research in computational biology.

GitHub

GitHub - DeepGraphLearning/PEER_Benchmark: PEER Benchmark, appear at NeurIPS 2022 Dataset and Benchmark Track (https://arxiv.org/abs/2206.02096) (90 stars)