CCPL: Cross-modal Contrastive Protein Learning (2303.11783v2)

Published 19 Mar 2023 in q-bio.BM, cs.AI, and cs.LG

Abstract: Effective protein representation learning is crucial for predicting protein functions. Traditional methods often pretrain protein language models on large corpora of unlabeled amino acid sequences and then finetune them on labeled data. While effective, these methods underutilize protein structures, which are vital for determining function. Common structural representation techniques rely heavily on annotated data, limiting their generalizability. Moreover, structural pretraining methods, like their natural language counterparts, can distort actual protein structures. In this work, we introduce cross-modal contrastive protein learning (CCPL), a novel unsupervised pretraining method for protein structure representations. CCPL leverages a robust pretrained protein language model and uses unsupervised contrastive alignment to enhance structure learning, incorporating self-supervised structural constraints to preserve intrinsic structural information. We evaluated our model across various benchmarks, demonstrating the framework's superiority.

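The page does not include code, and the abstract only describes the alignment objective at a high level. As a rough illustration of the kind of unsupervised cross-modal contrastive alignment described above, the sketch below implements a symmetric InfoNCE loss between per-protein sequence and structure embeddings. The function name, the temperature value, and the symmetric two-direction formulation are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(seq_emb: torch.Tensor,
                                 struct_emb: torch.Tensor,
                                 temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss aligning sequence and structure embeddings.

    seq_emb, struct_emb: (batch, dim) tensors for the same batch of proteins.
    Matched rows are treated as positive pairs; all other rows in the batch
    serve as in-batch negatives.
    """
    # L2-normalize so the dot product is a cosine similarity.
    seq = F.normalize(seq_emb, dim=-1)
    struct = F.normalize(struct_emb, dim=-1)

    # (batch, batch) similarity matrix, scaled by the temperature.
    logits = seq @ struct.t() / temperature

    # The diagonal entries are the positive (matched) pairs.
    targets = torch.arange(seq.size(0), device=seq.device)

    # Contrast in both directions: sequence->structure and structure->sequence.
    loss_s2t = F.cross_entropy(logits, targets)
    loss_t2s = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_s2t + loss_t2s)
```

In a setup like CCPL's, `seq_emb` would plausibly come from the frozen pretrained protein language model and `struct_emb` from the structure encoder being trained, with the abstract's self-supervised structural constraints added as auxiliary loss terms; those wiring details are assumptions here, not taken from the paper.
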
Citations (4)
