
DeepGO: Predicting protein functions from sequence and interactions using a deep ontology-aware classifier (1705.05919v1)

Published 15 May 2017 in q-bio.GN, cs.LG, and q-bio.QM

Abstract: A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40,000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem. We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein-protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, with significant improvement for predicting cellular locations.

Citations (365)

Summary

  • The paper introduces a deep ontology-aware classifier that integrates CNN-based sequence analysis with protein interaction networks for effective protein function prediction.
  • The paper reports improved performance over the BLAST baseline, most notably for cellular component annotation, where the model reaches an F_max of 0.64.
  • The paper offers a scalable framework that expedites high-throughput functional annotation and lays groundwork for future expansions to other biological ontologies.

DeepGO: Integrating Sequence and Interaction Networks for Protein Function Prediction

The paper "DeepGO: Predicting protein functions from sequence and interactions using a deep ontology-aware classifier" presents an innovative method for predicting protein functions by leveraging deep learning techniques. The primary challenge addressed is the large-scale, multi-class, and multi-label nature of protein function prediction, as defined by the Gene Ontology (GO) with over 40,000 classes. The sheer volume of novel protein sequences made available by high-throughput sequencing technologies poses significant hurdles for experimental functional characterization.

Methodological Advancements

DeepGO uses a neural network architecture that takes both protein sequences and protein-protein interaction networks as input and produces predictions that respect the hierarchical structure of the GO. The model has two main components:

  1. Feature Learning: Utilizing Convolutional Neural Networks (CNNs), the model learns feature representations from amino acid sequences. This is complemented by protein-protein interaction networks embedded into a joint representation space through knowledge graph embeddings, reflecting inter-species orthologous relations.
  2. Ontological Structure Awareness: DeepGO models the dependencies in the GO hierarchy, so that the prediction for a class takes its parent-child relationships with other functional classes into account. Using these dependencies as background information when building the model keeps predictions consistent across the ontology.
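
The idea behind ontology-aware prediction can be illustrated with a minimal sketch. It is not the authors' implementation: it only shows one common way to make scores consistent with a class hierarchy, by raising each class score to at least the maximum score of its descendants. The function name `propagate_scores`, the toy class names, and the scores are all hypothetical, and the hierarchy is assumed to be acyclic.

```python
from typing import Dict, List


def propagate_scores(scores: Dict[str, float],
                     parents: Dict[str, List[str]]) -> Dict[str, float]:
    """Raise each class score to at least the max score of its descendants.

    `parents` maps a class to its direct parent classes (assumed acyclic).
    """
    # Invert the parent map so we can walk from a class down to its children.
    children: Dict[str, List[str]] = {}
    for child, direct_parents in parents.items():
        for parent in direct_parents:
            children.setdefault(parent, []).append(child)

    memo: Dict[str, float] = {}

    def resolve(term: str) -> float:
        # Score of a term = max of its own score and its children's resolved scores.
        if term not in memo:
            best = scores.get(term, 0.0)
            for child in children.get(term, []):
                best = max(best, resolve(child))
            memo[term] = best
        return memo[term]

    all_terms = set(scores) | set(parents) | set(children)
    return {term: resolve(term) for term in all_terms}


# Hypothetical toy hierarchy: root -> binding -> protein_binding, root -> catalysis
toy_parents = {
    "binding": ["root"],
    "protein_binding": ["binding"],
    "catalysis": ["root"],
}
toy_scores = {"root": 0.2, "binding": 0.5, "protein_binding": 0.9, "catalysis": 0.1}
propagated = propagate_scores(toy_scores, toy_parents)
print(propagated)
```

In this toy example a confident prediction for a specific class lifts the scores of its more general ancestors, so a protein is never predicted to have a specific function without also being assigned the broader functions that contain it.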

Evaluation and Results

DeepGO was evaluated using the standards established by the Computational Assessment of Function Annotation (CAFA), and it showed a notable improvement over baseline methods such as BLAST, especially in predicting cellular locations. Performance was measured with both protein-centric and term-centric metrics, and the results indicate substantial gains in the Cellular Component (CC) and Molecular Function (MF) ontologies.

  • Performance Metrics: The maximum F-measure ($F_{max}$) and ROC AUC were used to assess prediction quality. The DeepGO model, which uses both sequence and interaction data, outperformed BLAST, particularly in the CC ontology, where it reached an $F_{max}$ of 0.64.
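
As a rough illustration of the protein-centric maximum F-measure, the sketch below follows the usual CAFA-style definition: at each score threshold, precision is averaged over proteins with at least one prediction, recall is averaged over all benchmark proteins, and the best F-measure across thresholds is reported. The function name `f_max`, the threshold grid, and the toy data are assumptions made for illustration, and every benchmark protein is assumed to have at least one true annotation.

```python
from typing import Dict, Iterable, Set


def f_max(predictions: Dict[str, Dict[str, float]],
          truth: Dict[str, Set[str]],
          thresholds: Iterable[float]) -> float:
    """Protein-centric maximum F-measure over a grid of score thresholds."""
    best = 0.0
    for t in thresholds:
        precisions = []   # one entry per protein with at least one prediction >= t
        recall_sum = 0.0  # summed over all benchmark proteins
        for protein, true_terms in truth.items():
            scored = predictions.get(protein, {})
            predicted = {term for term, score in scored.items() if score >= t}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue  # no protein has a prediction at this threshold
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(truth)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Hypothetical toy benchmark with two proteins and made-up class labels.
toy_truth = {"P1": {"A", "B"}, "P2": {"C"}}
toy_predictions = {
    "P1": {"A": 0.9, "B": 0.4, "D": 0.6},
    "P2": {"C": 0.8, "A": 0.3},
}
toy_fmax = f_max(toy_predictions, toy_truth, [i / 10 for i in range(1, 10)])
print(round(toy_fmax, 4))
```

Because the score is the maximum over thresholds, it rewards a method for having some operating point with a good precision-recall balance rather than for performing well at one fixed cutoff.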

Domain Implications

DeepGO has implications for both computational biology and bioinformatics. Practically, the approach speeds up protein characterization by letting biologists generate functional hypotheses for novel proteins across many organisms. Theoretically, it provides a framework that can be adapted to other ontologically structured problems, such as predicting gene-disease associations using the Disease Ontology.

Future Developments

Looking forward, expanding DeepGO's applicability could involve integrating additional biological data, such as transcriptional co-expression, regulatory networks, or even larger sets of protein-protein interactions. Moreover, incorporating more sophisticated representations of GO's part-of and regulatory relations could further enhance predictive accuracy.

Further research can explore improving the model's ability to predict lower-abundance functions by penalizing false negatives in a context-aware manner, aligning predictions with the biological importance of specific protein functions. Additionally, adapting the framework to address emerging biological data types and ontologies could catalyze advancements in computational function annotation tools.
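
One simple way to penalize false negatives more heavily for selected classes is a binary cross-entropy loss with per-class positive weights. The sketch below is only an illustration of that general idea, not something taken from the paper: the function name `weighted_bce` and the weights are hypothetical, and how the weights would be chosen (for example, from class frequency or biological importance) is left open.

```python
import math
from typing import Sequence


def weighted_bce(y_true: Sequence[int],
                 y_prob: Sequence[float],
                 pos_weight: Sequence[float],
                 eps: float = 1e-7) -> float:
    """Mean binary cross-entropy where missed positives cost `pos_weight` times more."""
    total = 0.0
    for y, p, w in zip(y_true, y_prob, pos_weight):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        # The weight applies only to the positive term, i.e. to false negatives.
        total += -(w * y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical example: the same missed positive under weight 1 and weight 5.
plain_loss = weighted_bce([1], [0.1], [1.0])
heavy_loss = weighted_bce([1], [0.1], [5.0])
print(plain_loss, heavy_loss)
```

With such a loss, a rarely annotated class can be given a larger weight so that missing it during training costs more than missing a frequent class.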

In summary, DeepGO combines ontological structure with deep learning and thereby improves the accuracy and efficiency of high-throughput computational function annotation.
