EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation (1707.06017v1)

Published 19 Jul 2017 in q-bio.QM, cs.CV, and stat.ML

Abstract: During the past decade, with the significant progress of computational power as well as ever-rising data availability, deep learning techniques became increasingly popular due to their excellent performance on computer vision problems. The size of the Protein Data Bank has increased more than 15 fold since 1999, which enabled the expansion of models that aim at predicting enzymatic function via their amino acid composition. Amino acid sequence however is less conserved in nature than protein structure and therefore considered a less reliable predictor of protein function. This paper presents EnzyNet, a novel 3D convolutional neural network classifier that predicts the Enzyme Commission number of enzymes based only on their voxel-based spatial structure. The spatial distribution of biochemical properties was also examined as complementary information. The 2-layer architecture was investigated on a large dataset of 63,558 enzymes from the Protein Data Bank and achieved an accuracy of 78.4% by exploiting only the binary representation of the protein shape. Code and datasets are available at https://github.com/shervinea/enzynet.


Summary

The paper introduces a novel 3D CNN workflow that leverages volumetric enzyme structure representations to bypass traditional sequence alignment methods.
It achieves 78.4% accuracy and a 74.6% macro F1 score using a two-layer CNN architecture, underscoring its effectiveness in enzyme classification.
The study highlights potential improvements, including multi-channel inputs and advanced augmentation strategies, to enhance enzyme structure analysis in bioinformatics.

EnzyNet: A 3D Convolutional Neural Network for Enzyme Classification Using Spatial Representation

The advancement of deep learning, particularly convolutional neural networks (CNNs), has substantially impacted computational biology, including the classification of enzymes, a critical task given their diverse biological roles. The paper "EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation" presents a 3D CNN approach that predicts Enzyme Commission (EC) numbers solely from voxels derived from the 3D structures of enzymes.

Methodology and Architecture

The approach circumvents the traditional reliance on amino acid sequence alignment and predefined feature extraction, offering a more adaptable framework. Each 3D structure is converted into a volumetric binary occupancy grid, which serves as the input to a two-layer 3D CNN that learns spatial patterns within the structure. The network then predicts the first digit of the EC number, that is, one of six enzyme classes: Oxidoreductase, Transferase, Hydrolase, Lyase, Isomerase, and Ligase.
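A minimal sketch of this conversion in Python with NumPy follows. It assumes the input is an (N, 3) array of atom coordinates (which atoms to include is left open). The centring and principal-axis rotation mirror the PCA-based normalization described for EnzyNet, while the uniform scaling into the grid and the function name `voxelize` are illustrative assumptions, not the authors' code.

```python
import numpy as np


def voxelize(coords: np.ndarray, grid_size: int = 32) -> np.ndarray:
    """Map an (N, 3) array of atom coordinates to a binary occupancy grid.

    Assumptions (illustrative, not the authors' code): coordinates are
    centred and rotated onto their principal axes (PCA), then scaled
    uniformly so the largest absolute coordinate touches the grid boundary.
    """
    coords = np.asarray(coords, dtype=float)
    if coords.ndim != 2 or coords.shape[1] != 3 or coords.shape[0] < 3:
        raise ValueError("expected an (N, 3) array with N >= 3")

    # Centre the structure at the origin.
    centred = coords - coords.mean(axis=0)
    # The rows of vt are the principal axes; projecting onto them aligns the protein.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    aligned = centred @ vt.T

    # Uniform scaling into [0, grid_size - 1] (guard against a degenerate extent).
    extent = np.abs(aligned).max() or 1.0
    scaled = (aligned / extent + 1.0) / 2.0 * (grid_size - 1)
    idx = np.rint(scaled).astype(int)

    # Mark every voxel that contains at least one atom.
    grid = np.zeros((grid_size,) * 3, dtype=np.uint8)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return grid
```

Principal axes are only defined up to sign, so a fuller implementation would also fix each axis's orientation; this sketch does not.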

Key aspects of the architecture include:

Input representation: Each enzyme structure is encoded as a 32×32×32 voxel grid. Spatial normalization of these volumes, using an intrinsic coordinate system obtained via principal component analysis, standardizes protein orientations.
Network layers: The CNN comprises two convolutional layers, the first with 32 filters of size 9×9×9 and the second with 64 filters of size 5×5×5, followed by a max pooling layer, fully connected layers, and a softmax classifier.
Activation and regularization: Leaky ReLU activations are utilized, with L2 regularization and dropout applied to counteract overfitting, adapting techniques from established networks like VoxNet.
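The spatial size of each layer's output can be worked out with the standard formula for unpadded ("valid") convolution and pooling, floor((n - k) / s) + 1. The sketch below applies it to the layers listed above. The strides and the 2×2×2 pooling window are assumptions, since they are not stated here, so the first-layer stride is left as a parameter; the helper names are hypothetical.

```python
def valid_out(size: int, kernel: int, stride: int = 1) -> int:
    """Output length along one axis for an unpadded ('valid') convolution or pooling."""
    return (size - kernel) // stride + 1


def layer_shapes(grid: int = 32, conv1_stride: int = 1):
    """Trace the cube side length through the layers listed above.

    Assumed (not stated in the summary): conv2 stride 1, 2x2x2 max pooling
    with stride 2, no padding anywhere.
    """
    shapes = []
    side = valid_out(grid, 9, conv1_stride)   # conv1: 32 filters of 9x9x9
    shapes.append(("conv1", side, 32))
    side = valid_out(side, 5)                 # conv2: 64 filters of 5x5x5
    shapes.append(("conv2", side, 64))
    side = valid_out(side, 2, 2)              # max pooling (assumed 2x2x2, stride 2)
    shapes.append(("pool", side, 64))
    flat_features = side ** 3 * 64            # input size of the first dense layer
    return shapes, flat_features


for stride in (1, 2):
    shapes, flat = layer_shapes(conv1_stride=stride)
    print(f"conv1 stride {stride}: {shapes} -> {flat} features")
```

With stride 1 throughout, the 32³ input shrinks to 24³, 20³, and then 10³ after pooling, giving 64,000 features for the fully connected layers; a stride of 2 in the first layer would reduce this to 12³, 8³, 4³ and 4,096 features.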

Training and Performance

The researchers used 63,558 enzymes from the RCSB Protein Data Bank, split into training, validation, and test sets. Because the six classes differ considerably in size, they compared a uniformly weighted loss with a loss using class-specific weights to counter class imbalance.
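One common way to build class-specific weights is inverse class frequency, sketched below with NumPy. The weighting formula and the function names are assumptions for illustration; the summary does not specify how EnzyNet's weights were computed. Setting every weight to 1 recovers the uniform loss.

```python
import numpy as np


def inverse_frequency_weights(labels: np.ndarray, n_classes: int = 6) -> np.ndarray:
    """Weight each class by N / (n_classes * count), so a balanced set gets 1.0 everywhere."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    return counts.sum() / (n_classes * np.maximum(counts, 1.0))  # avoid division by zero


def weighted_cross_entropy(probs: np.ndarray, labels: np.ndarray, weights: np.ndarray) -> float:
    """Mean cross-entropy where each sample is scaled by the weight of its true class.

    probs: (N, C) softmax outputs; labels: (N,) integer class indices.
    """
    eps = 1e-12
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(np.mean(weights[labels] * -np.log(true_class_probs + eps)))
```

For labels [0, 0, 0, 1] with two classes, the minority class receives weight 2.0 and the majority class 2/3, so both classes contribute the same total weight to the loss.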

The model achieved an accuracy of 78.4% and a macro F1 score of 74.6% without data augmentation. The augmentation scheme studied flips each volume along its axes to generate transformed copies for robustness, and different decision rules for combining predictions (such as majority voting and weighted combinations) were reported to enhance the classifier's predictions.
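The flip augmentation and the two kinds of decision rule can be sketched as follows. This assumes each volume is a NumPy array, that the classifier returns one probability vector per input, and that predictions over the eight flipped copies are combined either by majority voting or by (optionally weighted) averaging of probabilities; all function names are hypothetical.

```python
import itertools

import numpy as np


def axial_flips(volume: np.ndarray) -> list:
    """Return the 8 variants of a 3D volume obtained by flipping any subset of its axes."""
    variants = []
    for r in range(4):
        for axes in itertools.combinations(range(3), r):
            variants.append(np.flip(volume, axis=axes) if axes else volume)
    return variants


def majority_vote(prob_list) -> int:
    """Each variant votes for its argmax class; ties go to the lowest class index."""
    votes = [int(np.argmax(p)) for p in prob_list]
    return int(np.bincount(votes).argmax())


def mean_probability(prob_list, weights=None) -> int:
    """Average (optionally weighted) class probabilities, then take the argmax."""
    probs = np.asarray(prob_list, dtype=float)
    return int(np.argmax(np.average(probs, axis=0, weights=weights)))
```

At test time one would run the classifier on every element of `axial_flips(volume)` and pass the resulting probability vectors to either rule. Averaging keeps information about confidence that a plain vote discards.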

Results and Implications

Precision and recall were strongest for classes well represented in the training data. Under-represented classes were harder to classify: models adapted to correct for class imbalance improved recall on these classes substantially, at some cost in precision.
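Per-class precision and recall, and the macro F1 score, can be computed from a confusion matrix as below. Macro F1 averages the per-class F1 scores with equal weight, so a poorly recalled minority class lowers it even when overall accuracy stays high. This is a generic NumPy sketch with hypothetical function names, not the paper's evaluation code.

```python
import numpy as np


def per_class_metrics(y_true, y_pred, n_classes: int = 6):
    """Per-class precision, recall and F1 from integer labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)              # rows: true class, columns: predicted class
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)  # column sums = predicted counts
    recall = tp / np.maximum(cm.sum(axis=1), 1)     # row sums = true counts
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(denom), where=denom > 0)
    return precision, recall, f1


def macro_f1(y_true, y_pred, n_classes: int = 6) -> float:
    """Unweighted mean of per-class F1, so every class counts equally."""
    return float(per_class_metrics(y_true, y_pred, n_classes)[2].mean())
```

For y_true = [0, 0, 1, 1] and y_pred = [0, 1, 1, 1], accuracy is 0.75 but macro F1 is about 0.73, because class 0 is only half recalled.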

The approach also points to improvements for enzyme classification, especially multi-channel representations that go beyond shape to encode biochemical properties such as hydropathy and electrostatic charge.
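A multi-channel input can be built by stacking extra per-voxel property grids next to the binary occupancy channel. The sketch below adds one hydropathy channel using the Kyte-Doolittle scale. The choice of scale, summing values that land in the same voxel, the channel-first layout, and the input format are all assumptions; an electrostatic-charge channel could be added the same way.

```python
import numpy as np

# Kyte-Doolittle hydropathy scale; assumed here as one possible choice of scale.
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def two_channel_grid(voxel_idx: np.ndarray, residues: str, grid_size: int = 32) -> np.ndarray:
    """Stack a binary occupancy channel and a hydropathy channel (channel-first).

    voxel_idx: (N, 3) integer voxel indices, one per residue (hypothetical input format).
    residues:  N one-letter amino acid codes; unknown codes contribute 0.
    """
    grid = np.zeros((2, grid_size, grid_size, grid_size), dtype=np.float32)
    for (i, j, k), aa in zip(voxel_idx, residues):
        grid[0, i, j, k] = 1.0                            # channel 0: occupancy
        grid[1, i, j, k] += KYTE_DOOLITTLE.get(aa, 0.0)   # channel 1: summed hydropathy
    return grid
```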

Future Directions

Future work could use larger grids to capture finer structural detail, or adopt new optimization methods for multi-label classification, since some enzymes catalyze more than one reaction and carry several EC numbers. Generative adversarial approaches to representation learning, which have been explored in related domains, could also refine the framework.
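One standard way to move from single-label to multi-label prediction is to replace the softmax with independent per-class sigmoids trained with binary cross-entropy against multi-hot targets. The NumPy sketch below illustrates that swap; it is a generic technique rather than a method proposed in the paper, and the function names and the 0.5 decision threshold are assumptions.

```python
import numpy as np


def multi_hot(ec_classes, n_classes: int = 6) -> np.ndarray:
    """Encode a set of top-level EC classes (numbered 1-6) as a multi-hot vector."""
    target = np.zeros(n_classes)
    for c in ec_classes:
        target[c - 1] = 1.0
    return target


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def binary_cross_entropy(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean of independent per-class binary cross-entropies (replaces softmax + cross-entropy)."""
    p = np.clip(sigmoid(logits), 1e-12, 1 - 1e-12)
    return float(-np.mean(targets * np.log(p) + (1 - targets) * np.log(1 - p)))


def predict_ec_classes(logits: np.ndarray, threshold: float = 0.5) -> list:
    """Return every EC class whose sigmoid score reaches the (assumed) threshold."""
    return [i + 1 for i, p in enumerate(sigmoid(logits)) if p >= threshold]
```

For example, an enzyme listed as both a transferase and a hydrolase would get the target `multi_hot({2, 3})`, and the network may then output both classes at once.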

In conclusion, EnzyNet demonstrates that 3D CNNs can classify biological structures effectively, and its approach may inform future methods in computational biology and in structural bioinformatics more broadly.


GitHub

shervinea/enzynet: EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation
