Improved Protein-ligand Binding Affinity Prediction with Structure-Based Deep Fusion Inference (2005.07704v1)

Published 17 May 2020 in q-bio.BM and cs.LG

Abstract: Predicting accurate protein-ligand binding affinity is important in drug discovery but remains a challenge even with computationally expensive biophysics-based energy scoring methods and state-of-the-art deep learning approaches. Despite the recent advances in the deep convolutional and graph neural network based approaches, the model performance depends on the input data representation and suffers from distinct limitations. It is natural to combine complementary features and their inference from the individual models for better predictions. We present fusion models to benefit from different feature representations of two neural network models to improve the binding affinity prediction. We demonstrate effectiveness of the proposed approach by performing experiments with the PDBBind 2016 dataset and its docking pose complexes. The results show that the proposed approach improves the overall prediction compared to the individual neural network models with greater computational efficiency than related biophysics based energy scoring functions. We also discuss the benefit of the proposed fusion inference with several example complexes. The software is made available as open source at https://github.com/llnl/fast.

Citations (178)

Summary

The paper presents a novel fusion model that merges 3D-CNN and SG-CNN features to improve protein-ligand binding affinity predictions.
Evaluated on the PDBBind 2016 dataset, the fusion model achieves a higher Pearson correlation and a lower RMSE than traditional energy scoring methods.
The approach aims to support early candidate screening in drug discovery by reducing computational burden and improving predictive reliability.

Summary of Improved Protein-ligand Binding Affinity Prediction with Structure-Based Deep Fusion Inference

This paper addresses the persistent challenge of predicting protein-ligand binding affinity, a central problem in drug discovery, using deep learning. Traditional methods, such as biophysics-based energy scoring functions, are computationally intensive and often fall short in predictive accuracy. Two deep learning architectures, 3D Convolutional Neural Networks (3D-CNN) and Spatial Graph Convolutional Neural Networks (SG-CNN), have shown promise because they operate directly on the spatial structure of the complex. The novelty of this research lies in a fusion model that combines the feature representations from these two approaches.
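
To make the idea of combining feature representations concrete, here is a minimal sketch of feature-level fusion in plain Python. It assumes each model exposes a latent feature vector, concatenates the two vectors, and regresses one affinity value through a small dense network. The function names (`dense`, `fuse_features`, `init_params`), the layer sizes, and the random weights are all hypothetical and are not taken from the paper or its repository; the authors' actual fusion architecture may differ.

```python
import random


def dense(x, weights, bias):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def fuse_features(feat_3dcnn, feat_sgcnn, params):
    """Concatenate two latent vectors and regress a single affinity value.

    `feat_3dcnn` and `feat_sgcnn` stand in for features extracted from the
    two trained models (an assumption made for this sketch).
    """
    joint = list(feat_3dcnn) + list(feat_sgcnn)
    hidden = relu(dense(joint, params["w1"], params["b1"]))
    return dense(hidden, params["w2"], params["b2"])[0]


def init_params(n_in, n_hidden, seed=0):
    """Small random weights; a real model would learn these by training."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]],
        "b2": [0.0],
    }


# Toy usage with made-up feature values (not data from the paper).
toy_3dcnn = [0.2, -0.5, 1.0]
toy_sgcnn = [0.7, 0.1]
toy_params = init_params(n_in=5, n_hidden=4)
toy_prediction = fuse_features(toy_3dcnn, toy_sgcnn, toy_params)
```

A simpler alternative under the same assumptions would be to average the two models' final predictions instead of fusing their features; the sketch above illustrates only the feature-concatenation variant.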

The paper evaluates these neural network models on the PDBBind 2016 dataset. The 3D-CNN model learns from an atomic-level 3D voxel grid representation of the complex, while the SG-CNN uses an atom-based graph representation that captures both covalent and non-covalent interactions based on Euclidean distance thresholds. The two models thus have complementary strengths, one learning from volumetric input and the other from relational data, which the fusion exploits to improve binding affinity predictions.
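
The two input representations can be illustrated with a small, simplified sketch. It assumes atoms are given as bare 3D coordinates, bins them into a cubic occupancy grid, and labels atom pairs as covalent or non-covalent edges using two distance cutoffs. The grid size, resolution, and cutoff values are illustrative assumptions rather than the paper's settings, and a real pipeline would also carry per-atom feature channels instead of simple counts.

```python
import math


def voxelize(coords, grid_size=8, resolution=1.0):
    """Count atoms per voxel in a cubic grid centered on the origin.

    Atoms falling outside the grid are ignored.
    """
    half = grid_size * resolution / 2.0
    grid = [[[0 for _ in range(grid_size)]
             for _ in range(grid_size)]
            for _ in range(grid_size)]
    for x, y, z in coords:
        idx = [int(math.floor((c + half) / resolution)) for c in (x, y, z)]
        if all(0 <= i < grid_size for i in idx):
            grid[idx[0]][idx[1]][idx[2]] += 1
    return grid


def build_edges(coords, covalent_cutoff=1.8, noncovalent_cutoff=4.5):
    """Label atom pairs by Euclidean distance.

    Pairs within `covalent_cutoff` become 'covalent' edges; pairs beyond it
    but within `noncovalent_cutoff` become 'noncovalent' edges. Both
    cutoffs (in angstroms) are hypothetical values chosen for this sketch.
    """
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            if d <= covalent_cutoff:
                edges.append((i, j, "covalent"))
            elif d <= noncovalent_cutoff:
                edges.append((i, j, "noncovalent"))
    return edges


# Toy coordinates in angstroms (made up, not taken from PDBBind).
toy_coords = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (4.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
toy_grid = voxelize(toy_coords)
toy_edges = build_edges(toy_coords)
```

The same set of atoms therefore yields a dense volumetric tensor for a 3D-CNN and a sparse, typed edge list for a graph network, which is one way to see why the two models may learn different things.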

Experiments show that the fusion model outperforms the individual CNN models across several metrics, including Pearson correlation and Root Mean Square Error (RMSE), without sacrificing computational efficiency. Notably, the fusion model’s predictions are reported to be more reliable than those of traditional methods such as MM-GBSA scoring, while requiring less computation.
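
For reference, the two metrics named above can be computed as follows. This is a plain-Python sketch of the standard definitions of RMSE and the Pearson correlation coefficient; the function names and the toy numbers are illustrative and are not results from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Toy affinities (made-up values, for illustration only).
measured = [4.0, 5.5, 6.1, 7.3, 8.0]
predicted = [4.4, 5.0, 6.5, 7.0, 8.3]
toy_rmse = rmse(measured, predicted)
toy_r = pearson_r(measured, predicted)
```

Lower RMSE means predictions are closer to the measured affinities in absolute terms, while a Pearson correlation closer to 1 means the predictions follow the measured values more closely in a linear sense; a model can do well on one and poorly on the other, which is why both are commonly reported.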

Implications and Future Directions

The proposed fusion model consolidates complementary structural representations and reduces the computational burden typically associated with conventional methods. This could expedite the drug discovery pipeline by making it easier to identify viable candidate molecules early in development.

Looking ahead, the implications for AI in molecular biology and pharmacology could be substantial. Further models that integrate diverse data representations might set new standards in predictive accuracy and efficiency. Extending existing models, possibly through integration with molecular dynamics simulations, holds promise for tackling more complex biological interactions.

Moreover, the paper opens avenues for reproducible and scalable drug discovery research, especially as data volumes continue to expand. Refining feature representations and combining disparate methodologies could further advance computational drug discovery.

Ultimately, the research encourages cross-disciplinary innovation, drawing on computer science, molecular physics, and chemical informatics to enhance predictive models. As the field advances, it will be crucial to integrate machine learning paradigms with emerging biophysical approaches to extend the scope and reliability of protein-ligand interaction predictions.