Molecular enhanced sampling with autoencoders: On-the-fly collective variable discovery and accelerated free energy landscape exploration (1801.00203v2)

Published 30 Dec 2017 in physics.bio-ph, physics.comp-ph, and stat.ML

Abstract: Macromolecular and biomolecular folding landscapes typically contain high free energy barriers that impede efficient sampling of configurational space by standard molecular dynamics simulation. Biased sampling can artificially drive the simulation along pre-specified collective variables (CVs), but success depends critically on the availability of good CVs associated with the important collective dynamical motions. Nonlinear machine learning techniques can identify such CVs but typically do not furnish an explicit relationship with the atomic coordinates necessary to perform biased sampling. In this work, we employ auto-associative artificial neural networks ("autoencoders") to learn nonlinear CVs that are explicit and differentiable functions of the atomic coordinates. Our approach offers substantial speedups in exploration of configurational space, and is distinguished from existing approaches by its capacity to simultaneously discover and directly accelerate along data-driven CVs. We demonstrate the approach in simulations of alanine dipeptide and Trp-cage, and have developed an open-source and freely-available implementation within OpenMM.


Summary

The paper employs autoencoders to automatically learn nonlinear collective variables, eliminating the need for expert-defined inputs.
The approach integrates these data-driven CVs into biased sampling methods within OpenMM, significantly accelerating free energy exploration.
The framework is validated on alanine dipeptide and Trp-cage systems, demonstrating enhanced sampling efficiency and accurate capture of key molecular motions.

Overview of Molecular Enhanced Sampling with Autoencoders

The paper presents a method for accelerated sampling in molecular dynamics (MD) simulations that uses autoencoders, a class of neural networks, to identify effective collective variables (CVs). Biasing the simulation along these CVs improves exploration of the free energy landscapes of macromolecular systems. The central challenge addressed is surmounting the high free energy barriers that impede standard MD.

The paper begins by acknowledging the limitations of standard MD simulations in effectively exploring conformational space due to these energy barriers. Traditional biased sampling methods have been used to tackle this, but they rely heavily on pre-defined CVs, which require expert intuition and can fall short for complex systems. The introduction of machine learning, particularly nonlinear dimensionality reduction techniques like autoencoders, provides a systematic and data-driven approach to discovering CVs without the prerequisite of deep system knowledge.

Key Contributions

Nonlinear CV Discovery Using Autoencoders:
- Autoencoders are employed to automatically learn nonlinear collective variables from MD trajectories. These neural networks are trained to reduce dimensionality while maintaining the key features of the data that influence the slow dynamics of the system.
Enhanced Sampling Through Direct Exploration:
- Because the CVs learned by the autoencoder are explicit and differentiable functions of the atomic coordinates, they can be used directly in biased sampling methods such as umbrella sampling. Biasing along the automatically identified CVs, rather than along manually chosen proxies, accelerates the exploration of configurational space.
Framework Implementation:
- The authors integrate the approach into OpenMM, a molecular simulation package, and provide it as open-source, enhancing accessibility for further research and application.
Application and Validation:
- The methodology is applied to alanine dipeptide and Trp-cage systems, showcasing the ability of the learned CVs to facilitate exploration and identification of free energy surfaces that are in agreement with known results. The framework exhibits an acceleration in sampling compared to conventional unbiased simulations.
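
The reason explicit, differentiable CVs matter can be illustrated with a minimal sketch. It is not the authors' implementation or the OpenMM API: the function names (`encode`, `cv_gradient`, `umbrella_bias`), the one-hidden-layer tanh encoder, and the weights are all hypothetical, chosen only to show how a harmonic umbrella restraint on a learned CV turns into forces on the input coordinates through the chain rule.

```python
import math

def encode(x, W1, b1, w2, b2):
    """Toy encoder: one tanh hidden layer mapping coordinates x to a scalar CV."""
    h = [math.tanh(sum(wij * xj for wij, xj in zip(row, x)) + bi)
         for row, bi in zip(W1, b1)]
    xi = sum(w * hi for w, hi in zip(w2, h)) + b2
    return xi, h

def cv_gradient(h, W1, w2):
    """Analytic d(xi)/dx via the chain rule, using d tanh(a)/da = 1 - tanh(a)^2."""
    n_in = len(W1[0])
    return [sum(w2[i] * (1.0 - h[i] ** 2) * W1[i][j] for i in range(len(h)))
            for j in range(n_in)]

def umbrella_bias(x, xi0, k, W1, b1, w2, b2):
    """Harmonic restraint U = 0.5*k*(xi(x) - xi0)^2 and the bias force -dU/dx."""
    xi, h = encode(x, W1, b1, w2, b2)
    grad = cv_gradient(h, W1, w2)
    energy = 0.5 * k * (xi - xi0) ** 2
    force = [-k * (xi - xi0) * g for g in grad]
    return energy, force

# Hypothetical weights and input coordinates, for illustration only.
W1 = [[0.5, -0.3, 0.8], [-0.2, 0.7, 0.1]]
b1 = [0.1, -0.1]
w2 = [1.0, -0.5]
b2 = 0.0
x = [0.3, -1.2, 0.5]

energy, force = umbrella_bias(x, xi0=0.0, k=10.0, W1=W1, b1=b1, w2=w2, b2=b2)
print("bias energy:", energy)
print("bias force on each input coordinate:", force)
```

A CV obtained from a method that provides no explicit mapping from coordinates to the CV would offer no `cv_gradient`, so the bias force could not be evaluated at each MD step.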

Results and Discussion

Through iterative interleaving of CV discovery and enhanced sampling, the authors demonstrate the potential of autoencoders to identify CVs that truly represent the intrinsic motions of the system. The results highlight that the 2D intrinsic manifold of alanine dipeptide was faithfully captured, correlating well with the established $\phi$ and $\psi$ backbone dihedrals. In the case of Trp-cage, the discovered CVs aligned with known collective behaviors, providing insights into the folding pathways.
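
For readers unfamiliar with the reference coordinates, the sketch below shows how a backbone dihedral such as $\phi$ (atoms C, N, CA, C) or $\psi$ (atoms N, CA, C, N) is computed from four atomic positions using the standard atan2 formula. The function name `dihedral_degrees` and the sample coordinates are illustrative assumptions, not taken from the paper.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points p0-p1-p2-p3."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates; this arrangement should give an angle close to +90 degrees.
angle = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(angle)
```

Comparing a learned CV against $\phi$ and $\psi$ evaluated this way over the sampled configurations is one way to check the kind of correlation described above.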

Implications and Future Directions

MESA (Molecular Enhanced Sampling with Autoencoders), the framework introduced in the paper, addresses the gap between the limited timescales accessible to MD simulation and the need for thorough exploration of configurational space in complex systems. The implications of this work are twofold:

Practical Sampling Efficiency:
- By focusing computational resources on learning and sampling along data-driven CVs, MESA could potentially be adapted to larger and more complex biomolecular systems where traditional methods struggle.
Theoretical Understanding:
- Understanding the collective motions governing molecular dynamics can lead to deeper insights into molecular behavior, potentially influencing fields beyond computational chemistry, such as drug design and materials science.

Looking forward, future explorations could enhance MESA by leveraging more advanced deep learning models to capture even more intricate molecular behaviors, exploring different systems and folding mechanisms, and examining broader categories of molecular interactions. Additionally, integrating methods like metadynamics could further streamline the sampling process by reducing the preparatory steps needed for boundary detection and sampling initialization.

In conclusion, the paper integrates autoencoder-based CV discovery with biased sampling in a single framework, marking a step towards more autonomous and efficient conformational exploration in molecular simulations.