
Clustering Molecular Energy Landscapes by Adaptive Network Embedding (2401.10972v1)

Published 19 Jan 2024 in q-bio.BM, cond-mat.stat-mech, and cs.LG

Abstract: To explore the chemical space of all possible small molecules efficiently, a common approach is to reduce the dimension of the system to facilitate downstream machine learning tasks. To this end, we present a data-driven approach for clustering potential energy landscapes of molecular structures by applying recently developed network embedding techniques to obtain latent variables defined through the embedding function. To scale up the method, we also incorporate an entropy-sensitive adaptive scheme for hierarchical sampling of the energy landscape, based on metadynamics and transition path theory. By taking into account the kinetic information implied by a system's energy landscape, we are able to interpret dynamical node-node relationships in reduced dimensions. We demonstrate the framework on Lennard-Jones (LJ) clusters and a human DNA sequence.
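The general idea of embedding an energy landscape as a network and clustering the nodes can be illustrated with a minimal sketch. This is not the paper's method: it swaps in a simple diffusion-style embedding (each node is represented by its row of the t-step transition matrix) in place of the network embedding techniques the abstract refers to, and it omits the adaptive sampling scheme entirely. The energies, barrier heights, temperature `KT`, and all function names are hypothetical values chosen for illustration.

```python
import math

# Hypothetical toy landscape: energies of six minima and the barrier
# (transition-state energy) on each edge between them. All values are made up.
ENERGY = [0.0, 0.2, 0.1, 0.0, 0.3, 0.1]
BARRIER = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.2,
           (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.1,
           (2, 3): 4.0}  # one high barrier links the two groups
KT = 0.5  # assumed thermal energy, same units as ENERGY


def transition_matrix(energy, barrier, kt):
    """Build a lazy discrete-time Markov chain from Arrhenius-like rates."""
    n = len(energy)
    rate = [[0.0] * n for _ in range(n)]
    for (i, j), b in barrier.items():
        rate[i][j] = math.exp(-(b - energy[i]) / kt)
        rate[j][i] = math.exp(-(b - energy[j]) / kt)
    # Scale the time step so every node keeps a self-loop probability >= 0.5.
    dt = 0.5 / max(sum(row) for row in rate)
    p = [[r * dt for r in row] for row in rate]
    for i in range(n):
        p[i][i] = 1.0 - sum(p[i])
    return p


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def embed(p, steps):
    """Represent node i by row i of P^steps (a diffusion-style embedding)."""
    result = p
    for _ in range(steps - 1):
        result = matmul(result, p)
    return result


def cluster_two(points):
    """Split points into two groups around node 0 and the node farthest from it."""
    seed_a = points[0]
    seed_b = max(points, key=lambda x: math.dist(x, seed_a))
    return [0 if math.dist(x, seed_a) <= math.dist(x, seed_b) else 1
            for x in points]


P = transition_matrix(ENERGY, BARRIER, KT)
labels = cluster_two(embed(P, 50))
print(labels)
```

In this toy setup the walk equilibrates quickly among minima joined by low barriers but rarely crosses the high barrier on the hypothetical edge (2, 3), so minima 0 to 2 and minima 3 to 5 end up with nearly identical embedding rows within each group and are separated into two clusters.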

