AMUSET-TICA: A Tensor-Based Approach for Identifying Slow Collective Variables in Biomolecular Dynamics.

Journal: Journal of chemical theory and computation
Published Date:

Abstract

Elucidating collective variables (CVs) for biomolecular dynamics is crucial for understanding numerous biological processes. By leveraging the tensor-train data structure, a multilinear version of the AMUSE (Algorithm for Multiple Unknown Signals Extraction) algorithm for Koopman approximation (AMUSEt) was recently developed to identify CVs for biomolecular dynamics. To find slow CVs, AMUSEt transforms input features (e.g., pairwise atomic distances) into nonlinear basis functions (e.g., Gaussian functions) and encodes these nonlinear basis functions within a tensor-train structure via time-lagged correlation functions. Due to the need to fit these tensor-train data structures into computer memory, AMUSEt can handle only a limited number of input features. Consequently, AMUSEt relies on manually selecting and ranking features based on physical intuition to fully capture the slow dynamics. However, when applied to complex biological systems with numerous features, this selection and ranking process becomes increasingly challenging. To address this challenge, here we present AMUSET-TICA (AMUSEt-based Time-lagged Independent Component Analysis), a CV-identification method using time-lagged independent components (tICs) as the input features for AMUSEt. The key insight of AMUSET-TICA lies in its highly effective embedding of high-dimensional atomistic protein conformations, achieved by expanding orthogonal tICs into overlapping Gaussian basis functions through a tensor-product data structure. This eliminates the need for manually selecting and ranking input features for a wide range of biomolecular systems. We demonstrate that AMUSET-TICA consistently and significantly outperforms AMUSEt and tICA in identifying slow CVs for three different biomolecular systems: alanine dipeptide, the N-terminal domain of L9 (NTL9), and the FIP35 WW domain. For all these systems, the CVs generated by AMUSET-TICA accurately describe the slowest dynamical modes underlying these biological conformational changes. Furthermore, we show that AMUSET-TICA achieves performance comparable to deep-learning approaches like VAMPnets in identifying the slowest dynamical modes, while being significantly more computationally efficient in terms of CPU time. In addition, the CVs yielded by AMUSET-TICA provide insights into the folding mechanisms of NTL9 and the FIP35 WW domain, including CV3 and CV4 of the WW domain, which capture its two parallel folding pathways. We expect that AMUSET-TICA can be widely applied to facilitate the investigation of biomolecular dynamics.
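
The pipeline described in the abstract (tICs as input features, overlapping Gaussian basis functions on each tIC, a product basis across tICs, then an AMUSE-type eigenproblem) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It builds the full tensor product explicitly instead of the tensor-train format used by AMUSEt, which is only feasible here because n Gaussians on each of d tICs give n^d product functions and the toy values are small (5^2 = 25). The toy two-dimensional double-well trajectory, the function names (amuse, gaussian_basis, product_basis), and all parameter values (lag, number of tICs, Gaussian centers and width) are assumptions made for illustration.

```python
import numpy as np

def amuse(X, lag, n_modes):
    """AMUSE-type estimator: whiten with the instantaneous covariance, then
    diagonalize the symmetrized time-lagged covariance in the whitened space."""
    X = X - X.mean(axis=0)
    X0, Xt = X[:-lag], X[lag:]
    n = len(X0)
    C0 = (X0.T @ X0 + Xt.T @ Xt) / (2 * n)   # instantaneous covariance
    Ct = (X0.T @ Xt + Xt.T @ X0) / (2 * n)   # symmetrized lagged covariance
    s, U = np.linalg.eigh(C0)
    keep = s > 1e-8 * s.max()                # drop near-null directions
    W = U[:, keep] / np.sqrt(s[keep])        # whitening transformation
    lam, V = np.linalg.eigh(W.T @ Ct @ W)
    order = np.argsort(lam)[::-1][:n_modes]  # slowest modes first
    return lam[order], X @ W @ V[:, order]

def gaussian_basis(y, centers, width):
    """Evaluate overlapping Gaussians along one coordinate; returns (T, n)."""
    return np.exp(-(y[:, None] - centers[None, :]) ** 2 / (2 * width ** 2))

def product_basis(per_mode):
    """Explicit tensor product of per-coordinate bases; returns (T, prod n_k)."""
    Phi = per_mode[0]
    for G in per_mode[1:]:
        Phi = (Phi[:, :, None] * G[:, None, :]).reshape(len(Phi), -1)
    return Phi

# Toy data (assumption): overdamped Langevin dynamics with a slow double well
# in x and a fast harmonic well in y, observed through 5 noisy linear features.
rng = np.random.default_rng(0)
T, dt = 20000, 0.01
pos = np.zeros((T, 2))
noise = rng.normal(size=(T, 2)) * np.sqrt(2 * dt)
for t in range(1, T):
    x, y = pos[t - 1]
    force = np.array([-4 * x * (x ** 2 - 1), -10 * y])
    pos[t] = pos[t - 1] + force * dt + noise[t]
features = pos @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(T, 5))

lag, n_tics, n_gauss = 10, 2, 5

# Step 1: linear tICA (AMUSE applied to the raw features) yields the tICs.
lam_tica, tics = amuse(features, lag, n_tics)

# Step 2: expand each tIC into overlapping Gaussians and form the product basis.
centers = np.linspace(-2.0, 2.0, n_gauss)
per_tic = [gaussian_basis(tics[:, k], centers, 1.0) for k in range(n_tics)]
Phi = product_basis(per_tic)

# Step 3: AMUSE on the nonlinear product basis yields the slow CVs.
lam_cv, cvs = amuse(Phi, lag, 3)
print("tICA eigenvalues:", lam_tica)
print("product-basis eigenvalues:", lam_cv)
```

The same amuse function is used in steps 1 and 3; only the basis changes, from the linear features to the Gaussian product basis. In a realistic setting with more tICs the explicit product basis grows as n^d and quickly becomes impractical to store, which is the situation the tensor-train representation is meant to address.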

Authors

  • Siqin Cao
    Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States.
  • Feliks Nüske
    Max-Planck-Institute for Dynamics of Complex Technical Systems, Magdeburg 39106, Germany.
  • Bojun Liu
    Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States.
  • Micheline B Soley
    Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States.
  • Xuhui Huang
    Brainnetome Center and National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 100190 Beijing, China; Research Center for Brain-inspired Intelligence, Institute of Automation, Chinese Academy of Sciences, 100190 Beijing, China. Electronic address: xuhui.huang@ia.ac.cn.