tSNE (t-distributed Stochastic Neighbour Embedding) is a popular method for analysing high-dimensional data from single-cell gene expression measurements, RNA-seq, flow cytometry and other experiments. It can also be used to analyse structures sampled by molecular dynamics simulations. We developed a variant of tSNE called time-lagged tSNE. Structures sampled by molecular dynamics simulations are first superimposed onto a reference structure to remove translational and rotational motions. Next, they are analysed by a variant of independent component analysis that correlates the coordinates of the molecular system with their time-lagged values. This emphasizes slow motions and suppresses fast ones. Finally, tSNE is applied to the output.
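The preprocessing steps above can be sketched in NumPy as follows. This is a minimal sketch, not the authors' implementation: it assumes that the "variant of independent component analysis" is standard time-lagged independent component analysis (TICA) applied to flattened Cartesian coordinates after Kabsch superposition. The function names `superimpose`, `tica` and `preprocess`, as well as the lag time and the regularization threshold `eps`, are hypothetical choices made for illustration.

```python
import numpy as np


def superimpose(frame, reference):
    """Kabsch superposition of one frame (n_atoms, 3) onto a reference (n_atoms, 3)."""
    p = frame - frame.mean(axis=0)           # remove translation
    q = reference - reference.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)        # SVD of the 3x3 covariance matrix
    d = np.sign(np.linalg.det(vt.T @ u.T))   # avoid an improper rotation (reflection)
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p @ rot.T + reference.mean(axis=0)


def preprocess(trajectory, reference):
    """Superimpose every frame of a trajectory (n_frames, n_atoms, 3) and flatten it."""
    return np.array([superimpose(f, reference).ravel() for f in trajectory])


def tica(x, lag, n_components=2, eps=1e-10):
    """TICA of x (n_frames, n_features).

    Returns projections onto the slowest components and their eigenvalues
    (autocorrelations at the given lag)."""
    x = x - x.mean(axis=0)
    x0, xt = x[:-lag], x[lag:]
    n = len(x0)
    c0 = (x0.T @ x0 + xt.T @ xt) / (2 * n)   # instantaneous covariance
    ct = (x0.T @ xt + xt.T @ x0) / (2 * n)   # symmetrized time-lagged covariance
    # Whiten with C0, dropping (near-)singular directions such as the centroid,
    # which superposition fixes exactly.
    s, w = np.linalg.eigh(c0)
    keep = s > eps * s.max()
    w = w[:, keep] / np.sqrt(s[keep])
    # Eigenvectors of the whitened lagged covariance; largest eigenvalue = slowest.
    lam, v = np.linalg.eigh(w.T @ ct @ w)
    order = np.argsort(lam)[::-1][:n_components]
    return x @ (w @ v[:, order]), lam[order]
```

The resulting projections (used directly or rescaled, which is a further design choice) could then be embedded in 2D with a standard tSNE implementation, for example `sklearn.manifold.TSNE(n_components=2).fit_transform(...)` from scikit-learn.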
The result is a 2D map of the conformations of a molecular system. For simulations of Trp-cage mini-protein folding and unfolding, we obtained a plot with a central cluster corresponding to the unfolded state. The folded structure, as well as other long-lived structures, appeared as peripheral clusters surrounding the unfolded state. Unlike standard tSNE, this representation captures not only the structural differences between states but also their kinetics.
We see great potential for time-lagged tSNE in accelerating molecular simulations. We used metadynamics to drive conformational changes along the 2D map obtained by time-lagged tSNE. For this purpose, time-lagged tSNE had to be modified so that its coordinates can be calculated on the fly and forces acting on these coordinates can be converted into forces acting on individual atoms. We solved this problem with parametric time-lagged tSNE, in which an artificial neural network approximates the mapping from atomic coordinates to time-lagged tSNE coordinates.
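The conversion of bias forces into atomic forces follows from the chain rule: if a network maps atomic coordinates x to map coordinates s(x) and metadynamics builds a bias V(s), the biasing force on the atoms is F = -(∂s/∂x)^T ∂V/∂s. The NumPy sketch below illustrates this under loudly stated assumptions: `TinyMLP` is a hypothetical one-hidden-layer stand-in for the trained parametric time-lagged tSNE network, and `Metadynamics2D` is a plain sum of Gaussian hills with fixed height and width. Class names, architecture and parameters are illustrative only and do not describe the authors' setup.

```python
import numpy as np


class TinyMLP:
    """Hypothetical stand-in for the parametric network: coordinates -> 2 map coordinates."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = rng.normal(scale=1 / np.sqrt(n_in), size=(n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(scale=1 / np.sqrt(n_hidden), size=(2, n_hidden))
        self.b2 = np.zeros(2)

    def forward(self, x):
        h = np.tanh(self.w1 @ x + self.b1)
        s = self.w2 @ h + self.b2
        # Jacobian ds/dx = W2 diag(1 - h^2) W1, shape (2, n_in)
        jac = self.w2 @ ((1 - h**2)[:, None] * self.w1)
        return s, jac


class Metadynamics2D:
    """Sum of fixed-height, fixed-width Gaussian hills in the 2D map."""

    def __init__(self, height, width):
        self.height, self.width = height, width
        self.centers = []

    def deposit(self, s):
        self.centers.append(np.array(s, dtype=float))

    def bias_and_grad(self, s):
        if not self.centers:
            return 0.0, np.zeros(2)
        d = s - np.array(self.centers)                    # (n_hills, 2)
        g = self.height * np.exp(-np.sum(d**2, axis=1) / (2 * self.width**2))
        grad = -(g[:, None] * d).sum(axis=0) / self.width**2  # dV/ds
        return g.sum(), grad


def bias_forces(x, net, metad):
    """Bias energy and atomic forces via the chain rule F = -(ds/dx)^T dV/ds."""
    s, jac = net.forward(x)
    v, dv_ds = metad.bias_and_grad(s)
    return v, -jac.T @ dv_ds
```

In a simulation, `deposit` would be called periodically with the current map coordinates, and the returned forces would be added to the force-field forces at every step.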
We successfully applied this method to the folding of the Trp-cage mini-protein.
The work was supported by the Czech Science Foundation (22-29667S).