Structural mechanics of DNA: from atomic scale to bioinformatics

 

F. Lankaš, J. H. Maddocks

 

Institute for Mathematics B, Swiss Federal Institute of Technology (EPFL)

Station 8, CH-1015 Lausanne, Switzerland

filip.lankas@epfl.ch

 

 

 

The storage and retrieval of genetic information, and their regulation, depend on the interaction of DNA with numerous proteins. It is now well established that protein-DNA interactions may operate through two different mechanisms: direct readout, in which specific chemical contacts are made between protein and DNA, and indirect readout, in which it is the three-dimensional structure or the mechanical deformability of the molecules, in particular the DNA, that guides the binding.

We set ourselves the goal of investigating sequence-dependent DNA structure and mechanical deformability over a wide range of length scales. Our general approach consists of performing large-scale, atomistic molecular dynamics (MD) simulations of DNA in explicit solvent, and employing methods from statistical physics to parametrize coarse-grained models on different length scales from the MD data.

We consider a rigid base-pair model and a rigid base model, in which base pairs or bases are treated as interacting rigid bodies in contact with a thermal reservoir. Assuming a harmonic (quadratic) interaction energy, the shape parameters and stiffness constants can be inferred from structural fluctuations observed in an unconstrained simulation. Similar models have already been parametrized, using either an ensemble of crystal structures or atomistic MD trajectories, but all of them rely on the assumption that only nearest-neighbour base pairs or bases contribute to the interaction. We studied the full, nonlocal problem and found a significant contribution to the interaction energy beyond nearest neighbours. The rigid base model is clearly more complete than the rigid base-pair model, and we found that it is also much more physically realistic.
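The inference of shape and stiffness from fluctuations can be illustrated with a minimal sketch. For a harmonic energy in contact with a thermal reservoir, the internal coordinates are Gaussian distributed, so the equilibrium shape is the sample mean and the stiffness matrix is kT times the inverse of the covariance matrix. The code below is not the parametrization procedure used in this work; the function names, the two-coordinate toy data and the units (kcal/mol, kT taken as about 0.596 kcal/mol near 300 K) are assumptions made for illustration. Inverting the full covariance matrix, rather than only nearest-neighbour blocks, is what keeps nonlocal couplings in the resulting stiffness.

```python
import random

KT = 0.596  # kcal/mol near 300 K (assumed unit choice for this sketch)


def mean_vector(samples):
    """Sample mean of the coordinate vectors: the estimated equilibrium shape."""
    n, dim = len(samples), len(samples[0])
    return [sum(s[i] for s in samples) / n for i in range(dim)]


def covariance(samples, mean):
    """Unbiased sample covariance matrix of the coordinate fluctuations."""
    n, dim = len(samples), len(mean)
    cov = [[0.0] * dim for _ in range(dim)]
    for s in samples:
        d = [s[i] - mean[i] for i in range(dim)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += d[i] * d[j]
    return [[c / (n - 1) for c in row] for row in cov]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_harmonic(samples, kt=KT):
    """Return (shape, stiffness) with stiffness = kT * inverse(covariance)."""
    w0 = mean_vector(samples)
    cov = covariance(samples, w0)
    stiffness = [[kt * v for v in row] for row in invert(cov)]
    return w0, stiffness


if __name__ == "__main__":
    # Toy data: two uncoupled coordinates with made-up stiffnesses 0.02 and 0.05.
    rng = random.Random(0)
    sigmas = [(KT / k) ** 0.5 for k in (0.02, 0.05)]
    data = [[rng.gauss(5.0, sigmas[0]), rng.gauss(34.0, sigmas[1])]
            for _ in range(20000)]
    shape, stiff = fit_harmonic(data)
    print("shape:", shape)
    print("stiffness:", stiff)
```

With enough snapshots the recovered diagonal stiffnesses should approach the values used to generate the toy data, and the off-diagonal terms should approach zero.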

In order to obtain comprehensive sequence-dependent data, we take part in an initiative (the ABC consortium) aimed at performing atomistic MD simulations of a pool of oligomers involving all possible tetrameric sequences. The simulations are close to completion and the first analysis is under way. In our lab, we concentrate on obtaining the full set of sequence-dependent shape and stiffness parameters for tetrameric sequences and on constructing comprehensive, nonlocal models based on them.

Knowledge of the sequence-dependent parameters for a rigid base model enables one to predict the deformation energy (within the harmonic approximation) for a deviation from the equilibrium geometry, and thus helps quantify the energetic cost of indirect readout by a protein. This opens the possibility of including the shape and mechanical stiffness of a sequence as additional parameters in the analysis of sequence similarity, which is routinely based only on the sequence viewed as a text, i.e. a sequence of letters. The possibility that very different sequences have similar mechanical properties, and thus may be involved in similar biological functions, would be readily detected in this way.
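As a hedged illustration of the energy prediction described above: within the harmonic approximation, the cost of deforming a sequence from its equilibrium shape w0 to a shape w is E = (1/2) (w - w0)^T K (w - w0), where K is the stiffness matrix. The sketch below evaluates this quadratic form and ranks candidate sequences by the cost of adopting a given target shape, such as one imposed by a bound protein. The function names, the two-coordinate description and all numerical parameters are invented for illustration and are not results of this work.

```python
def deformation_energy(w, w0, stiffness):
    """Harmonic deformation energy 0.5 * (w - w0)^T K (w - w0)."""
    d = [a - b for a, b in zip(w, w0)]
    n = len(d)
    return 0.5 * sum(d[i] * stiffness[i][j] * d[j]
                     for i in range(n) for j in range(n))


def rank_by_cost(params, target):
    """Sort sequences by the energy needed to adopt the target shape.

    params maps a sequence label to a (shape, stiffness) pair.
    Returns a list of (label, energy) tuples, cheapest first.
    """
    costs = [(name, deformation_energy(target, w0, k))
             for name, (w0, k) in params.items()]
    return sorted(costs, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical parameters: coordinates are (roll, twist) in degrees,
    # stiffness in kcal/mol per degree squared. Values are made up.
    params = {
        "seq_1": ([2.0, 35.0], [[0.02, 0.005], [0.005, 0.04]]),
        "seq_2": ([8.0, 31.0], [[0.015, 0.004], [0.004, 0.03]]),
        "seq_3": ([7.0, 32.0], [[0.05, 0.0], [0.0, 0.06]]),
    }
    target = [10.0, 30.0]  # hypothetical shape imposed on the DNA
    for name, energy in rank_by_cost(params, target):
        print(name, round(energy, 3))
```

Two sequences that differ strongly as text can end up close in such a ranking, which is the kind of mechanical similarity that a purely text-based comparison would miss.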

The length scales from individual base pairs up to about a hundred base pairs of DNA are crucial in protein-DNA interactions. The latter scale, for instance, is the one on which DNA is wrapped in the nucleosome, and on which loops of DNA are involved in the regulation of gene expression. Recent experiments on the formation of DNA minicircles suggest that DNA of about 100 base pairs in length can form loops at a rate much higher than predicted by standard theories. It has been proposed that some kind of transient local deviation from the double-helical structure (bubbles or kinks) may provide the explanation. We studied sharply bent DNA in a series of atomistic simulations of 94-bp DNA minicircles and found that kinks, but not bubbles, indeed arise during the simulations. The kinks involve a sharp bending into the minor groove, and their number depends on the supercoiled state of the molecule. The results suggest a microscopic basis for models of DNA looping beyond the harmonic approximation.