Classes of RNA Conformations

Bohdan Schneider¹, Zdeněk Morávek², and Helen M. Berman³

¹Center for Complex Molecular Systems and Biomolecules and Institute of Organic Chemistry and Biochemistry, Academy of Sciences of the Czech Republic, Flemingovo n.2, CZ-16602 Prague, Czech Republic, bohdan@rcsb.rutgers.edu

²Faculty of Mathematics and Physics, Charles University, Ke Karlovu 2, Prague, Czech Republic

³Rutgers, The State University of New Jersey, Department of Chemistry and Chemical Biology, NJ-08854, USA

The diversity and complexity of RNA structure are a consequence of the high flexibility of the polynucleotide backbone and are best exemplified by the crystal structures of the ribosomal subunits. A nucleotide has seven torsional degrees of freedom, including the torsion χ around the glycosidic bond; this multidimensionality of the nucleotide conformational space is a major obstacle to its systematic analysis. In the work presented here, the multidimensional RNA conformational space and the very large number of possible correlations among the individual torsion angles were simplified by focusing on the interrelationships between the torsion angles that define the phosphodiester linkage (labeled ζ_i and α_i+1) and the other backbone torsion angles. A single near-atomic-resolution structure with over 2800 nucleotides from the 23S and 5S rRNA molecules of the large ribosomal subunit (NDB code RR0033, PDB code 1JJ2, ref. 1) serves as the database for the analysis.
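As an illustration of the two torsions at the phosphodiester linkage, the following minimal sketch computes a torsion angle from four atom positions. It assumes the standard nucleic acid nomenclature, in which ζ_i is defined by the atoms C3’_i, O3’_i, P_i+1, O5’_i+1 and α_i+1 by O3’_i, P_i+1, O5’_i+1, C5’_i+1. The function names `torsion_deg` and `zeta_alpha` are hypothetical and are not taken from the work described here; angles are reported in the 0°–360° range used in the text.

```python
import numpy as np

def torsion_deg(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, mapped to [0, 360)."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1                      # points from the central bond back to p0
    b1 = p2 - p1                      # central bond
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Components of b0 and b2 perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)) % 360.0)

def zeta_alpha(c3, o3, p, o5, c5):
    """Return (zeta_i, alpha_i+1) for one phosphodiester link.

    c3, o3 : C3' and O3' of nucleotide i
    p, o5, c5 : P, O5' and C5' of nucleotide i+1
    """
    zeta = torsion_deg(c3, o3, p, o5)
    alpha = torsion_deg(o3, p, o5, c5)
    return zeta, alpha
```

Applied to every consecutive pair of nucleotides in a coordinate file, such a routine would yield one (ζ_i, α_i+1) pair per dinucleotide step.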

Detailed analysis of the RNA backbone torsions was performed in six three-dimensional projections of the multidimensional torsional space. Each 3D torsional map consists of a distribution of points (t1, t2, t3)_i that was Fourier-transformed (FT) into a pseudo electron density; the density maps were visually inspected to localize peaks, and the positions of the peak maxima were then fitted. Before the FT averaging, the original data matrix of 2841 points had to be modified. A majority (~70%) of all nucleotides of RR0033 are in the A-type conformation, with torsion angle values at the phosphodiester link of ζ_i ~290° and α_i+1 ~300°. These residues were excluded from the original data matrix because the concentration of most points in a narrow area of the map would distort the pseudo electron densities in other regions. The remaining 830 points were Fourier-averaged and analyzed further.
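The two preprocessing steps described above can be sketched roughly as follows. This is not the procedure used in this work: the exact exclusion criterion and the details of the Fourier averaging are not given here, so the sketch substitutes a simple angular window around the A-type values (the tolerance `tol` is a hypothetical parameter) and a Gaussian smoothing of a periodic 3D histogram carried out in Fourier space (grid size `bins` and width `sigma_deg` are likewise assumptions).

```python
import numpy as np

def is_a_type(zeta, alpha, zeta0=290.0, alpha0=300.0, tol=30.0):
    """Flag points near the A-type values of the phosphodiester torsions.

    tol (degrees) is a hypothetical cutoff; differences are taken on the
    circle so that, e.g., 350 and 10 degrees are 20 degrees apart.
    """
    dz = np.abs((np.asarray(zeta, float) - zeta0 + 180.0) % 360.0 - 180.0)
    da = np.abs((np.asarray(alpha, float) - alpha0 + 180.0) % 360.0 - 180.0)
    return (dz <= tol) & (da <= tol)

def periodic_density(points_deg, bins=36, sigma_deg=10.0):
    """Smooth density of 3D torsion points on a periodic grid.

    points_deg : array of shape (N, 3) with angles in degrees.
    The histogram is convolved with a Gaussian by multiplying its FFT with
    the Fourier transform of the Gaussian, which respects the 360-degree
    periodicity of torsion angles.
    """
    pts = np.asarray(points_deg, dtype=float) % 360.0
    edges = np.linspace(0.0, 360.0, bins + 1)
    hist, _ = np.histogramdd(pts, bins=(edges, edges, edges))
    sigma_bins = sigma_deg / (360.0 / bins)
    f = np.fft.fftfreq(bins)  # frequencies in cycles per bin
    k2 = f[:, None, None] ** 2 + f[None, :, None] ** 2 + f[None, None, :] ** 2
    kernel_ft = np.exp(-2.0 * np.pi ** 2 * sigma_bins ** 2 * k2)
    return np.real(np.fft.ifftn(np.fft.fftn(hist) * kernel_ft))
```

In such a scheme, the A-type points would be removed with `is_a_type` first, and `periodic_density` would then be applied to the remaining points of each of the six maps; peak positions could be read off as local maxima of the returned grid.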

In each of the six analyzed maps, about ten peak maxima were identified, and their positions were fitted and named. The distances between the peaks and the 830 individual data points (t1, t2, t3)_i of each map allow each data point to be labeled with the name of the nearest peak. Data points labeled by peak names in the six maps were clustered by a technique called “lexicographical clustering”, which starts by sorting the six-map labels of the data points alphabetically, in the same way as one would order words in a dictionary. To make sure that the lexicographical clusters represent conformational families, the clustered dinucleotide fragments were compared by standard least-squares superposition of the dinucleotide atoms and outliers were removed; rmsd values of the families were 0.2–0.7 Å. Most dinucleotides in the families are conformationally so similar that all their torsion angles and averaged Cartesian coordinates could be determined. The identified conformations will be characterized in the talk and the accompanying poster. Here we summarize a few of the most interesting findings.
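The labeling and sorting steps can be illustrated with a minimal sketch. It assumes that the distance between a data point and a peak is a Euclidean distance on periodic angle differences, which is not stated explicitly here, and that a cluster is the set of data points sharing the same sequence of peak labels across all maps. The function names and the peak names used in the usage example are hypothetical.

```python
from collections import defaultdict
import numpy as np

def angular_distance(point, peak):
    """Euclidean distance between two torsion triples, with 360-degree wrap."""
    diff = np.asarray(point, float) - np.asarray(peak, float)
    diff = np.abs((diff + 180.0) % 360.0 - 180.0)
    return float(np.sqrt(np.sum(diff ** 2)))

def nearest_peak_label(point, peaks):
    """Name of the peak closest to point; peaks maps name -> (t1, t2, t3)."""
    return min(peaks, key=lambda name: angular_distance(point, peaks[name]))

def lexicographic_clusters(labels_per_map):
    """Group data points by their 'word' of peak labels.

    labels_per_map : one list of labels per map, all of equal length
                     (one label per data point).
    Returns a dict mapping each label tuple to the indices of the data
    points that carry it, with keys in dictionary (sorted) order.
    """
    words = list(zip(*labels_per_map))
    order = sorted(range(len(words)), key=lambda i: words[i])
    clusters = defaultdict(list)
    for i in order:
        clusters[words[i]].append(i)
    return dict(clusters)
```

For example, with two maps and three data points, `lexicographic_clusters([["a", "b", "a"], ["x", "x", "x"]])` groups points 0 and 2 under the word `("a", "x")` and point 1 under `("b", "x")`. With six maps, each word has six letters, and the subsequent least-squares comparison would be applied within each group.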

In most cases, non-A-type conformations occur isolated between nucleotides in A-type conformations and rarely connect to one another. In particular, several “open” conformations (#8–17) occur in single-stranded regions linking two or more double helices. Some other conformations with stacked bases (such as #1 and #3–6) can be part of double-helical regions in which the helix is locally disrupted by a bulge or non-canonical base pair(s).

Sequence preferences were observed in only a few conformational families. Notably, conformations #4–6 prefer purine-rich regions (preference for RR), and conformation #2 occurs preferentially in tetraloops with the sequence RNRN. In contrast, the conformation with parallel orientation of consecutive bases and zero rise, known as the “adenine platform” motif (#7), showed no sequence preference for AA.

Stacked or parallel bases, ‘normal’ rise.

Conformation #1: the backbone conformation is, in fact, very close to that of the purine-pyrimidine (RY) steps of Z-DNA, but in contrast to Z-DNA, both bases have the ‘normal’ anti orientation and the conformation shows no sequence preference for RY, which is known from Z-DNA. Conformation #2: an unusual combination of torsions ζ_i–α_i+1–β_i+1–γ_i+1 reverses the direction of the backbone at the beginning of the second nucleotide, so that the second ribose is flipped upside down from its A-type position, the second base is rotated anticlockwise from its A-type position by ~180°, and the bases do not stack. The conformation has a preference for short loops, mostly tetraloops, with the prevailing sequence RNRN. It is most often located at the stem-loop interface, and one nucleotide of the motif forms a non-canonical pair of the tetraloop, typically G•A.

Parallel bases and low-to-zero rise.

Conformations #5–7: the bases have low rise, are in an edge-to-edge orientation, and can form non-canonical hydrogen bonds directly or via a water molecule. A significant feature of #5–6 is that their dinucleotides can occur on opposite strands of double helices with non-canonical base pairs and prefer purine-rich regions; the motif itself has mostly the RR sequence. They can also occur in single-stranded links. Family #7 is very similar to the motif known as the “adenine platform”, but it shows no sequence preference for AA.

“Open conformations”: unstacked bases, short-to-normal P_i–P_i+2 and large C1’_i–C1’_i+1 distances.

Conformations #8–9: the backbone forms a U-shaped turn in the RNA direction with short P_i–P_i+2 distances. The second base is rotated 180° away from its position in the A-type but lies in the same plane; conformation #9 has the first base in the minor syn orientation. Dinucleotides of families #8–9 participate mostly in single-stranded links or bulges between double-helical regions; they form base pairs only rarely and are never involved in canonical ones. Conformations #14–17: these are extremely extended; the bases are rotated away from each other, with the first base ‘above’ P_i and the second ‘below’ P_i+2, and border the dinucleotide on both ends. The positions of the base and the phosphate attached to the second ribose are swapped, and the backbone of the dinucleotides has an S-shaped form. Like the other “open” conformations, #14–17 form hinges between a short single-stranded link and a double helix.

Conformations #19–32: nucleotides in the A-type conformations make up about 70% of the studied ribosomal RNA and were investigated further. The large majority of them, 1513 in total, have the whole dinucleotide in the conformation of canonical A-RNA. There are, however, about fifteen other well-defined conformational families with small but pronounced deviations from canonical A-RNA. These deviations are localized in one or two torsion angles of the first or second nucleotide.

The present work suggests that the multidimensionality of the RNA conformational space can be approached by analyzing conformations at the phosphodiester link O3’_i–P_i+1–O5’_i+1, defined by the torsion angles ζ_i–α_i+1. We deduced the central role of the torsions ζ_i–α_i+1 from the fact that they exhibit the highest variability yet are confined to well-defined regions, noise notwithstanding. We suggest that the character and importance of the ζ_i–α_i+1 scattergram can be compared to those of the cornerstone of protein structural science, the Ramachandran plot of the protein backbone torsion angles Φ and Ψ.

1. Ban, N., Nissen, P., Hansen, J., Moore, P.B. and Steitz, T.A. (2000) The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science, 289, 905-920.