RNA structure prediction by knowledge-based
statistical potentials and fragment assembly
David Dufour and Marc A. Marti-Renom
National Center for Genomic Analysis (CNAG) and Center for Genomic Regulation (CRG), Barcelona, Spain
New RNA structure prediction tools are needed to rapidly obtain detailed structural information on new non-coding RNA sequences. Here we propose to use knowledge-based statistical potentials and a fragment-based modeling approach to predict RNA structure from sequence. We downloaded a dataset composed of all X-ray-determined RNA 3D structures from the Protein Data Bank (PDB). From the initial 1940 files, 3082 different RNA structures were selected after filtering out short sequences (<20 nucleotides) and structures without base pairing. The CD-HIT program was then used on those sequences to derive sequence families, of which 304 remained after filtering. With this dataset we calculated several general statistics, such as the percentage of canonical and non-canonical base pairs, stacking, sequence length and RMSD resolution. The SARA method was used to generate an all-against-all alignment of the best representative of each RNA family in our dataset. Those alignments are being used as input to the RNADOM program to derive a set of conserved RNA fragments. In parallel, a set of structural properties was determined to describe RNA structure; the most informative ones will be checked for conservation against the fragment dataset, yielding a series of knowledge-based statistical potentials.
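As an illustration of how a knowledge-based statistical potential can be derived from observed frequencies, the following minimal sketch applies the commonly used inverse Boltzmann relation, E = -kT ln(p_obs / p_ref). The abstract does not state which formulation is used, so the formula, the pseudocount, the temperature factor, the distance bins and the toy counts are all assumptions made for illustration, not the authors' method or data.

```python
import math
from collections import Counter

# Assumption: kT in kcal/mol at roughly 298 K; the abstract gives no value.
KT = 0.593


def statistical_potential(observed, reference, pseudocount=1.0):
    """Return {bin: energy} using E = -kT * ln(p_obs / p_ref).

    `observed` and `reference` map a bin label to a raw count.
    A pseudocount (an assumption) avoids log(0) for empty bins.
    """
    bins = set(observed) | set(reference)
    n_obs = sum(observed.values()) + pseudocount * len(bins)
    n_ref = sum(reference.values()) + pseudocount * len(bins)
    energies = {}
    for b in bins:
        p_obs = (observed.get(b, 0) + pseudocount) / n_obs
        p_ref = (reference.get(b, 0) + pseudocount) / n_ref
        energies[b] = -KT * math.log(p_obs / p_ref)
    return energies


# Hypothetical counts of a structural property (e.g. a distance between
# two atom types) in the fragment dataset versus a reference state.
observed = Counter({"4-6A": 120, "6-8A": 60, "8-10A": 20})
reference = Counter({"4-6A": 50, "6-8A": 75, "8-10A": 75})

potential = statistical_potential(observed, reference)
for label in sorted(potential):
    print(label, round(potential[label], 3))
```

Bins that are over-represented relative to the reference receive negative (favorable) energies and under-represented bins receive positive ones, which is the property that would let such a potential score candidate fragment assemblies.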