Database of Nucleic Acid Structures, NDB

 

Huanwang Yang1, Fabrice Jossinet2, Eric Westhof2, Neocleas Leontis3, Bohdan Schneider4, Chi-Ming Chao1, Zukang Feng1, Lisa Iype1, Xiang-Jun Lu1, Goran Aleksic1, Joanna de la Cruz1, Gregory Donahue1, Dipannita Kalyani1, Daniel Kulp1, Hari Narayan1, John Westbrook1, and Helen M. Berman1

1Rutgers, The State University of New Jersey, Department of Chemistry and Chemical Biology, Piscataway, NJ 08854, USA

2Institut de biologie moléculaire et cellulaire du CNRS, 15 Rue R. Descartes, 676084 Strasbourg, France 

3Chemistry Department, Center for Biomolecular Sciences, Bowling Green State University, Bowling Green OH 43403 USA

4Center for Complex Molecular Systems and Biomolecules, Dolejskova 3, CZ-18223 Prague, Czech Republic 

 

The Nucleic Acid Database (NDB) [1, 2] was established in 1991 as a resource of crystal structures containing nucleic acids. The core of the NDB has been its relational database of primary and derivative data with very rich query and reporting capabilities. This robust database was unique in that it allowed researchers to do comparative analyses of nucleic acid-containing structures selected from the NDB according to the many attributes stored in the database.

Content of the NDB

Structures available in the NDB include RNA and DNA oligonucleotides with two or more bases either alone or complexed with proteins or small molecule ligands. The archive stores both primary and derived information about the structures. The primary data include: crystallographic or NMR coordinate data, structure factors for the X-ray structures or contraint files for the NMR structures, and information about the experiments used to determine the structures, such as crystallization information, data collection, and refinement statistics. Derived information, such as valence geometry, torsion angles, and intermolecular contacts, are calculated and stored in the database. Database entries are further annotated to include information about the overall structural features, including conformational classes, special structural features, biological functions, and crystal-packing classifications.

Data processing and validation

Over the years, the NDB has developed a robust data-processing system for deposition, processing, archiving, querying, and distributing structural data. The full capability of this system was recently demonstrated by the successful processing of ribosomal subunits, which are very large and complex structures.

This data-processing system is built on top of the mmCIF dictionary and it naturally supports building of database tables. The NDB has early on adopted the mmCIF [3] as its data standard. This format has three advantages from the point of view of building a database: (1) comprehensive definitions for terms of molecular structure description, crystallography, NMR, as well as electron microscopy; (2) it is self-defining; and (3) the syntax clearly defines the relationships between data items. The latter feature is important because it allows for rigorous checking of internal consistency of the data.

Structures are deposited via the web with the AutoDep Input Tool (ADIT) [4] and then annotated using the same tool. In the next stage of data processing, a program called MAXIT (Macromolecular Exchange and Input Tool) [5] checks and corrects residue and atom naming, numbering, and ordering as well as the correspondence between the declared sequence and the sequence based on residue names in the coordinate file. Once these integrity checks are completed, the structures are validated by NUCheck [6], another program written as a part of the NDB project. NUCheck verifies valence geometry, torsion angles, intermolecular contacts, and the chiral centers of the (deoxy)riboses and phosphates.

The NDB query capabilities

The core of the NDB project is a relational database in which all data items are organized into tables. At present, there are over 90 tables in the NDB, with each table containing 5 to 20 data items. These tables contain both experimental and derived information. Example tables include: the citation table, the cell dimension table, and the refine parameters table.

Interaction with the database is a two-step process. In the first step, the user defines the selection criteria by combining different database items. Once the structures that meet the constraint criteria have been selected, reports may be written using a combination of table items. For any set of chosen structures, a large variety of reports may be created, e.g. a crystal data report, a backbone torsion angle report, or the user could write a report that lists the twist values for all CG steps in the selected structures together with statistics, including mean, median, and range of values. An important feature of the NDB capabilities is that the constraints used for the reports do not have to be the same as those used to select the structures.

The changing face of the ndb

Since the NDB project began in early 1990s, our knowledge of nucleic acid structures has grown in quantity as well as quality. Early structures of DNA and RNA oligonucleotides, a few protein-DNA complexes, and some tRNA structures available in 1990, have been extended by hundreds of protein – DNA complexes, ribozyme structures and the newest additions to the archive—ribosomal subunit structures. Not of the least importance is the growth of nucleic acid structures solved by NMR techniques.

All this had to be reflected by changes in the NDB itself. During the last three years, the NDB is undergoing a gradual change. The changes started by unifying data structure and adding new data items and corresponding database tables. Further, over 500 NMR structures have been added to the NDB. These changes offered to build new query tools and possibilities for more flexible and reliable searches. A new web interface was designed to make the query capabilities of the NDB as widely accessible as possible and easy to use. Figure 1 shows the new NDB home page, with possibility for ID and keyword search and key links.  

Implementation of these changes required a complete overhaul of the layout of the NDB web pages including new style for all graphic representation of nucleic acids (Lu & Olson, in preparation; [7]). Figure 2 illustrates some possibilities of the new searches.

 

Figure 1. NDB home page, http://ndbserver.rutgers.edu.

 

The largest changes have been made to the Atlas pages which have been completely overhauled. The classification of structures have been simplified, structures are presented in thumbnail galleries, an example for ribozyme structures is shown in Figure 3, and design of  the pages for individual structures have also been modified to give the user a more informative overview of a structure (Figure 4).

In the new Search mode, several items, including structure ID, author, and several classification and structural features can be limited either by entering text in a box or by selecting a Yes or No option. Any combination of these items may be used to constrain the structure selection. If none are used, the entire database will be selected. After selecting “Execute Selection” the user will be presented with a list of structure IDs and descriptors that match the desired conditions. Several viewing options for each structure in this list are possible. These include retrieving the coordinate files in either mmCIF or PDB format, retrieving the coordinates for the biological unit, or viewing an NDB Atlas page. Preformatted Quick Reports can then be generated for the structures in this result list. Multiple reports can be easily generated, e.g. to get bibliographic information, refinement statistics, and backbone torsion angles for the selected structures.

 

Figure 2. Some search options of the NDB.

Figure 3. NDB Atlas gallery for ribozyme crystal structures.

Figure 4. Atlas page of the structure , NDB code UR0003.

 

In the Full Search/Full Report mode, it is possible to access most of the tables in the NDB to build more complex queries. Instead of limiting items that are listed on a single page, the user builds a search by selecting the tables and then the items that contain the desired features. These queries have selectable Boolean and logical operators to make complex queries. After selecting structures using the Full Search, a variety of reports can be written. The report columns are selected from a variety of database tables, similar to the tables used for the Full Search.


Data distribution

Coordinate files, database reports, software programs, and other resources are available via the ftp server (ftp://ndbserver.rutgers.edu). In addition to links to information provided from the ftp server, the web server (http://ndbserver.rutgers.edu/) provides a variety of methods for querying the NDB. These sites are updated continually.

Acknowledgements

The NDB Project is funded by the National Science Foundation and the Department of Energy. BS is supported by a grant from the Ministry of Education of the Czech Republic No. LN00A032 for the Center for Complex Molecular Systems and Biomolecules.

References

[1] Berman H.M., Olson W.K., Beveridge D.L., Westbrook J., Gelbin A., Demeny T., Hsieh S.-H., Srinivasan A.R., Schneider B. (1992): The Nucleic Acid Database—a comprehensive relational database of three-dimensional structures of nucleic acids. Biophys. J. 63:751–9. [This paper gives the full description of the NDB system.]

[2] Berman H.M., Feng Z., Schneider B.,Westbrook J., Zardecki C. (2001): The Nucleic Acid Database (NDB). In: Rossman M.G., Arnold E., editors. International Tables for Crystallography, F. Crystallography of Biological Macromolecules. Dordrecht: Kluwer Academic Publishers. pp. 657–662.

[3] Bourne P.E., Berman H.M., Watenpaugh K., Westbrook J.D., Fitzgerald P.M.D. (1997): The macromolecular Crystallographic Information File (mmCIF). Methods Enzymol. 277:571–90.

[4] Westbrook J., Feng Z., Berman H.M. (1998): ADIT—The AutoDep Input Tool. Department of Chemistry, Rutgers, the State University of New Jersey, RCSB-99.

[5] Feng Z., Hsieh S.-H., Gelbin A., Westbrook J. (1998a): MAXIT: Macromolecular Exchange and Input Tool. New Brunswick, NJ: Rutgers University, NDB–120.

[6] Feng Z., Westbrook J., Berman H.M. (1998b): NUCheck. New Brunswick, NJ: Rutgers University, NDB–407.

 [7] Yang, H., Jossinet, F., Leontis, N., Chen, L., Westbrook, J., Berman, H. M., and Westhof, E. (in press).