DEPOSITION OF MACROMOLECULAR STRUCTURES TO THE PROTEIN DATA BANK (PDB)

Bohdan Schneider

 

Center for Biomolecules and Complex Molecular Systems, Institute of Organic Chemistry and Biochemistry, AS CR, Fleming Sq. 2, CZ-16610 Prague, Czech Republic

 

Most grant agencies and virtually all journals require that the results of crystallographic or solution NMR analyses be deposited in a public database. For macromolecular structures, this is the Protein Data Bank ([1], PDB, http://www.pdb.org/) or the Nucleic Acid Database ([2], NDB, http://ndbserver.rutgers.edu). Everyone involved in structure determination should keep in mind that structures nurtured in laboratories for months, and in some cases for years, will not be judged by notebooks, by log files from data processing and refinement, or by endless coffee discussions in the laboratory, but solely by their representation in the PDB. The deposition process therefore deserves attention and should be viewed as an important part of structure determination. The workshop will present the tools developed by the RCSB PDB that assist and simplify deposition.

The main deposition tool is AdIt (http://deposit.rcsb.org/), a web-based mmCIF editor for deposition and validation. To deposit a structure, the user uploads the relevant coordinate and experimental data files and then adds any additional information. Each structure should be validated before deposition. Coordinates should be checked for format consistency and for the quality of valence geometry using the Validation server (http://deposit.pdb.org/validate/). The web server at http://pdb-extract.rcsb.org/auto-check/ allows non-trivial checking of coordinates against X-ray diffraction data ("structure factors") using the programs SFCheck, REFMAC, and CNS. Correctly formatted coordinates, as well as data collection and refinement statistics, should be produced with the pdb_extract tool ([3], http://pdb-extract.rcsb.org/), which integrates the refinement logs of most major refinement programs into PDB and/or mmCIF format and thus significantly simplifies deposition. The identity of ligands present in the structure to be deposited should be verified using the ligand tool, currently available on the web as "Ligand Depot" ([4], http://ligand-depot.rcsb.org/), which allows you to determine whether your ligands are correctly labeled, whether the right atom names were used, and whether the ligands are possibly new to the PDB.
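
As a rough illustration of what a local "format consistency" pre-check might look like before a file is submitted to the servers above, the following Python sketch reads ATOM and HETATM records in the legacy fixed-column PDB format, flags records whose coordinate fields are missing or non-numeric, and collects the residue names of hetero groups so that they can be looked up in the ligand tool. It is not the method used by the RCSB Validation server or by pdb_extract; the function name precheck_pdb is hypothetical, and skipping water (HOH) is an assumption made here for convenience.

```python
from typing import List, Set, Tuple


def precheck_pdb(lines: List[str]) -> Tuple[List[str], Set[str]]:
    """Return (problems, hetero_names) for lines of a legacy PDB-format file.

    Hypothetical helper for illustration only; it checks far less than the
    real validation services do.
    """
    problems: List[str] = []
    hetero_names: Set[str] = set()
    for number, line in enumerate(lines, start=1):
        record = line[:6].strip()          # columns 1-6: record name
        if record not in ("ATOM", "HETATM"):
            continue
        if len(line.rstrip("\n")) < 54:    # x, y, z end at column 54
            problems.append(f"line {number}: too short for coordinates")
            continue
        try:
            # columns 31-38, 39-46, 47-54 hold x, y, z
            for start, end in ((30, 38), (38, 46), (46, 54)):
                float(line[start:end])
        except ValueError:
            problems.append(f"line {number}: non-numeric coordinate field")
            continue
        if record == "HETATM":
            name = line[17:20].strip()     # columns 18-20: residue name
            if name != "HOH":              # assumption: water is not of interest
                hetero_names.add(name)
    return problems, hetero_names


# Small demonstration with made-up records (not taken from a real entry).
example = [
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67",
    "HETATM 1001  C1  NAG A 201      10.000  11.000  12.000  1.00 20.00",
    "HETATM 1002  O   HOH A 301      xx.xxx  11.000  12.000  1.00 20.00",
]
found_problems, found_ligands = precheck_pdb(example)
print(found_problems)
print(sorted(found_ligands))
```

A check of this kind only catches gross formatting errors; geometry, agreement with structure factors, and ligand atom naming still need the dedicated tools described above.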

All of the web pages mentioned offer extensive tutorials, and many steps have context-sensitive help and example pages. Most of the tools are also available as downloadable executables and as source code.

The workshop will demonstrate the deposition process using example files, possibly supplied by participants.

 

Acknowledgements. The PDB project is funded by the National Science Foundation, the Department of Energy, the National Institute of General Medical Sciences, and the National Library of Medicine. BS gratefully acknowledges support from grant No. LC512 of the Ministry of Education of the Czech Republic for the Center for Biomolecules and Complex Molecular Systems.

 

[1] Berman H.M., Battistuz T., Bhat T.N., Bluhm W.F., Bourne P.E., Burkhardt K., Feng Z., Gilliland G.L., Iype L., Jain S., Fagan P., Marvin J., Padilla D., Ravichandran V., Schneider B., Thanki N., Weissig H., Westbrook J.D., Zardecki C. (2002): The Protein Data Bank. Acta Cryst. D 58, 899-907.

[2] Berman H.M., Olson W.K., Beveridge D.L., Westbrook J., Gelbin A., Demeny T., Hsieh S.-H., Srinivasan A.R., Schneider B. (1992): The Nucleic Acid Database: a comprehensive relational database of three-dimensional structures of nucleic acids. Biophys. J. 63, 751-759.

[3] Yang H., Guranovic V., Dutta S., Feng Z., Berman H.M., Westbrook J.D. (2004): Automated and accurate deposition of structures solved by X-ray diffraction to the Protein Data Bank. Acta Cryst. D 60, 1833-1839.

[4] Feng Z., Chen L., Maddula H., Akcan O., Oughtred R., Berman H.M., Westbrook J. (2004): Ligand Depot: a data warehouse for ligands bound to macromolecules. Bioinformatics 20, 2153-2155.