Systematic analysis of crystal and molecular structures

J. Hašek, J. Dohnálek

 

Institute of Macromolecular Chemistry AV ČR, Heyrovského nám. 2, 16206 Praha 6

hasek@imc.cas.cz

 

The currently available sources of experimentally determined molecular structures for the analysis of structure-function relations are:

Cambridge Structural Database of Organic and Organometallic Compounds (CSD) /1/ is distributed in our country by the Crystallographic Association. It currently contains 703 297 organic structures, 339 297 containing at least one metal atom and 221 037 containing at least one halogen atom.

The database is searched with the self-explanatory program ConQuest. The best tool for visual inspection of structures in the crystalline phase is MERCURY 3.0 /2/. Some other aspects of using the CSD are reviewed in /3/. Teaching modules are at http://www.ccdc.cam.ac.uk/support/documentation/csd/teaching_egs/toc.html. Advanced case studies are at http://www.ccdc.cam.ac.uk/case_studies/small_mol/ for inorganic structures and at http://www.ccdc.cam.ac.uk/case_studies/life_science/ for life sciences. The teaching version is free of charge. The full version is available for a reasonable yearly fee.

Protein Data Bank (PDB) /4/ contains 81 553 macromolecular structures, mostly of biological origin. 71 508 of them were determined by X-ray diffraction, 9 354 by solution NMR, 51 by solid-state NMR, 38 by neutron diffraction, 32 by electron diffraction, and 428 by electron microscopy.
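The counts above can be turned into percentage shares with a few lines of Python. This is only an illustrative sketch: the numbers are copied from the text, the function name method_shares is hypothetical, and the remainder left over after the six listed methods is assumed to be entries determined by other or combined methods.

```python
# PDB holdings by experimental method, as quoted in the text above.
total = 81_553
by_method = {
    "X-ray diffraction": 71_508,
    "solution NMR": 9_354,
    "solid-state NMR": 51,
    "neutron diffraction": 38,
    "electron diffraction": 32,
    "electron microscopy": 428,
}


def method_shares(counts, total):
    """Return each method's share of the total, in percent."""
    return {name: 100.0 * n / total for name, n in counts.items()}


shares = method_shares(by_method, total)

# Entries not covered by the six listed methods
# (assumed here to be other or hybrid methods).
unlisted = total - sum(by_method.values())

# Print the methods from the most to the least frequent.
for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:22s} {pct:6.2f} %")
print(f"{'not listed':22s} {unlisted:6d} entries")
```

The six listed methods do not add up to the quoted total, so the sketch reports the difference separately instead of silently redistributing it.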

75 525 of these structures are proteins, 1 340 are DNA fragments, and 907 are RNA. 41 052 structures originate from eukaryota, 29 945 from bacteria, 4 826 from viruses, 3 215 from archaea, and 3 743 from other sources. By source organism, 19 952 structures of biological macromolecules originate from Homo sapiens, 5 854 from Escherichia coli, 3 513 from Mus musculus, 2 148 from Saccharomyces cerevisiae, 2 047 from Bos taurus, etc. The most frequent enzyme types are hydrolases (16 952), transferases (12 493), oxidoreductases (7 480), lyases (3 340), isomerases (1 900) and ligases (1 696).

The PDB site links to many useful programs for structure analysis, e.g. the Protein Explorer /5/ for inspecting the adsorption of ligands on protein surfaces. The use of the database is free.

The database of human disease-related viral integration sites (VIS) (http://www.scbit.org/dbmi/drvis) collects and maintains human disease-related VIS data, including characteristics of the malignant diseases, chromosome region, genomic position and viral–host junction sequence. The current database covers about 600 natural VIS of 5 oncogenic viruses representing 11 diseases. About 200 of these VIS have a viral–host junction sequence.

Polymer Structure Database (PolyBase) /6/ contains structures of synthetic polymers determined experimentally in the solid state and several thousand “snapshots” of PEG-like polymers built in the environment of biological macromolecules. Non-commercial users in the Czech and Slovak Republics can get access to these services.

The Nucleic Acid Database (NDB, http://ndbserver.rutgers.edu/) contains 5 897 structures of oligonucleotides. The page http://ndbserver.rutgers.edu/education/index.html lists several useful teaching and educational resources. All events related to the NDB are archived in the NDB Newsletter (http://ndbserver.rutgers.edu/NDB_news/index.html). The use of the database is free.

The Inorganic Crystal Structure Database (http://www.fiz-karlsruhe.de/icsd_content.html) currently contains 150 042 inorganic structures, including 1 616 crystal structures of elements, 28 354 structures of binary compounds, 55 436 ternary compounds and 54 144 quaternary and quinary compounds. The database is updated twice a year, each time adding about 3 500 new records. The database can be purchased directly from http://www.fiz-karlsruhe.de.

The database CRYSTMET® - Metals and Alloys (http://www.tothcanada.com/) contains 139 058 entries with critically evaluated crystallographic data, atomic coordinates and calculated powder diffraction patterns for metals, alloys, intermetallics and minerals. The database is closely related to the software package for materials modelling “Materials Toolkit 2.7.1” published in /7/.

The Crystallography Open Database (COD, http://www.crystallography.net) collects all known “small molecule” and “small-medium sized unit cell” crystal structures and makes them freely available on the internet. It contains more than 150 000 structures /8/.
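Because COD entries are freely available, they can also be retrieved by a script. The following Python sketch is a hedged illustration, not a description of an official COD interface: it assumes that an entry with a seven-digit COD identifier is served as a CIF file at <site>/cod/<id>.cif, and the helper names (cod_cif_url, fetch_cif, cell_lengths) and the sample text are hypothetical. The download function is defined but not called, so the example runs without a network connection.

```python
from urllib.request import urlopen

# Assumed URL layout for individual COD entries (not taken from the text above).
COD_BASE = "http://www.crystallography.net/cod"


def cod_cif_url(cod_id: int) -> str:
    """Build the assumed download URL for a seven-digit COD identifier."""
    if not 1_000_000 <= cod_id <= 9_999_999:
        raise ValueError("COD identifiers are assumed to be seven-digit numbers")
    return f"{COD_BASE}/{cod_id}.cif"


def fetch_cif(cod_id: int, timeout: float = 30.0) -> str:
    """Download one entry as CIF text (requires network access)."""
    with urlopen(cod_cif_url(cod_id), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def cell_lengths(cif_text: str) -> dict:
    """Extract the unit cell lengths a, b, c (in angstroms) from CIF text."""
    tags = ("_cell_length_a", "_cell_length_b", "_cell_length_c")
    lengths = {}
    for line in cif_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] in tags:
            # Drop a standard uncertainty such as "(2)" in "5.4307(2)".
            value = parts[1].split("(")[0]
            lengths[parts[0][-1]] = float(value)
    return lengths


# Hypothetical CIF fragment used for an offline demonstration;
# it is not a real COD entry.
sample = """data_example
_cell_length_a 5.4307(2)
_cell_length_b 5.4307(2)
_cell_length_c 5.4307(2)
"""

print(cod_cif_url(1000000))
print(cell_lengths(sample))
```

For real work, a dedicated CIF parser is a safer choice than the line-by-line scan used here, which handles only simple single-line data items.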

Simple use of the databases usually poses no problem. Advanced use is more demanding and requires some experience. If you are interested in the advanced courses and discussion meetings to be organized in the next three years, please fill in the attached tentative form. This ensures that you will receive the final information about the program and the exact dates of the individual courses.

References

1.       F. H. Allen, Acta Cryst., B58, (2002), 380-388. (www.ccdc.cam.ac.uk)

2.       C. F. Macrae et al., J. Appl. Cryst., 41, (2008), 466-470.

3.       J. Hašek, Chem. Listy, 105, (2011), 467-475.

4.       H. M. Berman et al., Nucleic Acids Res., 28, (2000), 235-242. (//ftp.wwpdb.org/pub/pdb/doc/newsletters/rcsb_pdb)

5.       http://www.pdb.org/pdb/staticHelp.do?p=help/viewers/ligandExplorer_viewer.html

6.       J. Hašek et al., Zeitschrift für Kristallogr., 28, (2011), 475-480.

7.       Y. Le Page, J. R. Rodgers, J. Appl. Cryst., 38, (2005), 697-705.

8.       S. Gražulis et al., Nucleic Acids Res., 40, (2012), 420-427.

 

Acknowledgements.   The research is supported by GA ČR 310/09/1407.

 

Keywords:  structure databases, structure-function relations, organic and inorganic materials, polymers, proteins, RNA, DNA, intermolecular interactions