Systematic analysis of crystal and molecular structures
J. Hašek, J. Dohnálek
Institute of Macromolecular Chemistry AV ČR, Heyrovského nám. 2, 16206 Praha 6,
hasek@imc.cas.cz
Currently available sources of experimentally determined molecular structures for analysing structure-function relations are:
The Cambridge Structural Database of organic and organometallic compounds (CSD) /1/ is distributed in our country by the Crystallographic Association. It presently contains 703 297 organic structures; of these, 339 297 contain at least one metal atom and 221 037 at least one halogen atom.
The database is searched with the self-explanatory program ConQuest. The best tool for visual inspection of structures in the crystalline phase is MERCURY 3.0 /2/. Further aspects of using the CSD are reviewed in /3/. Teaching modules are available at http://www.ccdc.cam.ac.uk/support/documentation/csd/teaching_egs/toc.html. Advanced case studies are available at http://www.ccdc.cam.ac.uk/case_studies/small_mol/ for small-molecule structures and at http://www.ccdc.cam.ac.uk/case_studies/life_science/ for the life sciences. The teaching version is free of charge; the full version is available for a reasonable annual fee.
The Protein Data Bank (PDB) /4/ contains 81 553 macromolecular structures, mostly of biological origin. Of these, 71 508 were determined by X-ray diffraction, 9 354 by solution NMR, 51 by solid-state NMR, 38 by neutron diffraction, 32 by electron diffraction and 428 by electron microscopy. Among them, 75 525 are proteins, 1 340 are DNA fragments and 907 are RNA. By origin, 41 052 structures come from eukaryotes, 29 945 from bacteria, 4 826 from viruses, 3 215 from archaea and 3 743 from other sources. By organism, 19 952 structures of biological macromolecules originate from Homo sapiens, 5 854 from Escherichia coli, 3 513 from Mus musculus, 2 148 from Saccharomyces cerevisiae, 2 047 from Bos taurus, etc. The most frequently represented enzyme classes are hydrolases (16 952), transferases (12 493), oxidoreductases (7 480), lyases (3 340), isomerases (1 900) and ligases (1 696).
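The per-method counts above do not quite add up to the total number of PDB entries. A minimal Python sketch, using only the figures quoted in this abstract, computes each method's share and the number of entries not covered by the listed methods (the constant and function names are illustrative, not part of any PDB tool):

```python
from typing import Dict

# Entry counts quoted in the text (PDB statistics at the time of writing).
PDB_TOTAL = 81_553
PDB_BY_METHOD: Dict[str, int] = {
    "X-ray diffraction": 71_508,
    "solution NMR": 9_354,
    "solid-state NMR": 51,
    "neutron diffraction": 38,
    "electron diffraction": 32,
    "electron microscopy": 428,
}


def method_shares(counts: Dict[str, int], total: int) -> Dict[str, float]:
    """Share of all entries (in percent) for each experimental method."""
    return {method: 100.0 * n / total for method, n in counts.items()}


def unlisted_entries(counts: Dict[str, int], total: int) -> int:
    """Number of entries whose method is not among the listed ones."""
    return total - sum(counts.values())


if __name__ == "__main__":
    for method, pct in method_shares(PDB_BY_METHOD, PDB_TOTAL).items():
        print(f"{method:<22} {pct:6.2f} %")
    print("not covered by the listed methods:",
          unlisted_entries(PDB_BY_METHOD, PDB_TOTAL))
```

With these figures, X-ray diffraction accounts for roughly 88 % of the entries, and 142 entries fall outside the listed methods.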
The PDB site refers to many useful programs for structure analysis; for example, the Ligand Explorer /5/ can be used to inspect the adsorption of ligands on the protein surface. The database is free to use.
The database of human disease-related viral integration sites, VIS (http://www.scbit.org/dbmi/drvis), collects and maintains data on human disease-related VIS, including characteristics of the malignant diseases, chromosome region, genomic position and viral–host junction sequence. The current database covers about 600 natural VIS of 5 oncogenic viruses associated with 11 diseases; about 200 of these VIS have a known viral–host junction sequence.
The Polymer Structure Database (PolyBase) /6/ contains structures of synthetic polymers determined experimentally in the solid state, together with several thousand “snapshots” of PEG-like polymers built in the environment of biological macromolecules. Non-commercial users in the Czech and Slovak Republics can access these services.
The Nucleic Acid Database, NDB (http://ndbserver.rutgers.edu/), contains 5 897 structures of oligonucleotides. The page http://ndbserver.rutgers.edu/education/index.html links to several useful teaching and educational resources. All events related to the NDB are archived in the NDB Newsletter (http://ndbserver.rutgers.edu/NDB_news/index.html). The database is free to use.
The Inorganic Crystal Structure Database, ICSD (http://www.fiz-karlsruhe.de/icsd_content.html), currently contains 150 042 inorganic structures, including 1 616 crystal structures of elements, 28 354 structures of binary compounds, 55 436 of ternary compounds and 54 144 of quaternary and quinary compounds. The database is updated twice a year, each time adding about 3 500 new records. It can be purchased directly from http://www.fiz-karlsruhe.de.
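The ICSD composition classes listed above cover only part of the total, and the update rate translates into an approximate annual growth. A small Python sketch with the quoted figures follows; the names are hypothetical, and the projection assumes a constant update rate, which the text does not claim:

```python
from typing import Dict

# Figures quoted in the text for the ICSD.
ICSD_TOTAL = 150_042
ICSD_BY_COMPOSITION: Dict[str, int] = {
    "elements": 1_616,
    "binary compounds": 28_354,
    "ternary compounds": 55_436,
    "quaternary and quinary compounds": 54_144,
}
UPDATES_PER_YEAR = 2
RECORDS_PER_UPDATE = 3_500  # "about 3 500" per update, so only approximate


def unclassified_entries(total: int, classes: Dict[str, int]) -> int:
    """Entries not covered by the composition classes listed in the text."""
    return total - sum(classes.values())


def annual_additions() -> int:
    """Approximate number of new records per year."""
    return UPDATES_PER_YEAR * RECORDS_PER_UPDATE


def projected_size(years: int, start: int = ICSD_TOTAL) -> int:
    """Hypothetical linear extrapolation assuming a constant update rate."""
    return start + years * annual_additions()


if __name__ == "__main__":
    print("listed classes:", sum(ICSD_BY_COMPOSITION.values()))
    print("not classified in the text:",
          unclassified_entries(ICSD_TOTAL, ICSD_BY_COMPOSITION))
    print(f"annual growth: {annual_additions()} records "
          f"({100 * annual_additions() / ICSD_TOTAL:.1f} %)")
```

The listed classes account for 139 550 of the 150 042 entries, leaving 10 492 entries not classified in the text; at about 7 000 new records a year, the collection grows by roughly 4.7 % annually.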
The database CRYSTMET® - Metals and Alloys (http://www.tothcanada.com/) contains 139 058 entries with critically evaluated crystallographic data, atomic coordinates and calculated powder diffraction patterns for metals, alloys, intermetallics and minerals. The database is closely related to the materials-modelling software package “Materials Toolkit 2.7.1” described in /7/.
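To illustrate what a calculated powder diffraction pattern is built from, the following Python sketch computes only the peak positions of a cubic lattice from Bragg's law, λ = 2d sin θ, with d_hkl = a/√(h² + k² + l²). The Cu Kα1 wavelength, the silicon lattice parameter in the usage example and all function names are assumptions for illustration and are not taken from CRYSTMET; peak intensities and systematic absences are ignored here.

```python
import math
from typing import Iterable, List, Optional, Tuple

# Assumed X-ray wavelength: Cu K-alpha1, in angstrom (not taken from CRYSTMET).
CU_KALPHA1 = 1.5406

HKL = Tuple[int, int, int]


def d_spacing_cubic(a: float, h: int, k: int, l: int) -> float:
    """Interplanar spacing d_hkl (angstrom) of a cubic lattice with cell edge a."""
    return a / math.sqrt(h * h + k * k + l * l)


def two_theta(d: float, wavelength: float = CU_KALPHA1) -> Optional[float]:
    """Diffraction angle 2-theta in degrees from Bragg's law, lambda = 2 d sin(theta).

    Returns None when lambda / 2d > 1, i.e. the reflection is not reachable
    at this wavelength.
    """
    s = wavelength / (2.0 * d)
    if s > 1.0:
        return None
    return 2.0 * math.degrees(math.asin(s))


def peak_positions(a: float, reflections: Iterable[HKL],
                   wavelength: float = CU_KALPHA1) -> List[Tuple[HKL, float]]:
    """(hkl, 2-theta) pairs for the given reflections, sorted by angle.

    Intensities and systematic absences are deliberately ignored.
    """
    peaks = []
    for hkl in reflections:
        tt = two_theta(d_spacing_cubic(a, *hkl), wavelength)
        if tt is not None:
            peaks.append((hkl, tt))
    return sorted(peaks, key=lambda p: p[1])


if __name__ == "__main__":
    # Illustrative cubic cell: silicon, a = 5.431 angstrom (assumed value).
    for hkl, tt in peak_positions(5.431, [(1, 1, 1), (2, 2, 0), (3, 1, 1)]):
        print(hkl, f"{tt:.2f}")
```

For silicon (a ≈ 5.431 Å) this places the (111) peak at about 28.44° 2θ.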
The Crystallography Open Database, COD (http://www.crystallography.net), aims to collect all known “small-molecule” and “small-to-medium-sized unit cell” crystal structures and makes them freely available on the internet. It contains more than 150 000 structures /8/.
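COD entries are distributed as CIF (Crystallographic Information File) files. As a sketch of working with such data, the Python code below reads the six unit-cell parameters from simple "tag value" lines and computes the volume of a general (triclinic) cell. The parser is deliberately minimal and is not a full CIF reader (it ignores loops, multi-line values and multiple data blocks); the example CIF fragment and the function names are hypothetical, and a dedicated CIF library would be preferable for real work.

```python
import math
import re
from typing import Dict

CELL_TAGS = ("_cell_length_a", "_cell_length_b", "_cell_length_c",
             "_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma")

# Trailing standard uncertainty in parentheses, e.g. the "(2)" in "5.4307(2)".
_SU = re.compile(r"\(\d+\)$")

# Hypothetical CIF fragment for illustration (not an actual COD entry).
EXAMPLE_CIF = """data_example
_cell_length_a    5.4307(2)
_cell_length_b    5.4307(2)
_cell_length_c    5.4307(2)
_cell_angle_alpha 90
_cell_angle_beta  90
_cell_angle_gamma 90
"""


def parse_cell(cif_text: str) -> Dict[str, float]:
    """Read the six cell parameters from single-line 'tag value' entries.

    Unknown values such as '?' make float() raise ValueError.
    """
    cell: Dict[str, float] = {}
    for line in cif_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] in CELL_TAGS:
            cell[parts[0]] = float(_SU.sub("", parts[1]))
    missing = [tag for tag in CELL_TAGS if tag not in cell]
    if missing:
        raise ValueError(f"missing cell parameters: {missing}")
    return cell


def cell_volume(cell: Dict[str, float]) -> float:
    """Unit-cell volume (cubic angstrom) of a general triclinic cell."""
    a, b, c = (cell[tag] for tag in CELL_TAGS[:3])
    ca, cb, cg = (math.cos(math.radians(cell[tag])) for tag in CELL_TAGS[3:])
    return a * b * c * math.sqrt(
        1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)


if __name__ == "__main__":
    print(f"V = {cell_volume(parse_cell(EXAMPLE_CIF)):.2f} A^3")
```

The general triclinic formula is used so that the same function also covers monoclinic, orthorhombic and cubic cells, where some angles are 90°.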
Simple use of the databases is usually straightforward. Advanced use can be considerably more demanding and requires some experience. If you are interested in the advanced courses and discussion meetings to be organized over the next three years, please fill in the attached tentative form. This ensures that you will receive the final information about the program and the exact dates of the individual courses.
References
1. F. H. Allen, Acta Cryst. B58, (2002), 380-388. (www.ccdc.cam.ac.uk)
2. C. F. Macrae et al., J. Appl. Cryst. 41, (2008), 466-470.
3. J. Hašek, Chem. Listy 105, (2011), 467-475.
4. H. M. Berman et al., Nucleic Acids Res. 28, (2000), 235-242. (ftp://ftp.wwpdb.org/pub/pdb/doc/newsletters/rcsb_pdb)
5. http://www.pdb.org/pdb/staticHelp.do?p=help/viewers/ligandExplorer_viewer.html
6. J. Hašek et al., Z. Kristallogr. 28, (2011), 475-480.
7. Y. Le Page, J. R. Rodgers, J. Appl. Cryst. 38, (2005), 697-705.
8. S. Gražulis et al., Nucleic Acids Res. 40, (2012), 420-427.
Acknowledgements. This research is supported by grant GA ČR 310/09/1407.
Keywords: structure databases, structure-function relations, organic and inorganic materials, polymers, proteins, RNA, DNA, intermolecular interactions