Reliability and accuracy in protein structure determination

 

J. Hašek1, T. Skálová1, J. Dušková1, A. Štěpánková1, T. Koval2, J. Dohnálek1

 

1 Institute of Macromolecular Chemistry AV ČR, Heyrovského nám. 2, 16206 Praha 6

2 Institute of Physics AV ČR, Na Slovance 2, 18221 Praha 8

hasek@imc.cas.cz

 

Protein science has made immense progress in the last decade. Combining the information obtained by advanced methods of mass spectrometry, dynamic light scattering, small-angle scattering, NMR techniques, molecular modeling and protein crystallography has already given a complete insight into the function of many biological systems at the molecular level.

Protein crystallography in particular has made huge progress. It has become a very common technique, available to practically any laboratory, and it produces about 8 thousand new bio-macromolecular structures per year. Hundreds of structures of the largest macromolecular complexes (e.g. ribosomes [1]) and their components in different molecular environments have already been determined by protein crystallography and deposited in the PDB [6] at atomic resolution.

Researchers have also learned how to measure intermediate states of molecular complexes, thus acquiring a complete insight into the function of the “molecular machines” responsible for the correct function and regulation of processes in their native environment (e.g. pumping of Na/K ions by ATPase [2]).

Former difficulties with the preparation of quality “protein crystals” have been significantly reduced. Recognizing the dynamic character of protein crystals, in which the protein molecules remain floating in equilibrium with the solution, has led to the development of methods for crystallization of macromolecules in various adhesion modes. This provides opportunities for direct observation of many different adhesion modes between molecules, which are often decisive for signaling, intermolecular communication, transport, and the formation of intermediate molecular complexes [3].

New X-ray free-electron lasers (XFEL) allow structure determination of large molecular complexes from nano-crystalline material. They thus promise structure determination of materials resistant to classical crystallization experiments, namely membrane proteins (e.g. the structure determination of photosystem I [4]).

However, all existing experimental and theoretical methods still have some severe limitations, which should be taken into account in attempts to understand the “molecular machines” responsible for the correct function and regulation of processes in a living environment. Failure to recognize these limitations can result in molecular models with a very problematic relation to reality.

There are some global measures of reliability and accuracy (e.g. resolution, R factor, number and quality of restraints used in refinement, see e.g. [5]). However, the structure provided by protein crystallography is not static. The protein molecules are in dynamic equilibrium with the solvent, and this is usually reflected in different displacement factors (B-factors, reflecting the motion of individual atoms) and in the different stability of some parts of molecular complexes. It is necessary to learn “the crystallographic dialect” to get the real multi-conformational view of the real protein complex, to get a view of the motions inside the protein complex, and to read unbiased details of the interactions of water with the protein surface and of its motion in the protein channels and clefts.
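
For orientation, two of the quantities named above can be written in their conventional textbook form. These definitions are not spelled out in the abstract itself; the symbols are the usual ones, with F_obs and F_calc the observed and calculated structure factor amplitudes and ⟨u²⟩ the mean-square displacement of an atom (isotropic case):

```latex
R = \frac{\sum_{hkl} \bigl|\, |F_{\mathrm{obs}}(hkl)| - |F_{\mathrm{calc}}(hkl)| \,\bigr|}
         {\sum_{hkl} |F_{\mathrm{obs}}(hkl)|},
\qquad
B = 8\pi^{2}\,\langle u^{2} \rangle
```

A low R factor therefore only says that the model reproduces the measured amplitudes on average; it says nothing about the local quality of a particular loop or ligand, which is why the per-atom B-factors matter.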

There are also some human errors introduced into the PDB by inexperienced scientists. The relative ease of protein crystallography, its accessibility to non-specialists, and also the so-called “high-throughput projects”, which compete with one another to beat past records in the number of solved structures per year, have brought a non-negligible number of incompletely refined structures. The remediated version of the PDB [6], which nowadays contains over 70 thousand experimentally determined bio-macromolecular structures, removed only some global clashes.

Thus a closer look at the reliability of the important structure segments is absolutely necessary before any serious structure study. When using data deposited in the PDB, one should always read the comments in the first part of the PDB file and watch for warning indicators along the full length of the protein chains (e.g. unusual B-factors, occupancy factors, disorder indicators, missing atoms, etc.). All warning marks and errors that can be found in some structure depositions in the PDB [6] should be analyzed.
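
As an illustration of such a check, the following minimal Python sketch scans the lines of a legacy fixed-column PDB file for a few of the indicators listed above: alternate-conformation labels, occupancies below 1.0, high B-factors, and the REMARK 465 / REMARK 470 records that list missing residues and missing atoms. The function name `scan_pdb_lines`, the `Flag` record and the B-factor cut-off of 60 Å² are assumptions made for this example only; a meaningful cut-off depends on the resolution and on the mean B-factor of the structure.

```python
from collections import namedtuple

# One flagged atom; the field names are illustrative, not from any PDB library.
Flag = namedtuple("Flag", "chain resseq resname atom reason")


def scan_pdb_lines(lines, b_threshold=60.0):
    """Collect simple warning indicators from legacy PDB-format lines.

    b_threshold is an arbitrary illustrative cut-off in A^2 (an assumption).
    Returns (flags, missing_remarks).
    """
    flags = []
    missing_remarks = []  # raw REMARK 465 / 470 lines
    for line in lines:
        record = line[:6]
        if record == "REMARK":
            # REMARK 465 lists missing residues, REMARK 470 missing atoms.
            if line[6:10].strip() in ("465", "470"):
                missing_remarks.append(line.rstrip())
            continue
        if record not in ("ATOM  ", "HETATM"):
            continue
        # Fixed-column fields of the coordinate record (0-based slices).
        atom = line[12:16].strip()
        altloc = line[16]
        resname = line[17:20].strip()
        chain = line[21]
        resseq = int(line[22:26])
        occupancy = float(line[54:60])
        b_factor = float(line[60:66])
        where = (chain, resseq, resname, atom)
        if altloc != " ":
            flags.append(Flag(*where, "alternate conformation " + altloc))
        if occupancy < 1.0:
            flags.append(Flag(*where, "partial occupancy %.2f" % occupancy))
        if b_factor > b_threshold:
            flags.append(Flag(*where, "high B-factor %.1f" % b_factor))
    return flags, missing_remarks
```

The function accepts any iterable of lines, so an open file handle can be passed directly. The sketch assumes well-formed coordinate records and does not replace reading the header remarks or inspecting the electron density.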

It is important to realize that the real result of the experiment is not the structure model deposited in the PDB, but the three-dimensional electron density into which the structure model is built by the crystallographer. Therefore, if in any doubt, one should check the three-dimensional electron density map on the screen of one's own personal computer. It is very easy to connect to the Electron Density Server (http://eds.bmc.uu.se/eds/) [7], although it is of course somewhat more difficult to interpret the indicated problem on one's own.

It is also important to read the information provided on the planning of the experiment, the state of the protein during the measurement, and the procedures used during structure refinement. All of this can be important for getting a relevant picture of the real state of the protein complex in its cellular or extracellular milieu.

Neglecting to inspect the PDB file is the most frequent source of errors and misinterpretations in comparative studies and structure-function analyses.

Research is supported by IAA500500701 and 305/07/1073.

1.   M.A. Borovinskaya et al., Nat. Struct. Mol. Biol., 14, (2007), 727-732.

2.   M. Hilge et al., Nat. Struct. Biol., 10, (2003), 468-474.

3.   J. Hašek, J. Synchrotron Rad., 18, (2011), 50-52.

4.   H.N. Chapman et al., (2010), Femtosecond X-ray protein nanocrystallography, PDB deposition (http://www.pdb.org/pdb/explore/explore.do?structureId=3PCQ)

5.   J. Hašek, Materials Structure, 17, (2010), 24-26.

6.   H.M. Berman et al., Nucleic Acids Res., 28, (2000), 235-242. (http://www.rcsb.org/pdb/)

7.   G.J. Kleywegt et al., Acta Crystallogr., D60, (2004), 2240-2249. (http://eds.bmc.uu.se/eds/)