COMPUTER  SIMULATIONS  OF  REAL  MOLECULAR  SYSTEMS

 

Jindřich Hašek

 

Institute of Macromolecular Chemistry, Academy of Sciences of the Czech Republic, Heyrovského nám. 2, CZ-162 06 Praha 6, Czech Republic

 

INTRODUCTION

The quality of any computer study of real macromolecular systems depends heavily on the availability of experimental information about the structure. The vast majority of the protein structures deposited in the Protein Data Bank (PDB) [1] originate from X-ray structure analysis. It is gratifying that these data are widely used by many other experimental and theoretical methods; however, one can find many striking examples of incorrect use and misinterpretation even in well-established journals. This paper summarizes some basic facts that must be observed when using protein structures deposited in the PDB.

Most programs can read atomic coordinates from PDB files automatically, skipping all other information about the physical and chemical background. Other programs readily add missing hydrogens and fill in the atom types, so anybody can seemingly become an expert within a few days and draw attractive pictures of molecules. It therefore happens from time to time that we are surprised to read about a "successful" interpretation of IR spectra based on the "fact", extracted from the PDB, that the liganded HIV protease is a dimer and the unliganded one a monomer. The authors did not notice that an ordinary PDB file contains only the symmetrically independent part of the molecule, and that the symmetry-related monomer of the unliganded protein (of course missing in the original *.pdb file) forms a compact unit with it.
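A minimal sketch in Python of how such a mistake can be caught early, assuming the file carries the standard REMARK 350 BIOMT records of the PDB format and describes a single biological assembly. The function names and the example records (a hypothetical two-fold operator with a 50 Å shift) are illustrative assumptions, not taken from any deposited entry.

```python
def parse_biomt(pdb_lines):
    """Collect REMARK 350 BIOMT operators as {serial: three rows [r1, r2, r3, t]}."""
    operators = {}
    for line in pdb_lines:
        if not line.startswith("REMARK 350"):
            continue
        fields = line.split()
        # Expected fields: REMARK, 350, BIOMTn, serial, r1, r2, r3, t
        if len(fields) < 8 or not fields[2].startswith("BIOMT"):
            continue
        serial = int(fields[3])
        operators.setdefault(serial, []).append([float(v) for v in fields[4:8]])
    return operators


def is_identity(rows, tol=1e-6):
    """True if the operator leaves every coordinate unchanged."""
    identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
    return all(abs(a - b) <= tol
               for row, ref in zip(rows, identity)
               for a, b in zip(row, ref))


def needs_symmetry_mates(pdb_lines):
    """True if the assembly needs copies generated by a non-identity operator."""
    return any(not is_identity(rows) for rows in parse_biomt(pdb_lines).values())


def apply_operator(rows, xyz):
    """Apply one rotation + translation operator to a coordinate triple."""
    x, y, z = xyz
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in rows)


# Hypothetical records: operator 1 is the identity, operator 2 is a two-fold axis.
EXAMPLE_REMARKS = [
    "REMARK 350 APPLY THE FOLLOWING TO CHAINS: A",
    "REMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000",
    "REMARK 350   BIOMT2   1  0.000000  1.000000  0.000000        0.00000",
    "REMARK 350   BIOMT3   1  0.000000  0.000000  1.000000        0.00000",
    "REMARK 350   BIOMT1   2 -1.000000  0.000000  0.000000       50.00000",
    "REMARK 350   BIOMT2   2  0.000000 -1.000000  0.000000        0.00000",
    "REMARK 350   BIOMT3   2  0.000000  0.000000  1.000000        0.00000",
]
```

If `needs_symmetry_mates` returns True, the coordinates in the file are only part of the functional unit, and the remaining copies have to be generated with `apply_operator` before any conclusion about the oligomeric state is drawn.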

To avoid this and many similar pitfalls, one should examine and work with the other parameters hidden in the data deposited in the database. What follows is a short overview of some important steps; a PDB user who omits them can easily waste the effort spent on all subsequent work.

 

MOLECULAR  MODEL  FOR  ENERGY  CALCULATION

A diffraction experiment provides as its primary result a map of electron density showing a time-averaged view of all conformations and all intermolecular interactions with the solvent and possibly with other molecules contained in the solvent drop surrounding the protein crystal at the time of measurement. The map shows all atoms that have well-defined positions in the structure. Thus, we can see solvent molecules bound to the protein in the first hydration shell, whereas the freely moving bulk water forms only a uniform background in which individual water positions cannot be distinguished. However, to start energy calculations, one needs a single conformation of some low-energy state of the molecular system.

 

Choice of the representative structure(s)

 

The PDB often contains several structure determinations of the same molecule with different ligands, in different molecular environments, or at different pH and in different buffers. Choosing the system closest to the intended subject of study is certainly worth some effort.

 

At the same time, one requires a clear and unambiguous view of the structure. Thus, one of the important issues is the accuracy and reliability of the selected structure determination. One should evaluate the following:

·       Check the overall quality of the structure determination, i.e. the resolution and the R and Rfree factors. These figures are only indicative. Do not put much trust in structures with R > 0.25 and Rfree > R + 0.1. Note in particular that resolution is not the same as accuracy. For example, a resolution of 4 Å means "unreliable", 3 Å means "doubtful", 2 Å means reliable with an accuracy of about 0.1 Å, and 1 Å means an accuracy of about 0.02 Å, at which point one can start to interpret the positions of hydrogen atoms.

·       Check for chemical composition of buffer, pH, etc. for all candidates selected from PDB.

·       Always check the alignment of your sequence with both the SEQRES and ATOM records.

·       Check for disorder or any missing parts of the molecule. In critical cases, decide whether to truncate the chain or to model the fragments missing in the experimental PDB file.

·       Check the quality of the structure determination for each residue in the selected structures.
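The rule-of-thumb figures quoted in the first item of this list can be turned into a small triage helper. The following Python sketch is only one possible reading: the way resolutions between the quoted values are binned (always rounding toward the worse category) and the choice to report the R and Rfree conditions as two separate warnings are assumptions made here, and the function names are hypothetical.

```python
def resolution_label(resolution):
    """Map a resolution in angstroms to the verbal grades quoted in the text.

    Values between the quoted points (1, 2, 3, 4 A) are assigned to the
    worse neighbouring grade; this binning is an assumption of the sketch.
    """
    if resolution <= 1.0:
        return "hydrogen positions may be interpretable"
    if resolution <= 2.0:
        return "reliable"
    if resolution <= 3.0:
        return "doubtful"
    return "unreliable"


def quality_warnings(r_work, r_free, max_r=0.25, max_gap=0.10):
    """Return a list of warnings based on the R and Rfree thresholds."""
    warnings = []
    if r_work > max_r:
        warnings.append("R = %.3f exceeds %.2f" % (r_work, max_r))
    if r_free - r_work > max_gap:
        warnings.append("Rfree - R = %.3f exceeds %.2f" % (r_free - r_work, max_gap))
    return warnings
```

For instance, `quality_warnings(0.18, 0.22)` returns an empty list, while a structure with R = 0.30 and Rfree = 0.45 triggers both warnings. These figures remain indicative only; they do not replace the per-residue check.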

 

 

Temperature of measurement

Although many diffraction experiments are carried out at very low temperatures obtained with liquid nitrogen, the reported structure always corresponds to the solvated protein structure at ambient temperature. The reason is that the sample must be flash-frozen so quickly that the solvent water remains amorphous in all cases; otherwise no diffraction from the protein could be measured. Thus, a protein structure measured at low temperature always corresponds to the fold at ambient temperature, i.e. to the situation just before flash freezing.

One of the many advantages of flash freezing the protein sample is that the local vibrations of atoms are greatly restricted and some side chains with multiple conformations are fixed in lower-energy states, so that one obtains a much sharper view of the low-energy states of the whole molecular system. This is why the use of "low temperature" data is generally preferred.

 

Choice of a single conformation from many

The conformational flexibility of a macromolecule is represented in a diffraction experiment by two factors. One is the atomic displacement factor, which describes the motion of an atom by an ellipsoidal probability density of finding the atom. When the motion of an atom is so large that the ellipsoidal description cannot describe it well, the concept of multiple conformations is used. The disordered residue or other part of the molecule is built into the electron density several times, with the restriction that the occupation factors sum to one for each atom. Thus, in contrast to NMR results, where one is offered a series of representative models, crystallographers leave this step to the user of their results. At first sight it may sound horrible that, say, eighteen residues with two possible conformations each theoretically lead to 2^18 = 262 144 structure models. However, it is relatively easy to go through the individual conformations and decide which is best for your purpose.

·       Check all residues with multiple conformations in the main chain or side chains.

·       Choose the suitable conformational model.

·       Check the interesting areas of the chosen structure against the map of electron density.
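The first two steps of this list can be sketched in Python as follows, assuming the fixed-column ATOM/HETATM layout of the PDB format (alternate-location label in column 17, occupancy in columns 55-60). The function names are hypothetical, and picking the label with the highest occupancy is only one possible selection rule, not a recommendation from the text.

```python
from collections import defaultdict


def alternate_conformers(pdb_lines):
    """Map (chain, resSeq, resName) -> {altLoc label: highest occupancy seen}."""
    found = defaultdict(dict)
    for line in pdb_lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        alt = line[16]                      # column 17: alternate location
        if alt == " ":
            continue                        # atom has a single position
        key = (line[21], int(line[22:26]), line[17:20].strip())
        occupancy = float(line[54:60])      # columns 55-60
        found[key][alt] = max(occupancy, found[key].get(alt, 0.0))
    return dict(found)


def count_models(conformers):
    """Number of combinations if every residue is chosen independently."""
    total = 1
    for labels in conformers.values():
        total *= len(labels)
    return total


def pick_major(conformers):
    """Choose, for each disordered residue, the label with the highest occupancy."""
    return {key: max(labels, key=labels.get) for key, labels in conformers.items()}
```

With eighteen residues of two conformations each, `count_models` gives 2 ** 18 = 262144, in agreement with the figure above. Neighbouring residues that share a label usually move together, so an automatic choice should still be inspected against the electron density.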

 

Idealization of the model

X-ray crystallography provides the structure of the macromolecule in its natural environment, with all its interactions with the solvent and neighbouring molecules. In calculations, one always works with some idealized system, chosen so that the problem under study has the prevailing effect on the calculated energies while changes in distant parts of the system have a negligible effect on the calculation.

·       Identify the areas on the surface where intermolecular contacts take place and analyse the types of interactions. Decide how well they suit the purpose of your work. Decide whether to incorporate them into your model, to truncate them, to fix their positions, or to minimize the relevant residues.
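Locating the contact areas can be sketched with a plain distance search. The Python example below assumes that the coordinates of the neighbouring molecule have already been generated with the crystal symmetry operators (not shown here); the 4 Å cutoff, the residue labels and the function name are illustrative assumptions.

```python
import math


def contact_residues(molecule, neighbour, cutoff=4.0):
    """Return sorted residue ids of `molecule` that touch `neighbour`.

    molecule:  list of (residue_id, (x, y, z)) for the molecule of interest
    neighbour: list of (x, y, z) for atoms of a symmetry-related molecule
    cutoff:    contact distance in angstroms (assumed value)
    """
    close = set()
    for residue_id, xyz in molecule:
        if residue_id in close:
            continue                      # residue already flagged
        for other in neighbour:
            if math.dist(xyz, other) <= cutoff:
                close.add(residue_id)
                break
    return sorted(close)
```

The list of residues returned is only a starting point; the type of each interaction (hydrogen bond, salt bridge, hydrophobic contact) still has to be analysed before deciding whether to keep, truncate, fix or minimize it.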

 

Water molecules

The solvent is essential for protein conformation and therefore needs special care. A protein crystal is a regular arrangement of loosely connected protein molecules with solvent filling the gaps between them. The solvent usually forms about 50 % of the crystal mass. Diffraction methods can identify well-localized positions of water molecules (or of ions and other solvent molecules), usually at the surface or inside the protein. In a well-made structure analysis these usually represent up to 10 % of the solvent molecules. Thus, the experiment gives only the most important water positions. The bulk solvent (the solvent molecules that have no fixed position relative to the macromolecule) is represented by a continuum. However, crystallographers usually do not concentrate on the problem of hydration, and some of them simply give no information about waters in their resulting PDB files. Therefore, to work with reliable hydration, one has to run programs that search for empty cavities and crevices inside the protein and fill them with reasonably oriented water molecules.

Keep in mind that "crystallographic waters" are average, well-localized sites for water molecules, with residence times ranging from tens of picoseconds at the protein surface to tens of nanoseconds for water positions deep inside the protein. Thus, it depends on the purpose whether to tether the crystallographic waters at their average positions or to leave them free in the dynamics.
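Whether a given PDB file contains any hydration information at all can be checked quickly. The Python sketch below assumes the fixed-column PDB format and the standard residue name HOH for water; the function name is hypothetical, and no threshold for an "acceptable" number of waters is implied.

```python
def hydration_summary(pdb_lines):
    """Count distinct water molecules and distinct polymer residues.

    Returns (n_waters, n_residues, waters_per_residue).
    """
    waters, residues = set(), set()
    for line in pdb_lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        residue_name = line[17:20]
        residue_id = (line[21], line[22:27])   # chain, number + insertion code
        if residue_name == "HOH":
            waters.add(residue_id)
        elif line.startswith("ATOM  "):
            residues.add(residue_id)
    ratio = len(waters) / len(residues) if residues else 0.0
    return len(waters), len(residues), ratio
```

A result with zero waters indicates that the deposited model carries no hydration information, so internal cavities and crevices have to be searched and filled by other programs before any simulation.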

 

Hydrogens

Neutron diffraction, which provides a reliable determination of hydrogen atoms, is not a widely used method. X-ray crystallography has difficulty localizing hydrogen atoms in proteins, and therefore their positions are as a rule not given in the PDB file. If you find hydrogens in a PDB file, ask a crystallographer to assess their meaning and reliability. The procedure is:

·       Add hydrogens automatically; however, an overall assessment of all possible variants, especially inside the protein, is always necessary. Keep in mind that the ionization inside the protein need not correspond to the solution pH and that protonation is a dynamic process.
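Why the ionization inside the protein may differ from what the solution pH suggests can be illustrated with the Henderson-Hasselbalch relation, which gives the protonated fraction of a titratable group as 1 / (1 + 10^(pH - pKa)). The Python sketch below uses purely illustrative pKa values (6.5 for an exposed histidine side chain and a hypothetical shifted value of 8.0 for a buried one); they are assumptions, not data from any structure.

```python
def protonated_fraction(ph, pka):
    """Henderson-Hasselbalch: fraction of a titratable group that is protonated."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Illustrative comparison at pH 7 (both pKa values are assumed):
exposed = protonated_fraction(7.0, 6.5)   # mostly deprotonated
buried = protonated_fraction(7.0, 8.0)    # mostly protonated
```

With these assumed values the same residue type is predominantly neutral at the surface and predominantly charged in the interior, which is why automatically assigned protonation states should be reviewed variant by variant.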

 

Equilibration of the starting structure model

The structure deposited in the PDB file represents a time-averaged structure model, and the atomic positions are subject to experimental errors. Therefore, a preliminary equilibration of the structure model by an energy minimization method of your choice is always necessary. The procedure is simple:

·       Assign parameters to all atoms (atom types). If a program does this automatically, check the result carefully, atom by atom.

·       Tether all non-hydrogen atoms by soft forces and minimize hydrogen positions freely.

·       Optimize the starting model by your favorite minimization method.
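The tethering step can be sketched as a harmonic restraint added to the force-field energy. The Python example below assumes the common form E = (k/2) |r - r0|^2 for every non-hydrogen atom and no restraint on hydrogens; the force constant, its units and the function name are assumptions, and a real minimizer would add this term to the energy and gradient of the chosen force field.

```python
def tether_energy_and_gradient(coords, reference, is_hydrogen, k=1.0):
    """Harmonic tether of heavy atoms to their reference (crystal) positions.

    coords, reference: lists of (x, y, z) tuples of equal length
    is_hydrogen:       list of booleans; hydrogens are left unrestrained
    k:                 force constant (assumed, e.g. in kcal/mol/A^2)
    Returns (energy, gradient) with one gradient triple per atom.
    """
    energy = 0.0
    gradient = []
    for xyz, ref, hydrogen in zip(coords, reference, is_hydrogen):
        if hydrogen:
            gradient.append((0.0, 0.0, 0.0))   # free to move
            continue
        delta = [a - b for a, b in zip(xyz, ref)]
        energy += 0.5 * k * sum(d * d for d in delta)
        gradient.append(tuple(k * d for d in delta))
    return energy, gradient
```

A soft force constant lets heavy atoms relax away from small experimental errors while keeping the model close to the measured structure.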

 

Final check of the starting model

The primary result of a diffraction experiment is a map of electron density. Therefore, the final structure model should correspond well to the "pipes" of the map. Of course, deviations are expected at all places of truncation, rebuilding or any other intervention made while forming the model system.

·       Check how the resulting structure model fits the experimental map of average electron density.

·       Differences should be interpretable as consequences of the interventions made earlier during model building.

·       The results may differ slightly according to the method of energy calculation and the tethering potential.
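One common way to quantify the fit between a model and the map is a real-space correlation coefficient, i.e. the Pearson correlation between the experimental density and the density calculated from the model, sampled at the same grid points around a residue. The Python sketch below assumes that both sets of density values have already been extracted by a crystallographic program; the function name is hypothetical and the text does not prescribe this particular measure.

```python
import math


def real_space_correlation(observed, calculated):
    """Pearson correlation between observed and model density samples."""
    if len(observed) != len(calculated) or len(observed) < 2:
        raise ValueError("need two equally long lists with at least two samples")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_c = sum(calculated) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(observed, calculated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_c = sum((c - mean_c) ** 2 for c in calculated)
    if var_o == 0.0 or var_c == 0.0:
        raise ValueError("density is flat; correlation is undefined")
    return cov / math.sqrt(var_o * var_c)
```

Values close to 1 indicate that the minimized model still follows the density; markedly lower values are expected only in the regions that were truncated or rebuilt.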

 

MOLECULAR DYNAMICS

The high complexity of molecular simulation methods and applications does not allow general comments. We can therefore show only examples of some misinterpretations that follow from a static perception of the X-ray structure model and from a wrong representation of hydration.

Hydration is of basic importance for any protein structure. Any effort based on unconstrained molecular dynamics without well-represented water around the protein can therefore be justified only if one wants to see how easy it is to dry out and destroy the protein.

Some other problems of molecular dynamics found in the literature follow from the fact that macromolecules pass through their conformational space too slowly (microseconds), so that in the majority of cases one cannot obtain a plausible statistical sample from calculations of reasonable length. Here there is room for logical extrapolation from simplified case studies of molecular dynamics, showing for example how water molecules circulate at the protein surface, how water molecules penetrate between the protein loops, how fast and in what form ions penetrate into the protein, or what average time is needed to deliver a water molecule to a specific place inside the protein, which is important for explaining the delay with which the solution pH spreads into the protein bulk in the course of enzymatic reactions.

Another area for extrapolation and case studies is the evaluation of conformational changes and motions of large molecular fragments. Here molecular dynamics is successfully combined with suitably chosen additional external potentials that force the molecular fragments to pass along the required paths.
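One simple form of such an external potential is a harmonic bias whose minimum moves along the required path, as used in steered molecular dynamics. The Python sketch below assumes a single scalar coordinate xi (for example a distance between two fragments) and a target that moves linearly in time; the functional form, the parameter names and the default force constant are illustrative assumptions, not the method of any particular study.

```python
def moving_bias(xi, t, xi_start, xi_end, t_total, k=10.0):
    """Harmonic bias 0.5*k*(xi - target(t))**2 with a linearly moving target.

    xi:       current value of the steering coordinate
    t:        current simulation time
    xi_start: target value at t = 0
    xi_end:   target value at t = t_total
    Returns (energy, force_on_xi).
    """
    fraction = min(max(t / t_total, 0.0), 1.0)     # clamp to the path ends
    target = xi_start + fraction * (xi_end - xi_start)
    displacement = xi - target
    return 0.5 * k * displacement ** 2, -k * displacement
```

The force always pulls the coordinate toward the current target, so the fragment is dragged along the path at a speed set by `t_total`; results obtained this way describe a forced process and need to be interpreted accordingly.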

 

CONCLUSION

Finally, it is important to stress that the basic experimental result of a diffraction experiment is the electron density in the average unit cell, averaged over the time of measurement. This map contains full information about the conformational variability of the protein in the drop of solvent at the temperature before flash freezing, including the interactions of the protein with water and other components of the buffer used for the experiment, and it also reflects the intermolecular interactions with neighbouring molecules. An unbiased use of the experiment requires more than copying the atomic coordinates from a PDB file. The deposited coordinates are only one interpretation of the map by the scientist who made the final refinement of the structure model into the electron density.

 

Acknowledgement

The study was supported by GA AV CR KJB4050312 and MSMT 1K05008.

 

APPENDIX  A.

 

Some servers that can help in assessment of molecular models.

 

Search through the PDB:

Protein Structure database searching tools:    http://www.ebi.ac.uk/msd-srv/msdlite/apps/query

 

Protein Structure database searching tools:    http://www.rcsb.org/pdb/

 

 

Design of ligands:

PRODRG server:     http://davapc1.bioch.dundee.ac.uk/cgi-bin/prodrg_beta

 

 

Search for experimentally observed interactions of various functional groups in the database of organic and organometallic structures (CCDC):

Main page of the Czech regional center of CCDC is    http://www-xray.fzu.cz/csd/csd.html

 

To register for use of the database of organic and organometallic structures, follow the instructions at http://www-xray.fzu.cz/csd/regform.html

 

 

Ligand binding sites:

http://sumo-pbil.ibcp.fr/cgi-bin/sumo-database

 

http://relibase.ebi.ac.uk/reli-cgi/rll?/reli-cgi/general_layout.pl+home

 

 

 

REFERENCES

[1] H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne, Nucleic Acids Research, 2000, 28, 235-23.