Hydration of amino acid residues in proteins: what can we learn from data mining?

 

Lada Biedermannová, Bohdan Schneider

 

Laboratory of Biomolecular Recognition, Institute of Biotechnology, Academy of Sciences of the Czech Republic, v. v. i., Prague, Czech Republic

 

 

In structural biology, water was long considered a passive medium. In recent years, however, its complexity and its importance for biological interactions have gradually been recognized. It is now accepted that water is a key determinant of protein structure, dynamics and function, and that water-protein interactions govern various processes, including protein folding, enzymatic catalysis, and molecular recognition. Water does not simply fill the available space around proteins; it occupies specific sites and forms localized clusters determined by its hydrogen-bonding capabilities. The distribution of water around proteins was analyzed in several early studies of protein crystal structures, which revealed its preference for Glu, Asp and Arg side chains and the main-chain carbonyl group [1]. However, owing to the limited number of high-resolution structures, these early studies could investigate hydration patterns for only a few amino acids, such as serine and threonine. We therefore decided to make use of the immense growth of the Protein Data Bank (PDB) [2] in recent years and to perform a detailed analysis of hydration patterns for all 20 standard amino acids using high-resolution protein crystal structures.

We used a set of 3845 PDB entries with resolution better than 1.8 Å, a maximum R-factor of 0.22 and mutual sequence identity between chains of 50% or less. We checked the quality of all structures with the MolProbity program and generated all the crystallographic neighbours of the unit cell. The contacts of each amino acid residue with water molecules within 3.6 Å were then detected. Residue conformations were clustered separately in each class defined by residue type, secondary structure (alpha helix/beta sheet) and chi1 rotameric state (g+/g-/t) using the quality threshold algorithm. The clusters of residues with their associated water molecules were then subjected to the method of density representation [3] in order to identify the most preferred locations of water molecules. Briefly, a Fourier transform technique was used to calculate structure factors from atom positions and to convert them to electron densities. Water peaks were then detected, and water positions, occupancies and B-factors were refined using standard crystallographic procedures.

The result of our study is a detailed atlas of protein hydration, containing the most populated positions of water molecules around each residue type in various backbone and side-chain conformational states. The analysis of high-resolution crystallographic structures of proteins from the PDB thus revealed the spatial distribution of water molecules in the first hydration layer of proteins and, to some extent, also their dynamic properties, which can be estimated from their B-factors, a measure of atomic vibrations. The hydrated amino acid rotamers obtained from our study can be used in many areas of structural biology, from molecular replacement and crystallographic refinement to improving the accuracy of ab initio protein structure prediction methods.
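To make the link between B-factors and atomic vibrations concrete, the sketch below applies the standard isotropic relation B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean-square displacement of the atom along one direction. The function name `rms_displacement` is an assumption for illustration and is not part of the analysis described here.

```python
import math


def rms_displacement(b_factor):
    """Root-mean-square displacement (angstroms) implied by an isotropic
    B-factor (square angstroms), using B = 8 * pi^2 * <u^2>."""
    if b_factor < 0:
        raise ValueError("B-factor must be non-negative")
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))
```

Under this relation, a water molecule refined with a B-factor of 20 Å² corresponds to an RMS displacement of roughly 0.5 Å.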

This work was supported by a grant from BIOCEV, the Biotechnology and Biomedicine Centre of the Academy of Sciences and Charles University in Vestec, a project supported by the European Regional Development Fund.

[1] N. Thanki, J. M. Thornton, J. M. Goodfellow, J. Mol. Biol. 1988, 202, 637.

[2] H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, P. E. Bourne, Nucleic Acids Res. 2000, 28, 235.

[3] B. Schneider, D. M. Cohen, L. Schleifer, A. R. Srinivasan, W. K. Olson, H. M. Berman, Biophys. J. 1993, 65, 2291.