AUTOMATIC BUILDING AND REFINEMENT OF PROTEIN CRYSTAL STRUCTURES

Victor Lamzin (1), Richard Morris (1) and Anastassis Perrakis (2)

European Molecular Biology Laboratory (EMBL),
(1) Hamburg and
(2) Grenoble Outstation

Phase information in protein crystallography can be obtained by (i) experimental techniques such as multiple or single isomorphous replacement (MIR, SIR), with or without anomalous scattering information (MIRAS, SIRAS), or multiple (or even single) wavelength anomalous dispersion (MAD, SAD), combined with software that makes optimal use of the data, e.g. Phases, MlPhare and, more recently, SHARP; (ii) purely computational techniques, if data approaching atomic resolution (around 1.0 Å) are available, as implemented in packages such as SHELX and Shake & Bake; (iii) a combination of experimental information, e.g. the location of heavy atom(s) in a metalloprotein, with computational approaches for extending these phases. ARP/wARP comes into play once an electron density map has been calculated with phases from any of the above techniques. The map can be described by a free atom model or by a set of such models. This description is the first step in ARP/wARP.

The Automated Refinement Procedure (ARP) is based on a cyclic procedure in which calculated structure factor amplitudes are fitted to the observed ones in reciprocal space, followed by automatic updating of the atomic model in real space. The latter is based on an analysis of electron density maps and employs several objective or empirical criteria. In this way the unrestrained model is not only refined in reciprocal space, typically by minimising a maximum likelihood residual, but is at the same time continuously updated to incorporate new features that may appear in the electron density. The real space model update helps considerably in escaping from local minima. While this procedure works extremely well when high resolution data (about 1.5 Å) are available, at lower resolution the paucity of observations starts to affect convergence adversely, and refinement requires additional information. For biological macromolecules this information is typically given as a set of a priori known stereochemical parameters, used in the form of restraints on distances, angles, etc. While this is clearly necessary for refinement to proceed, it also results in so-called model bias. The lower the observation-to-parameter ratio, the more pronounced the bias, which becomes significant at resolutions lower than 2.0 Å.
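
The dependence of the observation-to-parameter ratio on resolution can be illustrated with a rough estimate. The sketch below is not part of ARP/wARP. It assumes complete data, counts unique reflections as half of the reciprocal lattice points inside the resolution sphere per asymmetric unit (Friedel's law), and assumes four parameters per atom (x, y, z and an isotropic B factor). The function names and the example volume and atom count are hypothetical.

```python
import math


def unique_reflections(asu_volume_A3, d_min_A):
    """Approximate number of unique reflections to resolution d_min.

    The reciprocal lattice points inside a sphere of radius 1/d_min
    number about (4/3)*pi*V/d_min**3; Friedel's law halves this, and
    using the asymmetric-unit volume accounts for the remaining symmetry.
    """
    return (2.0 / 3.0) * math.pi * asu_volume_A3 / d_min_A ** 3


def obs_to_param_ratio(asu_volume_A3, d_min_A, n_atoms, params_per_atom=4):
    """Observations per refined parameter (assumed: x, y, z, isotropic B)."""
    n_params = n_atoms * params_per_atom
    return unique_reflections(asu_volume_A3, d_min_A) / n_params


if __name__ == "__main__":
    # Hypothetical asymmetric unit: 30000 cubic Angstrom, 1000 atoms.
    for d_min in (1.0, 1.5, 2.0, 2.5):
        ratio = obs_to_param_ratio(30000.0, d_min, 1000)
        print(f"d_min = {d_min:.1f} A: about {ratio:.1f} observations per parameter")
```

Because the number of reflections scales with the inverse cube of the resolution limit, going from 1.0 Å to 2.0 Å data reduces the ratio eightfold in this estimate, which is why restraints or other additional information are needed at lower resolution.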

We employ alternative means of overcoming the problem of limited data, which are realised in "weighted" ARP, wARP. The first is the weighted averaging of structure factors resulting from multiple ARP refinements of somewhat different models. The second applies a highly flexible model, made possible by continuously reassigned atomic connections, and is based on automatic density map interpretation and model building. wARP can result in substantial improvement of phases and in their extension to the highest resolution of the native diffraction data. Some parts of the map can be automatically recognised as containing elements of protein structure, and it then becomes possible to build at least a partial atomic model of the protein. While free atom models make it easy to interpret almost every feature of an electron density map, the automatically built parts of the model allow geometrical restraints to be used in the reciprocal space minimisation steps. A partial protein model combined with a free atom model gives a better description of the prominent features in the electron density. At this stage the model does not necessarily have to make perfect stereochemical sense, but it is sufficiently reasonable to be used in the next ARP cycle. There the model is optimised and updated by ARP addition and deletion of both free and model atoms. This is iterated in the hope (which more often than not comes true) that improved phases will allow construction of larger parts of the protein model.
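
The idea of averaging structure factors from several refined models can be sketched as follows. This is an illustration only and does not reproduce the weighting scheme actually used in wARP. It assumes that each model supplies one complex structure factor per reflection, averages them vectorially, and uses the ratio of the averaged amplitude to the mean amplitude as a weight between 0 and 1. That weighting formula and the function name are assumptions made for this example.

```python
import cmath


def average_structure_factors(models):
    """Vector-average complex structure factors over several models.

    `models` is a list with one entry per model; each entry is a list of
    complex structure factors in the same reflection order.
    Returns one (mean_F, weight, phase) tuple per reflection.
    The weight is |mean F| / mean |F| (an assumed, illustrative choice):
    close to 1 when the models agree in phase, close to 0 when they scatter.
    """
    n_models = len(models)
    averaged = []
    for per_reflection in zip(*models):
        f_mean = sum(per_reflection) / n_models
        amp_mean = sum(abs(f) for f in per_reflection) / n_models
        weight = abs(f_mean) / amp_mean if amp_mean > 0 else 0.0
        averaged.append((f_mean, weight, cmath.phase(f_mean)))
    return averaged


if __name__ == "__main__":
    # Two hypothetical models, two reflections each (amplitude, phase in radians).
    model_a = [cmath.rect(10.0, 0.5), cmath.rect(5.0, 1.0)]
    model_b = [cmath.rect(10.0, 0.6), cmath.rect(5.0, 2.5)]
    for f_mean, weight, phase in average_structure_factors([model_a, model_b]):
        print(f"|F| = {abs(f_mean):.2f}, weight = {weight:.2f}, phase = {phase:.2f} rad")
```

In this sketch, reflections on which the models agree keep nearly their full amplitude, while reflections with scattered phases are down-weighted, so a map calculated from the averaged structure factors emphasises the features common to all models.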

Applying ARP/wARP, we succeeded in the fully automatic construction of almost complete models (more than 90% of main chain and 50% of side chain atoms) for protein structures where experimental phases were available from different techniques (but typically extended no further than 2.5 Å) and native diffraction data ranged from 0.9 to 2.0 Å resolution. It was also possible to build a partial model (about 50%) automatically with 2.15 Å data and to deliver an improved map with 2.3 Å data. The latter seems to be the current resolution limit for ARP/wARP.