HIPHOP REFINEMENT OF PROTEIN STRUCTURES

J. Ondráček

Department of Recombinant Expression and Structural Biology, Institute of Molecular Genetics,
Academy of Sciences of the Czech Republic, Flemingovo n. 2, CZ-16637 Praha 6, Czech Republic

During the refinement process the last structural details are modeled and the structure parameters are optimized. The refinement is usually stopped at a minimum of the refinement curve. Every possible model has its own refinement curve, and every refinement curve is in fact a function of the Fourier transform of the X-ray diffraction data. Only one electron density meets our requirements for the quality of the final model.

For proteins, usually only limited-resolution data are available, and the Fourier transform of such data is poor because of the low number of Fourier coefficients compared with full-resolution small-molecule structures. Minima on the optimal protein refinement curve are neither as frequent nor as deep. This is why reliability factors for proteins must be higher than those for small molecules. Furthermore, it is impossible to distinguish the global minimum among the many local ones.

During the refinement process the refinement curves can lie above or below the optimal refinement curve. The refinement curve lies above it in the first steps of model building and refinement, or when the resolution is increased during refinement. In these cases the models are under-parameterized, and the values of the reliability factors must decrease during refinement. The reverse situation arises when a higher-resolution model is used as the initial model or when the resolution is decreased during refinement. In these cases the models are over-parameterized (over-determined or over-refined); during rebuilding and refinement the number of refined parameters must be reduced, and thus the reliability factors must increase.

The power of a refinement method depends on its ability to reach the optimal refinement curve and to find the deepest minimum on it. Usually the refinement process cannot overcome higher barriers on the refinement curve if no significant attempt at model improvement is made (the model is only slightly over-parameterized), and the newly refined model is very similar to the old one. The local minimum reached is then very close to the previous one. When a large structural change is made to the model, the refinement process can overcome large barriers on the refinement curve, and the radius within which the refinement method can reach the best minimum increases.

The HipHop refinement is based on repeated large structural changes and refinements, each followed by several cycles of structure reduction and refinement. This is repeated until the values of the reliability factors and the water content are stable within statistical variances. The result of the HipHop refinement is not one single model of the electron density (as usual) but a set of possible solutions in local minima, corresponding to a set of possible electron densities.

One HipHop step usually consists of one Hip step and several Hop steps. Every Hip or Hop step is followed by refinement.

The Hip (excitation) step is carried out by adding a suitable number of waters at the maxima of the difference Fourier map. A suitable number of waters is usually ~15 % of the number of non-hydrogen protein atoms, with an occupancy of 0.5 and a thermal parameter of U = 1.2 for the Shelxh version or B = 30 for the Refmac5 version. The more waters are added to the model and the lower their thermal parameters, the larger the radius within which a minimum can be located. On the other hand, this is limited by the stability of the refinement. With the parameters described, the phase change is usually ~1 %. The model is thus over-parameterized by the new possible water positions, and during the refinement cycles a new model with new main-chain and side-chain orientations and a new set of water molecules is formed. Shifts of water positions in the first refinement steps are about 2-3 Å.
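The water selection in the Hip step can be illustrated with a minimal Python sketch. It assumes that the difference-map peaks are already available as a list of (height, position) pairs; the names `Water` and `hip_waters` are hypothetical and do not come from the HipHop program. The defaults (15 % of the non-hydrogen protein atoms, occupancy 0.5, B = 30) are the values quoted above for the Refmac5 version.

```python
from dataclasses import dataclass


@dataclass
class Water:
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float


def hip_waters(peaks, n_protein_atoms, fraction=0.15,
               occupancy=0.5, b_factor=30.0):
    """Place waters at the strongest difference-map peaks.

    peaks: list of (height, (x, y, z)) tuples (assumed input format).
    n_protein_atoms: number of non-hydrogen protein atoms.
    """
    # Number of waters to add: ~15 % of the non-hydrogen protein atoms.
    n_add = round(fraction * n_protein_atoms)
    # Take the highest peaks first.
    strongest = sorted(peaks, key=lambda p: p[0], reverse=True)[:n_add]
    return [Water(x, y, z, occupancy, b_factor)
            for _, (x, y, z) in strongest]
```

If fewer peaks are supplied than waters requested, the sketch simply returns one water per peak.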

In the Hop (reduction) step wrong waters are removed from the model. The Hop step is usually repeated five times, and in every step the minimal electron density limit is increased five times. A water is considered wrong when 1) the calculated electron density at the water position is lower than the limit given for the step, 2) the density of the water does not have a spherical shape, or 3) the water is too close to the protein molecule.
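The three rejection criteria of the Hop step can be sketched as a simple filter. This is only an illustration under stated assumptions: the sphericity score, its cutoff and the distance cutoff are hypothetical placeholders, since no numerical thresholds are given here, and the schedule of density limits is supplied by the caller for the same reason.

```python
from dataclasses import dataclass


@dataclass
class WaterSite:
    name: str
    density: float           # calculated electron density at the water position
    sphericity: float        # hypothetical shape score, 1.0 = perfectly spherical
    protein_distance: float  # distance to the nearest protein atom, in Å


def hop(waters, density_limit, min_sphericity=0.8, min_distance=2.4):
    """Return the waters that survive one Hop (reduction) step.

    min_sphericity and min_distance are assumed values for illustration.
    """
    kept = []
    for w in waters:
        if w.density < density_limit:        # criterion 1: density too low
            continue
        if w.sphericity < min_sphericity:    # criterion 2: not spherical
            continue
        if w.protein_distance < min_distance:  # criterion 3: too close to protein
            continue
        kept.append(w)
    return kept


def hop_cycle(waters, density_limits):
    """Apply repeated Hop steps with a rising density limit."""
    for limit in density_limits:
        waters = hop(waters, limit)
        # In a real run, a refinement would follow each Hop step here.
    return waters
```

A call such as `hop_cycle(waters, [0.3, 0.4, 0.5, 0.6, 0.7])` would model five Hop steps; the limits themselves are made-up numbers.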

One run of HipHop usually consists of ten HipHop steps. After this, the stability of the reliability parameters, the number of water molecules in the model, and the agreement of the electron density with the model are evaluated. If necessary, the model is improved manually and the HipHop run is repeated until the reliability parameters and the number of waters are stable and no further structural improvement is apparent.
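The control flow of one run can be summarized in a short sketch. It only lists the order of operations (ten HipHop steps, each with one Hip step and five Hop steps, every one followed by refinement) and adds a very simple stability check. The function names are hypothetical, and the spread-based criterion with its tolerance is an assumed stand-in for "stable within statistical variances", not the test actually used.

```python
def hiphop_run_schedule(n_hiphop_steps=10, n_hops=5):
    """List the operations of one HipHop run in order."""
    schedule = []
    for _ in range(n_hiphop_steps):
        schedule += ["hip", "refine"]            # excitation, then refinement
        schedule += ["hop", "refine"] * n_hops   # reductions, each refined
    return schedule


def is_stable(values, window=5, tolerance=0.005):
    """Assumed criterion: the last `window` values differ by at most `tolerance`."""
    if len(values) < window:
        return False
    recent = values[-window:]
    return max(recent) - min(recent) <= tolerance
```

The same check could be applied to the sequence of R factors and, with a different tolerance, to the number of waters after each HipHop step.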

The final stability of the HipHop refinement demonstrates the correctness of the refinement method used. The HipHop refinement yields the classical R and R_{free} factors. In addition to these, it is useful to define and calculate the Refinement Reliability Factor R_{rrf}. This is defined in the same way as R_{free}, with the exception that the reflections used for the R_{rrf} calculation may have been used in previous refinement steps. The final average value of R_{rrf} is usually similar to R_{free}. The exclusivity of the reflections used for the R_{free} calculation is replaced by a statistical evaluation of the R_{rrf} values: the final value is calculated using the phase average after the HipHop refinement.
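For orientation, the standard crystallographic R factor, R = Σ| |F_obs| - |F_calc| | / Σ|F_obs|, evaluated over a chosen subset of reflections, can be sketched as follows. Computed over a permanently excluded test set this gives R_{free}; computed over a subset that was not excluded from earlier refinement steps it corresponds to R_{rrf} as described above. The function name is hypothetical, and the sketch assumes that the calculated amplitudes are already on the scale of the observed ones.

```python
def r_factor(f_obs, f_calc, selection=None):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|) over the selected reflections.

    f_obs, f_calc: sequences of structure-factor amplitudes (assumed scaled).
    selection: indices of the reflections to include; all if None.
    """
    if selection is None:
        selection = range(len(f_obs))
    selection = list(selection)  # allow any iterable, used twice below
    numerator = sum(abs(abs(f_obs[i]) - abs(f_calc[i])) for i in selection)
    denominator = sum(abs(f_obs[i]) for i in selection)
    return numerator / denominator
```

Averaging the values obtained for the individual HipHop solutions would then give a mean R_{rrf} that can be compared with R_{free}.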

In tests of the HipHop refinement method on several protein X-ray data sets, no single solution was found with statistically better reliability factors than the other possible solutions. The structural variances yielded by the HipHop refinement correspond to the resolution and the quality of the X-ray data.