ATTEMPTS AT AB INITIO PHASING OF MACROMOLECULES AT 3 Å RESOLUTION AND PHASE EXTENSION USING MAXIMUM ENTROPY AND LIKELIHOOD

C.J. Gilmore, Wei Dong

Department of Chemistry, University of Glasgow, Glasgow G12 8QQ, Scotland, UK

The minimal function and its variants have provided superb tools for solving structures with up to 1000 atoms in the asymmetric unit, provided that complete data to atomic resolution (i.e. around 1-1.1 Å) are available. This is a serious limitation, since most protein crystals do not diffract to this resolution. We have been assessing the viability of the maximum entropy - likelihood method to see whether useful phase information can be obtained ab initio for small macromolecules using data truncated to 3 Å, with error-correcting codes (eccs) used as a source of efficient phase permutation. The eccs were taken from Bricogne's BUSTER program [1], in particular the Nordström-Robinson (16, 256, 6) and Golay [24, 12, 8] codes in their standard or punctured forms.

For crambin, the origin was partially defined by fixing the phases of two centric reflections to generate the root node of the phasing tree. Fully fixing the origin and defining the enantiomorph were carried out de facto by the process of tree building. For the second level, 13 reflections were given permuted phases via the Golay [24, 12, 8] code, generating 4096 nodes; the top 8 ranked by likelihood were kept. One phase set in this group had a mean phase error of 37.5° and a map correlation coefficient of 0.54. The third level involved permuting the phases of 12 reflections via a Golay [23, 12, 7] ecc. Among the top 8 nodes there was a solution with a mean phase error of 46.8° and a correlation coefficient of 0.46. For the final level, the top 8 nodes were again kept and 15 reflections were given permuted phases using the Golay [24, 12, 8] code. By this stage the accumulation of phase errors meant that the best mean phase error was 58.5°, with a correlation coefficient of 0.33. A total of 42 reflections contributed to the error statistics. In all, 1 + 4096 + 2 × (8 × 4096) = 69,633 nodes were computed; using a network of UNIX workstations, this calculation took less than 72 hours in total [2]. The use of bigger codes is also being assessed.
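The code sizes and the node count quoted above can be checked with a short sketch. It is an illustration only, not the BUSTER implementation: it builds the Golay [23, 12, 7] code as a cyclic code from one of its standard generator polynomials, extends it with an overall parity bit to give the [24, 12, 8] code, and tallies the nodes per tree level. The function names and the bit-mask representation are assumptions made for this example.

```python
# Illustrative sketch only; names and representation are assumptions,
# not taken from BUSTER or from the work described above.

# Standard generator polynomial of the binary Golay [23, 12, 7] cyclic code:
# g(x) = x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1, stored as a bit mask.
G23_POLY = 0b110001110101


def gf2_multiply(a, b):
    """Carry-less product of two GF(2) polynomials stored as bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def golay23_codewords():
    """All 2^12 codewords m(x) * g(x) of the [23, 12, 7] code."""
    return [gf2_multiply(message, G23_POLY) for message in range(1 << 12)]


def extend_with_parity(word):
    """Append an overall parity bit, turning a 23-bit word into a 24-bit one."""
    parity = bin(word).count("1") & 1
    return (word << 1) | parity


def min_distance(words):
    """Minimum distance of a linear code = smallest nonzero codeword weight."""
    return min(bin(w).count("1") for w in words if w)


golay23 = golay23_codewords()
golay24 = [extend_with_parity(w) for w in golay23]

# Root node, one full permutation level, then two levels grown from 8 kept nodes.
nodes_per_level = [1, len(golay24), 8 * len(golay23), 8 * len(golay24)]
total_nodes = sum(nodes_per_level)

print(len(golay24), min_distance(golay23), min_distance(golay24), total_nodes)
```

Each codeword corresponds to one trial node, so a 12-dimensional code gives 4096 nodes per parent however many reflections share its 23 or 24 bits. How the bits are assigned to individual reflections is not specified here and is left out of the sketch.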

The same formalism can be used for phase extension at any resolution. In this case there is a large basis set of known phases. Instead of simply extrapolating from low resolution to a higher one, and thus ignoring the branching problem, eccs are used to permute some of the unknown phases at high resolution, and the maximum entropy (ME) calculations use both sets of phase information.
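A minimal sketch of how a fixed basis set might be combined with ecc-permuted phases is given below. Everything in it is an assumption made for illustration: the mapping of one bit per centric reflection (0° or 180°, which holds only for some centric zones) and two bits per acentric reflection (quadrant centres 45°, 135°, 225°, 315°), the function names, the Miller indices, and the tiny even-parity code standing in for a real ecc.

```python
# Hypothetical illustration; the bit-to-phase mapping and all names are assumptions.

def phases_from_codeword(bits, reflections):
    """Turn a codeword into trial phases (degrees) for the permuted reflections.

    reflections: list of (hkl, is_centric) pairs, in the order the bits are used.
    """
    phases = {}
    pos = 0
    for hkl, is_centric in reflections:
        if is_centric:
            # One bit: choose between the two allowed phases (assumed 0 / 180).
            phases[hkl] = 180.0 * bits[pos]
            pos += 1
        else:
            # Two bits: choose one of four quadrant centres.
            quadrant = 2 * bits[pos] + bits[pos + 1]
            phases[hkl] = 45.0 + 90.0 * quadrant
            pos += 2
    return phases


def trial_phase_sets(basis_phases, reflections, codewords):
    """Yield one trial phase set per codeword: known basis plus permuted phases."""
    for bits in codewords:
        trial = dict(basis_phases)          # low-resolution phases stay fixed
        trial.update(phases_from_codeword(bits, reflections))
        yield trial


# Made-up example: two known low-resolution phases, two unknown reflections.
basis = {(1, 0, 0): 0.0, (0, 2, 0): 180.0}
unknown = [((5, 1, 2), False), ((6, 0, 0), True)]   # acentric, then centric

# Stand-in for a real ecc: the length-3 even-parity code (4 codewords).
toy_code = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

trials = list(trial_phase_sets(basis, unknown, toy_code))
```

Each trial set would then be passed to the entropy maximization and ranked by likelihood, so that the basis-set phases and the permuted phases are both used.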

Research funded by CCP4.

[1] G. Bricogne, Acta Cryst. D49 (1993), 37-60.
[2] C.J. Gilmore, W. Dong & G. Bricogne, Acta Cryst., submitted.