THE INFORMATION CONTENT OF 3-D PROTEIN STRUCTURAL DATA.

K S Wilson

Department of Chemistry, University of York, Heslington, York YO1 5DD, England. Email: keith@yorvic.york.ac.uk

The rate at which 3-D structures of proteins are being determined has accelerated dramatically and continuously in recent years, with about 2000 X-ray and NMR structures being deposited each year in the Brookhaven Protein Data Bank. Many of these are still determined by experienced crystallographers, but increasingly automated and routine methods mean that a growing number of biological scientists without specialist training in crystallography now use protein crystallography as a tool. Because the deposited structures are intended for use by a wide community, the deposited data must convey how accurately the parameters define each structure.

The data deposited should include not only the derived model coordinates but also the primary data: for crystallography that means the experimental structure factor amplitudes. Only then can the database allow the accuracy to be validated and the structures to be used appropriately. It is not sufficient that the coordinates themselves are internally self-consistent with respect to geometry and stereochemistry.

Open questions concern the best tools for validation.

For crystallographers, recording and using complete experimental data as accurately as possible is clearly paramount. The quality of the model can only reflect that of the data. For small molecule structures the inversion of the least-squares matrix provides an individual standard uncertainty for each atomic parameter. This is not presently possible for the majority of macromolecules. However, some validation of the accuracy of individual parameters, properly based on the available experimental data, is vital if the huge amount of biological structural information is to be exploited optimally.
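The way an inverted least-squares matrix yields per-parameter standard uncertainties can be illustrated with a deliberately small sketch. It is not a crystallographic refinement: it assumes an unweighted straight-line fit y = a + b*x, for which the normal matrix is only 2x2 and can be inverted in closed form. The function name `line_fit_with_uncertainties` is hypothetical and chosen for illustration only.

```python
import math


def line_fit_with_uncertainties(x, y):
    """Unweighted least-squares fit of y = a + b*x (toy example).

    Returns (a, b, sigma_a, sigma_b). Each sigma is the square root of
    the corresponding diagonal element of the inverted normal matrix,
    scaled by the residual variance.
    """
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 3:
        raise ValueError("need at least 3 observations for 2 parameters")

    # Elements of the normal matrix N = A^T A, where each row of the
    # design matrix A is [1, x_i], and of the right-hand side A^T y.
    s1 = float(n)
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))

    det = s1 * sxx - sx * sx
    if det == 0:
        raise ValueError("normal matrix is singular (all x identical)")

    # Closed-form inverse of the symmetric 2x2 normal matrix.
    inv00 = sxx / det
    inv01 = -sx / det
    inv11 = s1 / det

    # Parameter estimates: p = N^-1 (A^T y).
    a = inv00 * sy + inv01 * sxy
    b = inv01 * sy + inv11 * sxy

    # Residual variance with n - 2 degrees of freedom.
    residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    variance = residual_ss / (n - 2)

    # Standard uncertainties from the diagonal of variance * N^-1.
    sigma_a = math.sqrt(variance * inv00)
    sigma_b = math.sqrt(variance * inv11)
    return a, b, sigma_a, sigma_b
```

The same principle applies with thousands of atomic parameters, but the matrix then becomes very large, which is one practical reason the approach is routine for small molecules and not for most macromolecules.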

The author is a member of an EC network concerned with coordinate validation. This involves laboratories in Brussels, Hamburg, Heidelberg, London, Uppsala, Utrecht and York. Some activities of the network will be described, with especial emphasis on the atomic resolution structures of proteins now being determined in a number of laboratories.