KNOWLEDGE-BASED LIBRARIES DERIVED FROM THE CAMBRIDGE STRUCTURAL DATABASE

Vanessa J. Hoy

Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, England

Keywords: Cambridge Structural Database; Knowledge-Based Libraries; Intermolecular Interactions; Torsional Distributions; Conformational Preferences; Knowledge-Based Software

The Cambridge Structural Database (CSD)[1] is a fully retrospective computerised archive of bibliographic, chemical and numerical data from X-ray and neutron diffraction studies of organic and metallo-organic compounds. The most recent (April 1998) release of the CSD contains over 180,000 structure determinations and the database continues to grow by around 10% each year.

The bulk of the data stored in the CSD are primary numerical results: fractional coordinates, cell dimensions, space group information, etc. However, CSD users are usually interested in the structural KNOWLEDGE - bond lengths, interbond angles, torsions, intermolecular contact geometries, etc. - that can be derived from these raw data.
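As an illustration of how such knowledge follows from the raw data, the sketch below converts fractional coordinates to Cartesian coordinates using the cell dimensions and then computes a bond length and a torsion angle. It is a minimal example under stated assumptions, not the CSD software: the function names are hypothetical, the cell angles are taken to be in degrees, and symmetry operations and space group information are ignored.

```python
import math

def orthogonalization_matrix(a, b, c, alpha, beta, gamma):
    """Matrix mapping fractional to Cartesian coordinates (angles in degrees).

    Standard convention: a along x, b in the xy-plane.
    """
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Cell volume divided by a*b*c
    vol = math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, c * vol / sg],
    ]

def to_cartesian(frac, m):
    """Apply the orthogonalization matrix to one fractional coordinate."""
    return [sum(m[i][j] * frac[j] for j in range(3)) for i in range(3)]

def sub(u, v):
    return [u[i] - v[i] for i in range(3)]

def dot(u, v):
    return sum(u[i] * v[i] for i in range(3))

def cross(u, v):
    return [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]

def bond_length(p, q):
    """Distance between two Cartesian points."""
    return math.sqrt(dot(sub(q, p), sub(q, p)))

def torsion(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, in (-180, 180]."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n2 = cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical cell and atom positions, for illustration only
cell = orthogonalization_matrix(1.0, 1.0, 1.0, 90.0, 90.0, 120.0)
atom_a = to_cartesian([1.0, 0.0, 0.0], cell)
atom_b = to_cartesian([0.0, 1.0, 0.0], cell)
print(round(bond_length(atom_a, atom_b), 3))
```

Collecting such values over many structures is what turns individual coordinate sets into distributions of bond lengths or torsion angles.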

Software supplied with the CSD [1,2] allows experienced users to perform such data mining or knowledge engineering experiments. Nevertheless, the vastness of the database and the complexity of the data mean that even simple studies are often lengthy and require considerable skill.

To help scientists overcome these barriers, the CCDC recently began a project to generate libraries of structural knowledge derived from the raw data stored in the CSD. The aim of the project is to produce a series of knowledge bases which present key information in an easily accessible and digestible form.

This presentation will describe the knowledge-based libraries derived from the CSD that are now available, as well as those that are still being developed. These include the IsoStar [3] library of intermolecular interactions and a developing library of torsional distributions. Knowledge that is expressed in a consistent electronic format can then be manipulated by a variety of application programs. Two such programs, SuperStar and GOLD, which make use of data from both libraries, will be described. Other scientific applications of structural knowledge bases will also be discussed.

  1. Allen, F.H., Davies, J.E., Galloy, J.J., Johnson, O., Kennard, O., Macrae, C.F., Mitchell, E.M., Mitchell, G.F., Smith, J.M. and Watson, D.G., J. Chem. Inf. Comput. Sci., 31 (1991) 187-204.
  2. Allen, F.H. and Kennard, O., Chem. Design Automation News, 8 (1993) 1 and 31-37.
  3. Bruno, I.J., Cole, J.C., Lommerse, J.P.M., Rowland, R.S., Taylor, R. and Verdonk, M.L., J. Computer-Aided Molec. Design, 11 (1997) 525.