QSPR modeling – algorithms, challenges and IT solutions

__O. Skřehota__^{1}, R. Svobodová Vařeková^{1}, ^{1}^{1}, D. Sehnal^{1},
C. M. Ionescu^{1}, T. Bouchal, and J. Koča^{1}

^{1}
National Centre for Biomolecular Research, Faculty of Science,

Kamenice 5, 625 00 Brno-Bohunice, Czech Republic

svobodova@chemi.muni.cz

A large amount of experimental and predicted data about the 3D structure
of organic molecules and biomolecules is now available. Advanced
computational methods and high-performance computers allow us to process
these data and calculate descriptors, i.e. numerical values that encode the
structural characteristics of molecules. Hundreds, if not thousands, of
molecular descriptors have been designed for various purposes. One very
useful application is to employ descriptors in Quantitative
Structure-Property Relationship (QSPR) models for predicting physicochemical
properties (e.g. dissociation constants, partition coefficients, solubility
and lipophilicity).

QSPR modeling has become very popular in chemical, biological and
pharmaceutical research. However, the design of QSPR models for predicting
many important physicochemical properties is still a topic of research. One
reason is that the process of designing and evaluating a QSPR model is
relatively complicated. At the beginning of this process, one has only a
rough idea of which descriptors may be relevant. Based on these first ideas,
one implements algorithms for processing molecular structures and for
calculating the descriptors. Afterwards, an equation expressing the relation
between the descriptors and the property in question must be formulated,
i.e., the QSPR model has to be parameterized. Finally, one evaluates how
accurately the model reproduces reference (mainly experimental) data. The
results of this evaluation extend our knowledge and help us to correct the
model (e.g. by adding or removing some descriptors). This refinement cycle
can be repeated many times.
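
The parameterization and evaluation steps described above can be illustrated with a minimal sketch. It is not the implementation used in any of the software discussed here: it assumes the simplest possible model, a linear equation with a single descriptor fitted by ordinary least squares, and the function names `fit_linear_qspr` and `evaluate_model`, the choice of an atomic charge as the descriptor, and all numerical values are hypothetical and serve only as an illustration.

```python
from statistics import mean


def fit_linear_qspr(descriptor, prop):
    """Parameterize prop ≈ slope * descriptor + intercept by least squares."""
    x_mean, y_mean = mean(descriptor), mean(prop)
    sxx = sum((x - x_mean) ** 2 for x in descriptor)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(descriptor, prop))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def evaluate_model(descriptor, prop, slope, intercept):
    """Compare model predictions with reference values (R^2 and RMSE)."""
    predicted = [slope * x + intercept for x in descriptor]
    ss_res = sum((y - p) ** 2 for y, p in zip(prop, predicted))
    y_mean = mean(prop)
    ss_tot = sum((y - y_mean) ** 2 for y in prop)
    r_squared = 1.0 - ss_res / ss_tot
    rmse = (ss_res / len(prop)) ** 0.5
    return r_squared, rmse


# Synthetic, made-up data for illustration only (not experimental values):
# one descriptor value (e.g. an atomic charge) and one reference property
# value (e.g. pKa) per molecule.
charges = [-0.50, -0.45, -0.40, -0.35, -0.30]
reference_pka = [10.1, 9.4, 9.0, 8.4, 8.1]

slope, intercept = fit_linear_qspr(charges, reference_pka)
r_squared, rmse = evaluate_model(charges, reference_pka, slope, intercept)
print(f"pKa = {slope:.2f} * q + {intercept:.2f}")
print(f"R^2 = {r_squared:.3f}, RMSE = {rmse:.3f}")
```

In a real study the evaluation would normally be repeated on molecules that were not used for the fit, and a poor result would prompt a change of descriptors followed by a new fit, which is the refinement cycle described above.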

QSPR modeling covers many different areas of interest. As a result,
currently available software packages (e.g. OCHEM [1] or ArgusLab [2])
support only part of this workflow: they can read descriptors from input and
create a QSPR model, but they cannot calculate descriptors, they can
evaluate models only one at a time, and they have other limitations.

For this reason, we have developed a
modular and easily extensible program, called QSPR Designer [3], which can
read or calculate structural properties of atoms and bonds, employ them as QSPR
descriptors, and evaluate relationships between the descriptors and the examined
physicochemical property of the molecules in question. Furthermore, the software
makes it possible to design and parameterize QSPR models, calculate
physicochemical properties via the models, and test the quality of the
models, and it provides graphs and tables summarizing the results.

The performance of the software is demonstrated by a case study on the
prediction of *pK_{a}*, which is one of the most challenging properties to
calculate [4]. Using QSPR Designer, we have successfully designed,
evaluated, and compared many different QSPR models for the prediction of
*pK_{a}*.

1. S. Novotarskyi, I. Sushko, R. Körner, A. P. Kumar, M. Rupp, V. V. Prokopenko,
I. Tetko: OCHEM - on-line CHEmical database & modeling environment. Journal of
Cheminformatics, 2 (2010), P5.

2. M. A. Thompson: ArgusLab 4.0, Planaria Software LLC (2008). Available from:
http://www.arguslab.com.

3. O. Skřehota, R. Svobodová Vařeková, S. Geidl, M. Kudera, D. Sehnal,
C.-M. Ionescu, J. Koča: QSPR Designer – a program to design and evaluate QSPR
models. Case study on *pK_{a}* prediction. 6th German Conference on
Chemoinformatics, (2010).

4. A. C. Lee, G. M. Crippen: Predicting *pK_{a}*. J. Chem. Inf. Model., 49
(2009), 2013-2033.