QSPR modeling – algorithms, challenges and IT solutions

 

O. Skřehota1, R. Svobodová Vařeková1, S. Geidl1, M. Kudera1, D. Sehnal1, C. M. Ionescu1, T. Bouchal, and J. Koča1

 

1 National Centre for Biomolecular Research, Faculty of Science, Masaryk University,

Kamenice 5, 625 00 Brno-Bohunice, Czech Republic

svobodova@chemi.muni.cz

 

Nowadays, a large amount of experimental and predicted data on the 3D structures of organic molecules and biomolecules is available. Advanced computational methods and high-performance computers allow us to process these data and calculate descriptors, i.e., numerical values that encode the structural characteristics of molecules. Hundreds, if not thousands, of molecular descriptors have been designed for various purposes. One particularly useful application is to employ descriptors in Quantitative Structure-Property Relationship (QSPR) models that predict physicochemical properties (e.g., dissociation constants, partition coefficients, solubility, lipophilicity).

QSPR modeling has become very popular in chemical, biological and pharmaceutical research. However, designing QSPR models that predict many important physicochemical properties remains an open research topic. One reason is that designing and evaluating a QSPR model is a relatively complicated process. At the beginning, one has only a rough idea of which descriptors may be relevant. One turns these first ideas into algorithms that process the molecular structures and calculate the descriptors. Next, an equation expressing the relation between the descriptors and the property in question must be formulated, i.e., the QSPR model has to be parameterized. Finally, one evaluates how well the values predicted by the model agree with reference (mainly experimental) data. The results of this evaluation extend our knowledge and help us correct the model (e.g., by adding or removing descriptors). This improvement cycle can be repeated many times.
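The design loop described above (choose descriptors, parameterize the equation, evaluate it against reference data, then add or remove descriptors) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not QSPR Designer's implementation. It assumes a multilinear model y = c0 + Σ c_j x_j fitted by ordinary least squares and uses R² and RMSE as quality measures. It stands in for "adding or removing descriptors" by scoring every descriptor subset of a fixed size. The function names (solve, fit_linear, evaluate, rank_subsets) are hypothetical.

```python
from itertools import combinations
from math import sqrt


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: descriptors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear(X, y):
    """Parameterize y ~ c0 + sum_j c_j * X[i][j] by least squares (normal equations)."""
    A = [[1.0] + list(row) for row in X]  # prepend an intercept column
    k = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(k)]
    return solve(ata, aty)  # [c0, c1, ..., ck-1]


def predict(coef, row):
    """Evaluate the fitted linear model for one molecule's descriptor vector."""
    return coef[0] + sum(c * x for c, x in zip(coef[1:], row))


def evaluate(coef, X, y):
    """Return (R^2, RMSE) of the model against reference values y."""
    pred = [predict(coef, row) for row in X]
    mean = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot, sqrt(ss_res / len(y))


def rank_subsets(X, y, names, size):
    """Fit and evaluate one model per descriptor subset of the given size, best R^2 first."""
    results = []
    for subset in combinations(range(len(names)), size):
        Xs = [[row[j] for j in subset] for row in X]
        coef = fit_linear(Xs, y)
        r2, rmse = evaluate(coef, Xs, y)
        results.append(([names[j] for j in subset], r2, rmse, coef))
    return sorted(results, key=lambda t: t[1], reverse=True)


if __name__ == "__main__":
    # Synthetic toy data (not experimental): y = 2 + 3*x1 exactly, x2 is irrelevant.
    X = [[0, 1], [1, 0], [2, 1], [3, 0], [4, 2]]
    y = [2, 5, 8, 11, 14]
    for names, r2, rmse, coef in rank_subsets(X, y, ["x1", "x2"], 1):
        print(names, f"R2={r2:.3f}", f"RMSE={rmse:.3f}")
```

Here R² and RMSE are computed on the same data used for fitting. A real evaluation would also use cross-validation or an external test set, because in-sample R² never decreases when descriptors are added.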

QSPR modeling covers many different areas of interest, and currently available software packages (e.g., OCHEM [1] or ArgusLab [2]) support only part of the workflow. They can read descriptors from input and create a QSPR model, but they cannot calculate descriptors, and they can evaluate models only one at a time, among other limitations.

For this reason, we have developed a modular and easily extensible program, called QSPR Designer [3], which can read or calculate structural properties of atoms and bonds, employ them as QSPR descriptors, and evaluate the relationships between these descriptors and the examined physicochemical property of the molecules in question. Furthermore, the software allows us to design and parameterize QSPR models, calculate physicochemical properties via these models, test the quality of the models, and produce graphs and tables summarizing the results.

The performance of the software is demonstrated by a case study on the prediction of pKa, which is one of the most challenging properties to calculate [4]. Using QSPR Designer, we have successfully designed, evaluated, and compared many different QSPR models that predict pKa from atomic charges.
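QSPR models that predict pKa from charges are commonly written as a linear relation in the charges. The form below is an illustrative assumption, not necessarily the equation used in the case study. Here q_i denotes the partial atomic charge of the i-th selected atom (for example, atoms of the dissociating group), n is the number of selected atoms, and A and B_i are empirical parameters fitted by least squares to experimental pKa values of a training set.

```latex
\mathrm{p}K_a = A + \sum_{i=1}^{n} B_i \, q_i
```

With a single charge descriptor (n = 1), the model reduces to a straight line, pK_a = A + B q.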


1.     S. Novotarskyi, I. Sushko, R. Körner, A. P. Kumar, M. Rupp, V. V. Prokopenko, I. Tetko: OCHEM - on-line CHEmical database & modeling environment. Journal of Cheminformatics, 2 (2010), P5.

2.     M. A. Thompson: ArgusLab 4.0, Planaria Software LLC (2008). Available from: http://www.arguslab.com.

3.     O. Skřehota, R. Svobodová Vařeková, S. Geidl, M. Kudera, D. Sehnal, C.-M. Ionescu, J. Koča: QSPR Designer – a program to design and evaluate QSPR models. Case study on pKa prediction. 6th German Conference on Chemoinformatics (2010).

4.     A. C. Lee, G. M. Crippen: Predicting pKa. J. Chem. Inf. Model., 49 (2009), 2013-2033.