Predicting ion binding sites in proteins

C. P. Feidakis1, P. Škoda2, R. Krivák2, D. Hoksza2, M. Novotny1

1Department of Cell Biology, Faculty of Science, Charles University, Viničná 7, 128 00 Praha 2, Czech Republic

2Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Malostranské náměstí 25, 118 00 Praha 1, Czech Republic

feidakic@natur.cuni.cz

From oxygen delivery to protein phosphorylation and pH maintenance, ions are involved in a wide range of biological processes, and they have also been targets of pharmacological studies1‑4. About one third of known proteins include at least one metal ion, and many metalloproteins are found in humans5.

A number of experimental techniques are used to identify metal ion binding sites in proteins; however, they are often tedious and time-consuming, which limits their wide application. UniProtKB contains over 500,000 annotated protein sequences, while another 180 million sequences are pending annotation. Additionally, many protein structures solved in structural genomics projects have no functional annotation. There is a pressing need for computational methods that can process the bulk of accumulating data and produce meaningful annotations.

Several methods have been developed, typically training machine-learning algorithms to reveal patterns in protein–ion interactions and to predict ion binding sites in a given protein. Despite remarkable progress in the field and the high accuracy achieved by top-of-the-line predictors, most of them seem to suffer from low sensitivity and low Matthews correlation coefficient (MCC) values. Furthermore, all current predictors focus on specific subsets of metal ions, and even on specific protein residues as possible binding candidates. Moreover, there are non-metal ions that are well represented in the PDB but are not considered in ion binding site prediction. These are indications that predictive capacity can be extended and improved.
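High accuracy can coexist with low sensitivity and a low MCC because ion-binding residues are a small minority of all residues. The following minimal sketch computes the three measures from a confusion matrix. The function name and the counts are hypothetical and chosen only for illustration; they are not results from any published predictor.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, MCC) for a binary confusion matrix."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Sensitivity (recall): fraction of true binding residues that were found.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, sensitivity, mcc


# Hypothetical example: 1000 residues, of which only 50 bind an ion.
# The predictor finds 10 of them and raises 5 false alarms.
acc, sens, mcc = binary_metrics(tp=10, fp=5, tn=945, fn=40)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} MCC={mcc:.3f}")
```

In this made-up case, accuracy is dominated by the many correctly rejected non-binding residues, whereas sensitivity and MCC reflect that most binding residues were missed.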

Here, we plan to cover all well-known metal ions and also to include non-metals and other previously neglected ions. In addition to the established features used in the broader context of ligand binding site prediction, we want to explore properties that are distinctive of protein–ion binding, such as electrostatics and coordination geometry6.

We employ the state-of-the-art machine-learning tools P2Rank and PrankWeb7,8, which emerged through our collaboration with David Hoksza’s group and have been adopted by PDBe‑KB9, and we are updating and optimizing them for ion binding site prediction in proteins. We have assembled a robust dataset of 102,700 ion-binding protein structures, which include 42,294 unique, non-redundant ion-binding sites, and we are creating separate ion subsets on which machine learning will be performed.

1. Burnett, G. & Kennedy, E. P. J. Biol. Chem. 211, 969–980 (1954).

2. Hsia, C. C. N. Engl. J. Med. 338, 239–247 (1998).

3. Bonar, P. T. & Casey, J. R. Channels (Austin) 2, 337–345 (2008).

4. Ndagi, U., Mhlongo, N. & Soliman, M. E. Drug Des. Devel. Ther. 11, 599–616 (2017).

5. Degtyarenko, K. Bioinformatics 16, 851–864 (2000).

6. Jernigan, R., Raghunathan, G. & Bahar, I. Curr. Opin. Struct. Biol. 4, 256–263 (1994).

7. Jendele, L., Krivak, R., Skoda, P., Novotny, M. & Hoksza, D. Nucleic Acids Res. 47, W345–W349 (2019).

8. Krivák, R. & Hoksza, D. J. Cheminform. 10, (2018).

9. PDBe-KB consortium. Nucleic Acids Res. (2019).