Our work focused on elucidating the selection rules governing the specificity of protein – DNA interactions. From a non-redundant set of protein – DNA complexes, we isolated all nucleotide – amino acid dimers. This provided a set of contacts mapping the positions occupied by each amino acid around each nucleotide throughout the entire Protein Data Bank. Because the DNA sugar-phosphate backbone lacks distinguishing marks, we further reduced the problem of DNA sequence recognition to interactions between the DNA bases and amino acid side chains.
The distributions of each of the 20 standard amino acids around each DNA base served as the basis for our calculations. In these distributions, we identified spatially defined clusters, that is, regions where a particular amino acid occurred preferentially. For each cluster, a single contact was then selected as its representative based on statistical scoring.
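As an illustration only, the cluster detection and representative selection could be sketched as follows. The single-linkage grouping, the distance cutoff, and the neighbour-count score are assumptions made for this sketch; the statistical scoring actually used is not specified in this summary, and the function names are hypothetical.

```python
from math import dist


def cluster_points(points, cutoff):
    """Group points by single linkage (assumed method): two points share a
    cluster if a chain of neighbours, each within `cutoff`, connects them."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = min(unvisited)
        unvisited.remove(seed)
        stack = [seed]
        members = [seed]
        while stack:
            i = stack.pop()
            near = [j for j in unvisited if dist(points[i], points[j]) <= cutoff]
            for j in near:
                unvisited.remove(j)
                stack.append(j)
                members.append(j)
        clusters.append(sorted(members))
    return clusters


def representative(points, members, radius):
    """Pick the member with the most neighbours within `radius`.
    This density score is a stand-in for the unspecified statistical score;
    ties are resolved in favour of the lowest index."""
    def score(i):
        return sum(1 for j in members
                   if j != i and dist(points[i], points[j]) <= radius)
    return max(members, key=lambda i: (score(i), -i))


# Toy side-chain positions around a base (arbitrary coordinates, not real data).
positions = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0),
             (10.0, 0.0, 0.0), (10.4, 0.0, 0.0)]
clusters = cluster_points(positions, cutoff=0.6)
reps = [representative(positions, c, radius=0.6) for c in clusters]
```

In this toy input the first three points form one cluster and the last two another; the representative of each is the most densely surrounded member.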
The interaction energy of each cluster representative was calculated with a series of commonly used empirical force fields. These interaction energies, representing statistically significant contacts, were mapped against the energy distributions of the clusters they belonged to, as well as against the interaction energy profiles of the entire distributions. The validity of our results is supported by ab initio calculations performed on the same set of structures.
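One simple way to place a representative's energy within a distribution is to compute the fraction of contacts that are less favourable than it. The sketch below assumes that more negative energies are more stabilising and uses a hypothetical helper name; it is not the mapping procedure used in this work, which is not detailed here.

```python
def fraction_less_favourable(rep_energy, energies):
    """Return the fraction of energies in the distribution that are higher
    (less stabilising) than the representative's energy."""
    if not energies:
        raise ValueError("energy distribution must not be empty")
    return sum(1 for e in energies if e > rep_energy) / len(energies)


# Toy energies in arbitrary units (not real data).
cluster_energies = [-5.0, -3.0, -1.0, 0.5]
rank = fraction_less_favourable(-3.0, cluster_energies)
```

The same function can be applied to a cluster's energies or to the energies of the whole distribution, which gives two comparable ranks for each representative.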
We found that, for certain DNA base – amino acid pairs, significantly stabilising interaction energies could be achieved only within a rather limited set of mutual orientations of the interacting partners. An online repository providing access to the results in graphical form was established.