Search into unevolved protein space

Tretyachenko V (1,2), Vymětal J (1,2), Bednárová L (2), Vondrášek J (2), Hlouchová K (1,2)

(1) Department of Biochemistry, Faculty of Science, Charles University, Hlavova 2030, 128 00 Prague 2, Czech Republic

(2) Institute of Organic Chemistry and Biochemistry, The Czech Academy of Sciences, Flemingovo náměstí 2, 166 10 Prague 6, Czech Republic

The protein sequences found in nature represent a tiny fraction of the potential sequences that could be constructed from the 20-amino-acid alphabet. To help define the properties that shaped proteins to stand out from the space of possible alternatives, we conducted a systematic computational and experimental exploration of random (unevolved) sequences in comparison with biological proteins. In our study, combinations of secondary structure, disorder, and aggregation predictions are accompanied by experimental characterization of selected proteins.
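As a rough illustration of what sampling unevolved sequence space can look like computationally, the following minimal sketch draws random sequences from the 20-amino-acid alphabet and computes one simple physicochemical descriptor. It is not the pipeline used in this study: the uniform residue sampling, the sequence length of 100, the library size of 1000, the hydrophobic residue grouping, and all function names are assumptions made for the example only.

```python
import random

# The 20 canonical amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Illustrative hydrophobic grouping (an assumption; several scales exist).
HYDROPHOBIC = set("AVILMFWC")


def random_sequence(length, rng):
    """Draw one sequence with each residue sampled uniformly (assumed model)."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def hydrophobic_fraction(seq):
    """Fraction of residues that belong to the illustrative hydrophobic set."""
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)


def sequence_space_size(length):
    """Number of distinct sequences of the given length: 20 ** length."""
    return len(AMINO_ACIDS) ** length


# Fixed seed so the hypothetical library is reproducible.
rng = random.Random(0)
library = [random_sequence(100, rng) for _ in range(1000)]
mean_hydrophobic = sum(hydrophobic_fraction(s) for s in library) / len(library)

print(f"Possible 100-residue sequences: {sequence_space_size(100):.3e}")
print(f"Mean hydrophobic fraction in the random library: {mean_hydrophobic:.3f}")
```

Under uniform sampling the mean hydrophobic fraction should fall close to 8/20, because 8 of the 20 residues are in the illustrative set. In a real comparison, descriptors like this one would be computed for both the random library and a set of biological proteins, alongside secondary structure, disorder, and aggregation predictors.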

We found that the overall secondary structure and physicochemical properties of random and biological sequences are very similar. Moreover, random sequences can be well tolerated by living cells. Contrary to early hypotheses about the toxicity of random and disordered proteins, we found that random sequences with high disorder have low aggregation propensity (unlike random sequences with high structural content) and were particularly well tolerated. This direct dependence of aggregation propensity on structural content differentiates random sequences from biological proteins.

Our study indicates that while random sequences can be both structured and disordered, the properties of the latter make them better suited as progenitors (in both in vivo and in vitro settings) for further evolution of complex, soluble, three-dimensional scaffolds that can perform specific biochemical tasks.