The protein sequences found in nature
represent a tiny fraction of the potential sequences that could be constructed
from the 20-amino-acid alphabet. To help define the properties that shaped
proteins to stand out from the space of possible alternatives, we conducted a
systematic computational and experimental exploration of random (unevolved)
sequences in comparison with biological proteins. We combined predictions of
secondary structure, disorder, and aggregation propensity with experimental
characterization of selected proteins.
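
As a rough illustration of what "random (unevolved) sequences" drawn from the 20-amino-acid alphabet can look like computationally, the sketch below samples a small library and computes the size of the corresponding sequence space. It is not the procedure used in the study: uniform, independent sampling of residues, the length of 100, the library size, and the names `random_sequence` and `sequence_space_size` are all assumptions made for illustration.

```python
import random

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_sequence(length, rng):
    """Return one random sequence.

    Assumption: each residue is drawn independently and uniformly;
    the study may have used a different composition.
    """
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def sequence_space_size(length):
    """Number of distinct sequences of the given length."""
    return len(AMINO_ACIDS) ** length


# Hypothetical parameters: 5 sequences of 100 residues, fixed seed.
rng = random.Random(0)
library = [random_sequence(100, rng) for _ in range(5)]

# Python integers are arbitrary precision, so this value is exact.
space_100 = sequence_space_size(100)
```

For a length of 100 residues, `space_100` is a 131-digit number, which gives a sense of why natural proteins can cover only a tiny fraction of the possible sequences.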
We found that the overall secondary
structure and physicochemical properties of random and biological sequences are
very similar. Moreover, random sequences can be well tolerated by living cells.
Contrary to early hypotheses about the toxicity of random and disordered
proteins, we found that random sequences with high disorder had low
aggregation propensity (unlike random sequences with high structural content)
and were particularly well tolerated. This direct relationship between structure
content and aggregation propensity differentiates random from biological proteins.
Our study indicates that while random
sequences can be both structured and disordered, the properties of the latter
make them better suited as progenitors (in both in vivo and in vitro settings)
for further evolution of complex, soluble, three-dimensional scaffolds that can
perform specific biochemical tasks.