Analysis and prediction of protein solubility

K. Slanska1, C. P. S. Badenhorst2, M. Dörr2, U. T. Bornscheuer2, J. Damborsky1,3, Z. Prokop1,3

1Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, Bld. A13, 625 00 Brno, Czech Republic

2Department of Biotechnology & Enzyme Catalysis, Institute of Biochemistry, Felix-Hausdorff-Str. 4, 17487 Greifswald, Germany

3International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic

Contact: slanska.katka.4@gmail.com

The low solubility and poor expression of recombinant proteins are a frequent bottleneck in biotechnological and pharmaceutical applications as well as in fundamental research, where high concentrations of purified protein are often essential. Poorly soluble proteins tend to aggregate, and the presence of protein aggregates is associated with various neurodegenerative diseases. Therefore, understanding the basis of protein solubility and expressibility, and developing methods for their improvement, are of great interest in many areas of research and development, including structural biology and biochemistry, industrial biotechnology and medicine.

An increase in a protein's soluble expression can often be achieved using conventional methods, e.g., optimization of the host organism, modification of growth media, and expression at low temperature. An alternative approach is to modify the protein's amino acid sequence. Efforts have been made to apply artificial intelligence (machine learning) to predict solubilizing mutations based only on the protein's amino acid sequence. However, currently available prediction tools are neither very accurate nor reliable [1]. The quality of machine learning predictions strongly depends on the size and quality of the experimentally acquired training data sets [2]. In this work, we introduce methods for high-throughput analysis of protein solubility, which are subsequently applied to assess the effects of mutations on protein solubility. We adopted two general approaches, the split-GFP [3] and split-NanoLuc [4] technologies, and tested them in different experimental formats. The split-GFP system was used in combination with colony filtration using an immobilized bead assay [5], while the split-NanoLuc complementation approach was tested in a microtiter plate-based assay. Both methods are evaluated using a set of model proteins of varying soluble expression to assess their sensitivity, reliability and screening efficiency. The methods can provide the capacity to screen libraries of approximately 10^4 to 10^5 protein variants per screening campaign. The data collected will serve as a training set for the development of novel tools for the prediction of protein solubility.

[1] M. Musil, H. Konegger, J. Hon, D. Bednar, J. Damborsky, ACS Catal., 9, (2019), 1033-1054.

[2] S. Mazurenko, Z. Prokop, J. Damborsky, ACS Catal., 10, (2020), 1210-1223.

[3] S. Cabantous, T. C. Terwilliger, G. S. Waldo, Nat. Biotechnol., 23, (2005), 102-107.

[4] A. S. Dixon, M. K. Schwinn, M. P. Hall, K. Zimmerman, P. Otto, T. H. Lubben, B. L. Butler, B. F. Binkowski, T. Machleidt, T. A. Kirkland, M. G. Wood, C. T. Eggers, L. P. Encell, K. V. Wood, ACS Chem. Biol., 11, (2016), 400-408.

[5] M. A. Lockard, P. Listwan, J-D. Pedelacq, S. Cabantous, H. B. Nguyen, T. C. Terwilliger, G. S. Waldo, Protein Eng. Des. Sel., 24, (2011), 565-578.

This project is supported by the Masaryk University grant MUNI/C/1647/2019.