Karel Bezouška1,2, Petr Man1,2, Petr Pompach1,2, Petr Novák1,3, Vladimír Havlíček2, Rolf Hilgenfeld4, Ondřej Plíhal1,2, Jan Sklenář1,2, Andrea Pišvejcová2, Vladimír Křen2
1Department of Biochemistry, Faculty of Science, Charles University Prague, Hlavova 8, CZ-12840 Praha 2, Czech Republic; 2Institute of Microbiology, Academy of Sciences of Czech Republic, Vídeňská 1083, CZ-14220 Praha 4, Czech Republic; 3Sandia National Laboratories, 7011 East Avenue, Livermore, CA, 94550, USA; 4Institute for Molecular Biotechnology, Beutenbergstrasse 11, D-07745 Jena, Germany. Correspondence to firstname.lastname@example.org
According to textbook knowledge, automated Edman degradation and protein mass spectrometry can determine the complete primary structure only of small proteins with molecular masses below 10 kDa. Larger proteins are analyzed by DNA sequencing of the corresponding genes or cDNA clones. Here we present two examples from our recent work in which the sequences of rather large proteins have been determined completely or nearly completely by protein sequencing and mass spectrometry.

The first example is the pokeweed antiviral protein from Phytolacca acinosa (PAP-S), a member of the family of type-1 ribosome-inactivating proteins. On SDS electrophoresis, purified PAP-S resolves into two closely related bands with apparent molecular masses of about 29 kDa. Interestingly, the upper band, PAP-Sup, has been shown to crystallize through carbohydrate-protein interactions based on a rare type of N-glycosylation, namely N-linked GlcNAc monosaccharide substitutions at the canonical Asn-Xxx-Ser/Thr sequons [1]. The sequence of PAP-Sup is not known from genetic data, yet it is essential for unambiguously solving the crystal structure at positions that cannot be assigned directly from the electron density. We have therefore determined the complete sequence of PAP-Sup by Edman degradation of the N-terminus and of internal peptides, combined with MALDI peptide mapping and tandem mass spectrometry on an ion trap. The complete sequence has 261 amino acids and includes three sites of this N-glycosylation. Sequence coverage was 92 % from Edman degradation data, 93 % from peptide mapping and 90 % from tandem MS data.

The second example is β-N-acetylhexosaminidase from Aspergillus oryzae CCF1066, a robust extracellular secreted enzyme used in the enzymatic synthesis of oligosaccharides and in biotechnology [2].
This enzyme has 600 amino acids (including 6 cysteines and 6 N-glycosylation sites), of which 466 have been verified by direct analysis of the protein (sequence coverage 77 %). A large N-terminal segment of the protein proved difficult to identify, which suggests that this segment is rapidly cleaved. The enzyme is composed of a cleaved signal peptide, a propeptide involved in regulated secretion, an inactive zincin domain, and a catalytic domain belonging to glycoside hydrolase family 20.
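Two quantities used throughout this abstract are the canonical Asn-Xxx-Ser/Thr sequon and percent sequence coverage. The Python sketch below shows how each can be computed. It is an illustration only, not the authors' software. The function names `find_sequons` and `sequence_coverage` are hypothetical. Excluding Pro at the Xxx position follows the common definition of the sequon. Coverage is assumed to mean the percentage of residues spanned by at least one sequenced or mapped segment.

```python
import re

# Canonical N-glycosylation sequon Asn-Xxx-Ser/Thr; by the usual convention Xxx may be
# any residue except Pro. The zero-width lookahead also reports overlapping sequons.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> list[int]:
    """Return 1-based positions of Asn residues that begin a canonical sequon.

    A sequon is only a potential site; whether it is actually glycosylated
    must be established experimentally.
    """
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


def sequence_coverage(length: int, segments: list[tuple[int, int]]) -> float:
    """Percent of residues 1..length covered by at least one segment.

    Segments are 1-based, inclusive (start, end) residue ranges, e.g. peptides
    read by Edman degradation or matched by MS; overlaps are counted once.
    """
    covered = [False] * length
    for start, end in segments:
        if not 1 <= start <= end <= length:
            raise ValueError(f"segment {start}-{end} lies outside 1..{length}")
        for i in range(start - 1, end):
            covered[i] = True
    return 100.0 * sum(covered) / length


if __name__ == "__main__":
    # Hypothetical toy sequence: Asn3 (N-G-S) is a sequon, Asn8 (N-P-T) is excluded
    # because of Pro, and Asn12 (N-V-T) is a sequon.
    print(find_sequons("MKNGSAPNPTLNVTQ"))          # [3, 12]
    # Hypothetical overlapping segments covering residues 1-6 of a 10-residue chain.
    print(sequence_coverage(10, [(1, 4), (3, 6)]))  # 60.0
```

Applied to the hexosaminidase figures, 466 verified residues out of 600 give 466/600 ≈ 77.7 %, which matches the coverage of about 77 % quoted for that enzyme.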
Supported by a grant from the Ministry of Education of the Czech Republic (MSM 113100001), by Institutional Research Concept No. A0Z5020903 for the Institute of Microbiology, and by a grant from the Grant Agency of the Czech Republic (203/01/1018).
[1] T. Hogg, I. Kuta-Smatanová, K. Bezouška, N. Ulbrich, R. Hilgenfeld, Acta Cryst. D58 (2002) 1734-1739.
[2] Z. Huňková, V. Křen, M. Ščigelová, L. Weignerová, O. Scheel, J. Thiem, Biotechnol. Lett. 18 (1996) 725-730.