Intein structure and motifs

The Mxe gyrA intein structure (1AM2; Klabunde et al., Nature Structural Biology, 5:31-37 '98) is mainly made up of beta sheets (colored in yellow). The protein splicing active site includes residues Ser1, Thr72, His75 and Asn198 (colored in red). A loop region of the protein (aa 112-129) was found to be disordered and is therefore missing from the structure. This disordered part comprises most of the region where the endonuclease domain is found in other inteins (aa 107-160). The crystallized parts of this region are colored in cyan. The C-terminal residues of this region (approximately aa 142-160) seem to be remnants of an endonuclease domain. All other known alleles of this intein have a DOD type endonuclease domain and are very similar by sequence to the protein splicing domain of this intein. Thus, this intein is likely to have lost its endonuclease domain at some point in its evolution.

This intein structure is missing its C-extein, and a single alanine (Ala0) at its N-terminal end models the N-extein (in blue). To trap the intein in its pre-cleaved state, the naturally occurring Cys residue at its N-terminal end was substituted by a Ser. The bond between the N-extein (Ala0) and the intein (Ser1) is the one cleaved in the unmodified protein. This bond first undergoes an acyl shift to an ester bond, and the N-extein is then joined to the C-extein by transesterification initiated by Thr199 (this residue is not present in the structure). The Ala0-Ser1 bond was found to be in a highly strained cis conformation. It might be that protein splicing is partially driven by the potential energy in this strained bond.

This structure and that of the Sce VMA intein (also called PI-SceI endonuclease) (Duan et al., Cell 89:555-564 '97; 1VDE, structure to be publicly released in April '98) verify the relation between the inteins' conserved motifs and the protein splicing and endonuclease active sites (Pietrokovski, Protein Science 3:2340-2350 '94). Motifs N1 (1-13), N3 (66-79), C2 (175-189) and C1 (191-199) of the Mxe gyrA intein form its active site in the core of the structure.

The Mxe gyrA intein does not have an endonuclease domain, but in the Sce VMA intein structure the endonuclease domain includes all of the typical intein endonuclease motifs (EN1-4). Moreover, the first three of these motifs form the endonuclease active site (Duan et al. '97). You can see this structure and its analysis on this page.

Two additional motifs, N2 and N4, were suggested to be the result of an ancient duplication event and to be part of the protein splicing active site (Pietrokovski, Protein Science 7:64-71 '98). Only the first assertion was shown to be true: these motifs and the regions around them are structurally similar to each other, and the whole intein structure shows a two-fold symmetry. However, motifs N2 (16-23) and N4 (91-106) are not part of the protein splicing active site found in the core of the structure. They are found on peripheral beta strands and might stabilize the structure.
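
The residue ranges quoted above can be cross-checked against the active-site residues with a few lines of Python. This is only an illustrative sketch: the motif ranges and active-site positions are copied from the text, while the names MOTIFS, ACTIVE_SITE, motif_of and active_site_motifs are hypothetical helpers introduced here and are not part of any published tool.

```python
# Motif ranges (Mxe gyrA intein residue numbering) as quoted in the text.
MOTIFS = {
    "N1": (1, 13),
    "N2": (16, 23),
    "N3": (66, 79),
    "N4": (91, 106),
    "C2": (175, 189),
    "C1": (191, 199),
}

# Protein splicing active-site residues named in the text.
ACTIVE_SITE = {"Ser1": 1, "Thr72": 72, "His75": 75, "Asn198": 198}


def motif_of(position):
    """Return the name of the motif containing `position`, or None."""
    for name, (start, end) in MOTIFS.items():
        if start <= position <= end:
            return name
    return None


def active_site_motifs():
    """Map each active-site residue to the motif it falls in."""
    return {residue: motif_of(pos) for residue, pos in ACTIVE_SITE.items()}


if __name__ == "__main__":
    for residue, motif in active_site_motifs().items():
        print(residue, motif)
```

With the ranges as given, each of the four active-site residues falls inside N1, N3 or C1 and none falls inside N2 or N4, which agrees with the observation that N2 and N4 lie outside the active site.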

Hedgehog developmental proteins and other protein families found in C. elegans have a common domain at their C-terminal end. This domain was shown to undergo autoproteolysis, cleaving itself off the N-terminal part of the protein and modulating its activity (developmental regulation in the hedgehog proteins). The cleavage mechanism of this C-terminal autocatalytic domain (CAD) is similar to the N-terminal cleavage of inteins: a Cys/Ser peptide bond at the N-terminal end of the domain is shifted to an ester bond that is cleaved by a nucleophilic attack (Porter et al., Cell 86:21-34 '96). The hedgehog N-terminal motif and a conserved His motif were shown to be similar to corresponding intein motifs (N1 and N3) (Koonin, TIBS 20:141-142 '95; Burglin, Curr Biol 6:1047-1050 '96). Later examinations revealed further similarity between the two families, indicating a common origin (Dalgaard et al., J Comput Biol 4:193-214 '97; Pietrokovski '98). CADs have a pair of motifs similar to the intein N2 and N4 motifs and to each other. This indicates that CADs too are the result of a duplication event and that the duplication was already present in the common ancestor of inteins and CADs (the "hint" domain). Structure determination of the gyrA intein and the Drosophila melanogaster hedgehog CAD (Hall et al., Cell 91:85-97 '97) confirmed these sequence analysis predictions.
More information on the structure of CAD and its similarity to the intein structure can be found here.


Page last modified July 2001
Shmuel Pietrokovski <pietro@weizmann.ac.il>