Intein motifs

Motifs are shown as sequence logos.

N-terminal (N) domain

Motif order, with the range of distances between adjacent motifs:
N1 <1 to 33 aa> N2 <16 to 75 aa> N3 <3 to 37 aa> N4
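Once motif positions are known, the distance ranges in a map like this can be checked mechanically. This is a minimal illustration, not the method used to build the map. It assumes that motif hits are supplied as hypothetical (start, end) coordinates, and that a distance counts the residues between the end of one motif and the start of the next. Under that assumption a negative value, as in the C domain range, would mean the motifs overlap. The names `check_spacing` and `N_DOMAIN_GAPS` are illustrative.

```python
from typing import Dict, List, Tuple

# Distance ranges (aa) between adjacent N-domain motifs, from the map above.
N_DOMAIN_GAPS: List[Tuple[str, str, int, int]] = [
    ("N1", "N2", 1, 33),
    ("N2", "N3", 16, 75),
    ("N3", "N4", 3, 37),
]


def check_spacing(
    hits: Dict[str, Tuple[int, int]],
    gaps: List[Tuple[str, str, int, int]],
) -> Dict[Tuple[str, str], bool]:
    """Report whether each adjacent motif pair is spaced within its range.

    hits maps a motif name to hypothetical (start, end) coordinates on the
    intein sequence (0-based, end exclusive). The gap is counted as the
    residues between the end of one motif and the start of the next, so it
    is negative when the two motifs overlap (an assumption of this sketch).
    """
    result = {}
    for first, second, low, high in gaps:
        gap = hits[second][0] - hits[first][1]
        result[(first, second)] = low <= gap <= high
    return result


# Hypothetical motif coordinates for a made-up intein.
hits = {"N1": (0, 12), "N2": (20, 35), "N3": (60, 75), "N4": (150, 160)}
print(check_spacing(hits, N_DOMAIN_GAPS))
# {('N1', 'N2'): True, ('N2', 'N3'): True, ('N3', 'N4'): False}
```

Passing `[("C2", "C1", -1, 9)]` as the ranges applies the same check to the C domain.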

[Sequence logos: N1, N2, N3, N4]

The first motif is found at the N-termini of inteins (A in Pietrokovski 94' and Perler et al. 97'). Note the conserved Ser and Cys at the intein's N-terminal splice site. The OH or SH side chain of the aa at this position is necessary for the N-O or N-S shift that converts the peptide bond to the N-extein into an ester or thioester bond. The His at position 10 of the third motif is the most conserved intein residue. It was first predicted (Pietrokovski 94') and then shown (Kawasaki et al. 97') to be involved in the protein splicing reaction. Intein structures (Duan et al. 97', Klabunde et al. 98') also showed this residue to be positioned at the protein splicing active site.
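The two N-domain residues discussed above can be checked in a candidate sequence. This is a minimal sketch that assumes the sequence starts exactly at the intein's first residue. It also assumes the start of motif N3 is already known from some other search, so the `n3_start` argument is a hypothetical input. `n_terminal_features` is an illustrative name. Since these are conserved rather than invariant residues, a mismatch flags a deviation from the consensus; it does not rule a sequence out.

```python
def n_terminal_features(intein: str, n3_start: int) -> dict:
    """Check two conserved N-domain residues in a candidate intein.

    intein: one-letter protein sequence beginning at the intein's first
        residue (motif N1 starts here).
    n3_start: hypothetical 0-based index where motif N3 begins, e.g. from a
        separate motif search; position 10 of N3 is then n3_start + 9.
    """
    seq = intein.upper()
    his_index = n3_start + 9
    return {
        # Ser or Cys supplies the OH/SH group needed for the N-O/S shift.
        "n_splice_residue_ok": seq[:1] in ("S", "C"),
        # His at position 10 of N3, the most conserved intein residue.
        "n3_his_present": 0 <= his_index < len(seq) and seq[his_index] == "H",
    }


# Made-up sequence: Cys at position 1 and His at index 40 (N3 assumed at 31).
toy = "C" + "A" * 39 + "H" + "A" * 10
print(n_terminal_features(toy, n3_start=31))
# {'n_splice_residue_ok': True, 'n3_his_present': True}
```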

C-terminal (C) domain

Motif order, with the range of distances between adjacent motifs:
C2 <-1 to 9 aa> C1

[Sequence logos: C2, C1]

The two C domain motifs (F and G in Pietrokovski 94' and Perler et al. 97') are involved in the final steps of protein splicing. A Ser, Thr or Cys is found at the N-terminus of the C-extein (the last position of the second motif). The hydroxyl or thiol group of this aa attacks the ester or thioester linking the last aa of the N-extein to the intein, transferring the N-extein to the C-extein in a transesterification reaction. The resulting branched intermediate is resolved by cyclization of the Asn immediately preceding the attacking aa. This Asn, the intein's C-terminal residue, is the second most conserved residue in inteins. Gln residues are now also known to occur at this position, in the CIV RIR1 and Pho polC inteins. These inteins are integrated in conserved regions of vital proteins (subunits of ribonucleotide reductase and replicative DNA polymerases), so they are likely to be active and capable of protein splicing. Their splicing reaction is suggested to be a variation of Asn cyclization in which the Gln cyclizes to form a glutarimide ring.
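The C-terminal splice junction can be inspected in the same way. This sketch assumes the intein/C-extein boundary is already known. The function name and the made-up fragments in the example are hypothetical.

```python
def c_terminal_junction(intein: str, c_extein: str) -> dict:
    """Classify the C-terminal splice junction residues described above.

    intein: one-letter sequence of the intein, ending at its last residue.
    c_extein: sequence of the C-extein immediately following the intein.
    """
    last = intein[-1:].upper()
    first_c = c_extein[:1].upper()
    return {
        # Asn is the usual residue; Gln occurs in e.g. CIV RIR1 and Pho polC.
        "terminal_residue": {"N": "Asn", "Q": "Gln"}.get(last, "other"),
        # Ser, Thr or Cys supplies the attacking OH/SH group.
        "c_extein_nucleophile_ok": first_c in ("S", "T", "C"),
    }


print(c_terminal_junction("GVAN", "TAP"))  # made-up fragments
# {'terminal_residue': 'Asn', 'c_extein_nucleophile_ok': True}
```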

Intein endonuclease (EN) domains

DOD domain

Motif order, with the range of distances between adjacent motifs:
EN1 <53 to 106 aa> EN2 <4 to 18 aa> EN3 <0 to 23 aa> EN4

[Sequence logos: EN1, EN2, EN3, EN4]

The first and third EN motifs (C and E in Pietrokovski 94' and Perler et al. 97') are the DOD motifs found in the DOD homing endonucleases (Mueller et al. 94'). The two motifs are similar to each other and probably have similar roles. The protein structures of the yeast Sce VMA intein (Duan et al. 97') and of a DOD-type endonuclease (Heath et al. 97') showed these motifs to be alpha helices that hold the two halves of the protein together and also form the endonuclease active site. The Sce VMA structure also showed that the conserved basic residue in the second position of the second EN motif (motif D in Pietrokovski 94' and Perler et al. 97') is another part of the active site. Mutating the DOD motifs in the Tli pol2 (Hodges et al. 92') and Sce VMA (Gimble and Stephens 95') inteins abolished their endonuclease activity. However, the mutation did not affect the protein splicing activity of Tli pol2. Genetically engineered Sce VMA (Chong and Xu 97') and Mtu recA (Derbyshire et al. 97') inteins lacking the EN domain were both shown to protein splice. The EN domain is also naturally missing from various inteins. Together, these results clearly show that the endonuclease domain is optional and not required for intein splicing.
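As a rough illustration of looking for the DOD motifs, one could scan a protein for 9-residue windows that resemble LAGLIDADG, one of the motifs' alternative names in the motif designations table. This is a deliberately crude, hypothetical stand-in for a real motif search. It compares each window to a single consensus string and allows a fixed number of mismatches. The motifs themselves, by contrast, are defined by the position-specific variation shown in the sequence logos. The threshold of 3 mismatches is an arbitrary assumption, and `scan_dod` is an illustrative name.

```python
from typing import List, Tuple

# Consensus used only for this sketch (an alternative name of the DOD motifs).
CONSENSUS = "LAGLIDADG"


def scan_dod(protein: str, max_mismatches: int = 3) -> List[Tuple[int, str, int]]:
    """Return (start, window, mismatches) for windows close to CONSENSUS.

    Each window of len(CONSENSUS) residues is compared letter by letter with
    the consensus; windows with at most max_mismatches differences are kept.
    """
    seq = protein.upper()
    k = len(CONSENSUS)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        mismatches = sum(a != b for a, b in zip(window, CONSENSUS))
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits


# Made-up protein carrying two DOD-like stretches separated by 60 residues.
toy_en = "MK" + "LAGLIDADG" + "A" * 60 + "LSGLIDGDG" + "KK"
for start, window, mm in scan_dod(toy_en):
    print(start, window, mm)
# 2 LAGLIDADG 0
# 71 LSGLIDGDG 2
```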

HNH domain

Motif order, with the range of distances between adjacent motifs:
HNH A <2 to 10 aa> HNH B

The HNH motif is found in bacterial and organellar endonucleases occurring as independent genes and inside group I and II introns (Shub et al. 94', Gorbalenya 94').

The endonuclease domain is optional in inteins. Mutations in it affect the intein's endonuclease activity but not its protein splicing activity, some inteins naturally lack this domain, and inteins were shown to protein splice without it (Chong and Xu 97', Derbyshire et al. 97'). The role of the EN domain is to enable inteins to transfer horizontally to unoccupied intein integration sites, by a process termed homing. This process was first studied in group-I introns that code for proteins called homing endonucleases. It proceeds in the same way in introns and inteins, and is described here for inteins.

The cleavage site of an intein's endonuclease domain is made up of the two sequences flanking the intein integration site in its host gene. This was experimentally verified for many group-I intron encoded endonucleases and for some inteins. The target sites are very long compared with those of restriction endonucleases, spanning 12-40 bp. This usually ensures that only one such site is present in the genome: the one in the unoccupied intein host gene. For homing to occur, the DNA of an intein-containing gene must be present in a cell with an intein-less allele of this gene. This can happen during sexual mating in eukaryotes, or when a bacterium or archaeon takes up or exchanges DNA. The intein protein may be transferred along with the DNA, or be transcribed and translated from it. It then cleaves the intein-less allele of its host gene. If the resulting double-strand break is repaired by simply ligating its ends, the site is restored and will be cleaved again. However, the break can also be repaired using the intein+ allele as a template. In that case the repaired gene now includes the intein region in exactly the same position as in the intein+ allele. This gene conversion process is called homing because the transferred element (intein or intron) can only move to homologous unoccupied copies of its integration site.
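Simple arithmetic shows why a 12-40 bp site is usually unique. Assume a random genome of G bp with equal base frequencies, and count both strands. A specific L bp sequence is then expected to occur by chance about E = 2(G - L + 1) / 4^L times. The sketch below computes this estimate. The 4.6 Mb genome size is an arbitrary example, not a value from this page, and real genomes are not random.

```python
def expected_random_sites(genome_bp: int, site_bp: int) -> float:
    """Expected chance occurrences of one specific site_bp-long sequence.

    Assumes a random genome with all four bases equally frequent and
    counts both strands; this is only a rough estimate.
    """
    positions = genome_bp - site_bp + 1
    return 2 * positions / 4 ** site_bp


# Hypothetical 4.6 Mb genome, at both ends of the 12-40 bp range and in between.
for site_bp in (12, 20, 40):
    print(site_bp, expected_random_sites(4_600_000, site_bp))
```

For this hypothetical genome the estimate is about 0.55 chance matches for a 12 bp site and about 8 x 10^-6 for a 20 bp site. This is why uniqueness is "usually" rather than always assured at the short end of the range.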
The cleavage site may be lost through mutation or through errors in repairing the double-strand break. An allele that has lost the site becomes immune to cleavage by that intein. Consistent with this, inteins and group-I introns are found integrated in highly conserved sites, where changes are unlikely and usually deleterious.
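A toy string model can mimic the homing steps described above and the immunity of an allele that has lost its cleavage site. The sketch makes several assumptions. The target site is taken to be exactly the two integration-point flanks joined together. Any change in the site is assumed to prevent cleavage, and the break is assumed to be repaired from the intein+ allele every time. Real target recognition is less strict than this. All sequences and names here are hypothetical.

```python
def homing(recipient: str, left: str, right: str, intein: str) -> str:
    """Toy string model of intein homing into an intein-less allele.

    The target site is taken to be left + right, the two flanks of the
    intein integration point (short hypothetical strings here; real
    sites span 12-40 bp). The intein+ allele reads left + intein + right.
    """
    target = left + right
    site = recipient.find(target)
    if site == -1:
        # Site absent or mutated: the allele is immune to this intein.
        return recipient
    # The double-strand break falls between the two flanks. Repair that
    # uses the intein+ allele as template copies the intein into exactly
    # that position (simple re-ligation would just restore the site).
    template = left + intein + right
    return recipient[:site] + template + recipient[site + len(target):]


LEFT, RIGHT, INTEIN = "AAGCTTGC", "GGTACCTA", "TTTTTTTTTT"
intein_less = "CCC" + LEFT + RIGHT + "CCC"
mutated = "CCC" + LEFT + "GGTAACTA" + "CCC"  # one change in the right flank

print(homing(intein_less, LEFT, RIGHT, INTEIN))
# CCCAAGCTTGCTTTTTTTTTTGGTACCTACCC
print(homing(mutated, LEFT, RIGHT, INTEIN) == mutated)
# True
```

After homing, the allele no longer contains the intact flank-to-flank site. Calling `homing` on it again therefore returns it unchanged.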


The positions of the motifs are conserved across different inteins and relative to each other. This can be seen in the intein motif map. Intein structures show that the motifs have important functional and structural roles, forming the protein splicing and endonuclease active sites.

Motif designations
Pietrokovski '97   Perler '97 & Pietrokovski '94   Other names
N1                 A                               Intein N-terminal splicing point
N2                 -                               -
N3                 B                               -
N4                 -                               -
EN1                C                               DOD, dodecapeptide, LAGLIDADG, P1
EN2                D                               -
EN3                E                               DOD, dodecapeptide, LAGLIDADG, P2
EN4                H                               -
HNH                -                               I-TevIII family motif
C2                 F                               -
C1                 G                               Intein C-terminal splicing point
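Reading older papers alongside this page is easier with a two-way lookup between the naming schemes in the table above. This is a minimal sketch, and the dictionary names are illustrative.

```python
from typing import Dict, Optional

# Pietrokovski '97 name -> Perler '97 & Pietrokovski '94 letter, from the
# table above (None where the older scheme has no letter).
NAME_TO_LETTER: Dict[str, Optional[str]] = {
    "N1": "A", "N2": None, "N3": "B", "N4": None,
    "EN1": "C", "EN2": "D", "EN3": "E", "EN4": "H",
    "HNH": None, "C2": "F", "C1": "G",
}

# Reverse lookup, skipping motifs that have no letter designation.
LETTER_TO_NAME: Dict[str, str] = {
    letter: name for name, letter in NAME_TO_LETTER.items() if letter is not None
}

print(NAME_TO_LETTER["EN3"], LETTER_TO_NAME["F"])  # E C2
```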


Page last modified July 1998
Shmuel Pietrokovski <pietro@weizmann.ac.il>