Hint domains superfamily
Inteins
Bacterial intein-like (BIL) domains
Protein-splicing scheme. A precursor protein is shown on the right, with the intein protein-splicing domain in red and the host-protein flanks (exteins) in blue. The intein protein-splicing domain autocatalyzes its own excision and the ligation of its two flanks.
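To make the net outcome of protein splicing concrete, here is a toy Python sketch that treats a precursor as three one-letter amino-acid strings (N-extein, intein, C-extein). It models only the result (intein excised, exteins joined), not the chemistry. The names `Precursor`, `splice` and `has_common_junctions` are hypothetical, the sequences are made up, and the junction check is an assumption not stated on this page: it tests for residues found at the splice junctions of many, but not all, inteins (Cys or Ser at the intein N-terminus, Asn at its C-terminus, and Cys, Ser or Thr as the first C-extein residue).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Precursor:
    """Toy precursor protein in three parts (one-letter amino-acid codes)."""
    n_extein: str  # host-protein flank before the intein
    intein: str    # the self-excising insertion
    c_extein: str  # host-protein flank after the intein

    @property
    def sequence(self) -> str:
        # The unspliced precursor is the three parts in order.
        return self.n_extein + self.intein + self.c_extein


def has_common_junctions(p: Precursor) -> bool:
    """Assumed check for junction residues seen in many (not all) inteins.

    Intein starts with Cys or Ser, ends with Asn, and the first C-extein
    residue is Cys, Ser or Thr. Real inteins include exceptions.
    """
    return (
        p.intein[:1] in ("C", "S")
        and p.intein[-1:] == "N"
        and p.c_extein[:1] in ("C", "S", "T")
    )


def splice(p: Precursor) -> tuple[str, str]:
    """Return (mature host protein, excised intein).

    Only the net result is modeled: the intein is removed and the two
    exteins are ligated. The individual chemical steps are not simulated.
    """
    return p.n_extein + p.c_extein, p.intein


# Hypothetical sequences, for illustration only.
precursor = Precursor(n_extein="MKAE", intein="CLAGHN", c_extein="TEVK")
mature, excised = splice(precursor)
print(precursor.sequence)  # MKAECLAGHNTEVK
print(mature)              # MKAETEVK
print(excised)             # CLAGHN
```

Keeping the junction check separate from `splice` reflects that the rule is a heuristic: a sequence that fails it is not necessarily a non-splicing intein.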
Most inteins have an endonuclease domain inserted within the protein-splicing domain. The endonuclease activity of these inteins can mediate the specific transposition of the intein gene into unoccupied integration sites in intein-less homologs, a process called homing.
Inteins have a diverse and sporadic distribution across species and proteins. They occur in all three domains of life but so far have been found in relatively few species. Inteins are currently known in more than 50 types of proteins with diverse functions. These include metabolic enzymes, DNA and RNA polymerases, proteases, ribonucleotide reductases, and vacuolar-type ATPases. Intein integration points also vary in structure and function. Their only apparent common feature is that they lie in highly conserved protein motifs.
The intein protein domain family is part of the Hint superfamily, named after the characteristic structural fold first identified in Hedgehog and intein protein domains (Hall et al. '97). Four characterized Hint domain families are currently known: Hog Hints, inteins, and two types of Bacterial intein-like (BIL) domains. Besides sharing the same structural fold and common sequence features, Hint domains have similar biochemical activities: they post-translationally process the proteins in which they are present by protein splicing, self-cleavage, or ligation.
This site mainly introduces inteins and some of their sister families. It explores the relation between the activities of these domains, their sequence motifs, and their protein structure. We also show how these are related to the different biological roles and modes of evolution of inteins and intein-like domains. A database of inteins is maintained at New England Biolabs, where the intein registry, publications, sequence search, and information on the splicing mechanism can be found.
Intein proteins contain a number of conserved sequence motifs (blocks). The motifs can be grouped into three domains according to their location and inferred function. Intein structures show that the intein's protein-splicing and endonuclease active sites are formed from these conserved motifs. The intein domain organization deduced from sequence analysis corresponds exactly to the structural domains.
Domain structure of a typical intein
The intein protein-splicing C-terminal (C) domain is composed of two adjacent motifs in the C-terminal 25-40 amino acids (including the conserved residue immediately C-terminal to the intein). Residues in these motifs are necessary for catalyzing the next steps of protein splicing: branch formation and its resolution.
Most (but not all) inteins also include a central endonuclease (EN) domain, usually of the LAGLIDADG (dodecapeptide) homing endonuclease type. Intein LAGLIDADG EN domains are characterized by four motifs that probably form the endonuclease active site (Duan et al. '97). An intein from the cyanobacterium Synechocystis sp. PCC6803 (Ssp gyrB) has a different type of endonuclease domain, one that contains an HNH motif. This motif is found in various homing and other endonucleases (Shub et al. '94, Gorbalenya '94).
The endonuclease domain is optional in inteins: mutations in it affect the intein's endonuclease activity but not its protein-splicing activity, some inteins lack this domain, and inteins have been shown to splice with this domain removed (Chong and Xu '97, Derbyshire et al. '97).
Functional inteins with no EN domain (minimal inteins), the relation of the protein-splicing domain to other Hint domains, and the presence of different EN domains in inteins all indicate that the primeval inteins had no EN domain. Different EN domains, perhaps from homing endonucleases, and DNA-binding domains invaded intein genes to form the typical present-day intein. Some present-day minimal inteins clearly lost their EN domain (such as Mxe_gyrA; see Telenti et al. '97 and Klabunde et al. '98), and some may never have acquired one.
Inteins are found in all three domains of life: Archaea, Bacteria, and Eukaryotes. However, their distribution across species and host proteins is sporadic. Some species have no inteins, some have just one, and Methanococcus jannaschii has nineteen. For species with completely sequenced genomes, such as E. coli, M. jannaschii and S. cerevisiae, we know the total number of inteins in the sequenced strain. For other species we can only estimate their number. Intein distribution seems most varied in archaea. This table compares the inteins found in archaea with fully sequenced genomes.
One major group of organisms in which no inteins are known is the multicellular eukaryotes, both metazoa and plants. The multicellular red alga Porphyra does contain an intein, but it is encoded in its chloroplast genome. The reasons for this absence are not clear. Inteins may yet be found in these organisms and turn out to be merely scarcer or harder to detect. Interestingly, some intein-containing organisms, such as Mycobacterium tuberculosis and the CIV virus, are intracellular pathogens of metazoa. Thus, the opportunity for intein invasion into animal genomes does exist (more details).
Intein distribution, May 2001.
Some protein families, such as ribonucleotide reductases and archaeal type B DNA polymerases, are more prone to contain inteins. These proteins contain inteins in different organisms and at different integration sites. Some of the ribonucleotide reductases and most of the DNA polymerases with inteins contain more than one intein.
As of June 2004, about 200 inteins have been identified in more than 100 different species and strains, in more than 50 different families of host proteins (details here).
Inteins found at homologous integration sites are most probably homologous too. However, it is not clear in which cases this relation between the inteins is due to vertical transfer (the usual inheritance, from an organism to its progeny) or horizontal transfer (movement of DNA across species). Inteins at homologous integration sites are termed intein alleles.
Inteins are known to integrate at more than 65 different sites. About half of these have two or more alleles (details here).
Additional information on inteins can be found at the pages listed at the top of this intein home page.