The Group I Intron Sequence and Structure Database (GISSD) is a specialized and comprehensive database for group I introns that integrates useful group I intron information from all available databases. GISSD also aims to provide de novo data essential for understanding group I introns at a systematic level. Specifically, it aims to provide a consensus structure for each subgroup of group I introns based on high-quality alignments, to assess the confidence of the group I introns annotated by Rfam (http://www.sanger.ac.uk/Software/Rfam/), to classify the Rfam group I introns into subgroups based on the consensus structures, and to provide an intron number-containing taxonomy tree based on the taxonomy information of the host organisms of all group I introns.
Currently, GISSD presents 1789 complete intron records, each including the nucleotide sequence of the annotated intron plus 15 nt of the upstream and downstream exons, as well as the pseudoknot-containing secondary structure predicted by integrating comparative sequence analysis and minimal free energy algorithms. These introns represent all 13 known minor subgroups and an undifferentiated major subgroup, and their structure-based alignments are provided separately. Both the structure predictions and the alignments were done manually and adjusted iteratively, which yielded a reliable consensus structure for each subgroup. These consensus structures allowed us to assess the confidence of 20,085 group I introns previously predicted by the INFERNAL program (http://infernal.janelia.org/) and to classify this large number of introns into subgroups automatically. The database provides the intron-associated taxonomy information from GenBank, allowing one to view the distribution of all group I introns in detail. CDSs residing in introns and 3-D structure information are also integrated where available. In total, 16,914 group I introns were validated, with 95.5% of them classified into the IC3 subgroup and 96.4% residing in Viridiplantae, suggesting that the major reservoir of group I introns in nature is the chloroplast tRNALeu genes.
|The advantages of GISSD
- Providing group I intron sequences
In the CRW site (Cannone et al., 2002), the GenBank accession
number of the gene that contains each intron is given, but the
intron sequence itself is not provided. Moreover, the intron
annotation in the corresponding GenBank record is sometimes
missing or unclear. Systematic analysis of the phylogeny and
structure of group I introns requires each intron sequence to
be immediately available, and the 5' and 3' exon sequences
adjacent to the intron are required to fold its P1 and P10
structures. To meet these requirements, GISSD provides the
sequence of each intron together with 15 nt of the adjacent
upstream and downstream exons of the intron.
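As an illustration of the record layout described above, the following minimal sketch cuts an intron and its 15-nt exon flanks out of a host gene sequence. The function name and the 0-based, end-exclusive coordinate convention are assumptions made for this example; they are not taken from GISSD itself.

```python
def extract_with_flanks(gene_seq, intron_start, intron_end, flank=15):
    """Return (upstream_exon, intron, downstream_exon) as strings.

    intron_start and intron_end are 0-based, end-exclusive positions of
    the intron within gene_seq (a hypothetical convention for this sketch).
    The flanks are shortened if the gene sequence ends within `flank` nt.
    """
    if not 0 <= intron_start < intron_end <= len(gene_seq):
        raise ValueError("intron coordinates fall outside the sequence")
    upstream = gene_seq[max(0, intron_start - flank):intron_start]
    intron = gene_seq[intron_start:intron_end]
    downstream = gene_seq[intron_end:intron_end + flank]
    return upstream, intron, downstream


# Toy sequence: 20 nt of exon, a 10-nt "intron", 20 nt of exon.
gene = "A" * 20 + "G" * 10 + "C" * 20
up, intron, down = extract_with_flanks(gene, 20, 30)
```

Keeping the flanks next to the intron means the P1 and P10 pairings, which involve exon nucleotides, can be folded without going back to the full GenBank record.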
- Providing reliable structures and alignments of group I introns
Group I introns are highly diverse in size, ranging from less than 200 nt to over 4000 nt. They also show weak sequence similarity and contain two long-range pseudoknots. These features make it difficult for current computational programs to generate reliable secondary structures and alignments of group I introns automatically. CITRON (Lisacek et al., 1994) and Infernal (Griffiths-Jones et al., 2003) have been successfully used to identify group I introns in genome sequences. However, the sensitivity and specificity of both programs are unsatisfactory for introns whose characteristics are not built into the program or not represented in the training dataset. Furthermore, identifying an intron does not mean that its structure is well determined. We have used manual methods to provide reliable, detailed secondary structures and alignments for nearly 1800 group I introns belonging to 13 subgroups, and have made each of them available on GISSD. These data could be used to build better models, or serve as a more complete training dataset, to improve future searches for new group I introns and the prediction of intron structures.
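Pseudoknot-containing secondary structures are commonly written in an extended dot-bracket notation, where a second bracket type marks helices that cross the first. The sketch below converts such a string into base-pair positions. Whether GISSD stores its structures in this notation is an assumption of the example, and the function name is hypothetical.

```python
def parse_dot_bracket(structure, bracket_types=("()", "[]", "{}")):
    """Return a sorted list of (i, j) base pairs, 0-based, with i < j.

    Each bracket type is matched on its own stack, so helices written
    with different brackets may cross each other (pseudoknots).
    """
    opening = {pair[0]: pair[1] for pair in bracket_types}
    closing = {pair[1]: pair[0] for pair in bracket_types}
    stacks = {symbol: [] for symbol in opening}
    pairs = []
    for position, symbol in enumerate(structure):
        if symbol in opening:
            stacks[symbol].append(position)
        elif symbol in closing:
            stack = stacks[closing[symbol]]
            if not stack:
                raise ValueError(f"unmatched {symbol!r} at position {position}")
            pairs.append((stack.pop(), position))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {position}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


# A toy pseudoknot: the [] helix crosses the () helix.
pairs = parse_dot_bracket("((..[[..))..]]")
```

A single-stack parser would reject this input, which is one reason general-purpose folding tools handle the long-range pseudoknots of group I introns poorly.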
In general, phylogenetic trees of group I introns are
inferred by inputting the core region alignments to phylogeny
programs. The reliability of the alignment would affect the
results and reasoning of the phylogeny analysis. On the other
hand, structure comparison can reveal non-canonical structural
interactions. Reliable secondary structure for all the
subgroups also allows depicting the structural function of
We deduced a consensus structure for each of the 14 subgroups from the alignments of the 1789 group I introns. The consensus structures were then processed with the program 'cmbuild' in the INFERNAL package to obtain 14 covariance models (CMs; a CM is a type of profile stochastic context-free grammar, or profile SCFG) (Eddy and Durbin, 2002). These CMs can be used to search for group I introns automatically, to predict secondary structures (the rough structures still need to be processed and validated), and to classify introns into subgroups.
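One simple way to turn per-subgroup CM scores into a subgroup assignment is to take the best-scoring model and reject the hit if its score is too low. The sketch below assumes the scores have already been obtained by scoring a sequence against each of the 14 CMs. The function name, the threshold value and the example scores are all hypothetical, and this is not necessarily the rule GISSD applies.

```python
def classify_intron(bit_scores, min_score=0.0):
    """Pick the subgroup whose covariance model scores highest.

    bit_scores maps a subgroup name to the score of the sequence under
    that subgroup's CM. Returns None when no model reaches min_score,
    i.e. the candidate is not accepted as a group I intron.
    """
    if not bit_scores:
        return None
    best = max(bit_scores, key=bit_scores.get)
    return best if bit_scores[best] >= min_score else None


# Made-up scores for one candidate sequence.
scores = {"IC3": 85.2, "IE": 40.1}
subgroup = classify_intron(scores, min_score=20.0)
```

The same threshold also serves as a confidence filter, since a candidate that scores poorly under every subgroup model is left unclassified.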
- Providing intron number-containing taxonomy tree
On the Distribution page, an intron number-containing taxonomy tree lets the user display taxonomy nodes down to a desired level. Each taxonomy node shows its level, taxonomy name and rank, and the number of introns, so the user can easily view the distribution of introns.
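A tree of this kind can be built by counting, for every taxonomy node, the introns whose host lineage passes through it. The following sketch assumes each intron comes with a root-first lineage tuple. The function names and the toy lineages are illustrative only and do not reproduce GISSD data or its page layout.

```python
from collections import Counter


def count_introns_per_node(lineages):
    """Count introns at or below every taxonomy node.

    lineages is an iterable with one root-first lineage tuple per intron.
    A node is identified by its full path from the root.
    """
    counts = Counter()
    for lineage in lineages:
        for depth in range(1, len(lineage) + 1):
            counts[tuple(lineage[:depth])] += 1
    return counts


def render_tree(counts, max_level):
    """Return indented text lines for all nodes down to max_level."""
    lines = []
    # Sorting the path tuples lists each parent before its children.
    for node in sorted(counts):
        level = len(node)
        if level <= max_level:
            indent = "  " * (level - 1)
            lines.append(f"{indent}{node[-1]} (level {level}): {counts[node]}")
    return lines


lineages = [
    ("Eukaryota", "Viridiplantae"),
    ("Eukaryota", "Viridiplantae"),
    ("Eukaryota", "Fungi"),
    ("Bacteria",),
]
tree_lines = render_tree(count_introns_per_node(lineages), max_level=2)
```

Because counts are accumulated along the whole lineage, collapsing the tree to a shallower level never loses introns; they are simply reported at the ancestor node.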
- Providing ORF information
Homing endonuclease genes (HEGs) that invade non-critical
regions (i.e. terminal loops) of group I introns promote intron
mobility by encoding highly site-specific homing endonucleases
(HEs) (Haugen et al., 2005). HEGs residing in introns are thus
related to intron mobility, insertion, splicing and spread.
Annotation of the ORFs in the introns would benefit the study
of the history and origin of group I introns.
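As a rough idea of how candidate ORFs inside an intron could be located, the sketch below scans the three forward reading frames for ATG-to-stop stretches. It assumes the standard genetic code and a DNA-alphabet sequence. Organellar codes differ (for example, TGA encodes tryptophan in many mitochondria), so this is only a first-pass illustration and not the annotation procedure used by GISSD, which takes CDS information from existing records.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code only


def find_orfs(seq, min_codons=50):
    """Return sorted (start, end) spans of forward-strand ORFs.

    Coordinates are 0-based and end-exclusive; each span runs from an
    ATG through the first in-frame stop codon. min_codons counts the
    codons before the stop, including the ATG.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Toy example: ATG + three codons + stop, embedded in flanking bases.
toy = "CC" + "ATG" + "GCT" * 3 + "TAA" + "GG"
spans = find_orfs(toy, min_codons=3)
```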
Group I introns are widespread non-coding RNA sequences found in nuclear, chloroplast, and mitochondrial genomes of eukaryotes, in bacterial (archaebacterial and eubacterial) genomes, and even in some viral genomes. They are well known for splicing themselves out of the host precursor RNA via two transesterification
reactions, and are therefore called group I ribozymes (Cech,
1990). Comparative sequence analysis reveals a common secondary
structure for all group I introns that consists of conserved base-paired
elements designated P1-P9 (Burke et al., 1987; Michel and Westhof,
1990; Li and Zhang, 2005). However, P2 was later found to be absent
from some group I introns, and P10, which contains the 3' splice
site, was found to be a well-conserved structure (Michel and
Westhof, 1990; Li and Zhang, 2005).
Structural and biochemical studies have revealed that the active
structure of a group I ribozyme is assembled from two separable domains:
the P4-P6 domain, containing P4, P5 and P6, and the P3-P9 domain, containing
P3, P7, P8 and P9 (Michel and Westhof, 1990; Tanner et al., 1997a and
1997b; Golden et al., 1998; Woodson, 2005). In addition to these
conserved core helices, group I introns have at least one
additional peripheral base-paired structure, such as P2.1, P5abc, P9.1
or P9.2 (Michel and Westhof, 1990). Peripheral elements establish a
variety of tertiary interactions that play important and sometimes
essential roles in organizing the P4-P6 and P3-P9 domains into the
compact active structure (Doherty and Doudna, 2001; Xiao et al.,
The atomic structures of three group I introns have been
resolved recently (Adams et al., 2004; Guo et al., 2004;
Golden et al., 2005), providing a wealth of information for
understanding the catalytic mechanisms of group I introns and the
roles of metal ions (Stahley and Strobel, 2005; Stahley and Strobel,
2006). The crystal structures also provide insights into how the
intron-specific sets of long-range interactions established by
peripheral elements contribute to stabilizing the core
structure (Vicens and Cech, 2006).
Group I introns are spliced by two sequential
ester-transfer reactions. An exogenous guanosine or guanosine
nucleotide (exoG) first docks onto the active G-binding site located
in P7, and its 3'-OH is aligned to attack the phosphodiester bond at
the 5' splice site located in P1, leaving a free 3'-OH group on
the upstream exon and the exoG attached to the 5' end of the
intron. The terminal G (omega G) of the intron then replaces the exoG and
occupies the G-binding site to set up the second ester-transfer
reaction: the 3'-OH group of the upstream exon in P1 is aligned to
attack the 3' splice site in P10, leading to ligation of the
adjacent upstream and downstream exons and release of the catalytic
intron.
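The bookkeeping of the two ester-transfer steps described above can be followed with a purely string-level sketch. It tracks only which nucleotides end up joined to which; it models no chemistry, structure or kinetics, and the function name and toy sequences are hypothetical.

```python
def self_splice(upstream_exon, intron, downstream_exon, exo_g="G"):
    """Return (ligated_exons, excised_intron) after two ester transfers.

    Sequences are plain RNA strings written 5' to 3'. The intron must
    end with its terminal G (omega G), which defines the 3' splice site.
    """
    if not intron.endswith("G"):
        raise ValueError("intron must end with the terminal G (omega G)")

    # Step 1: exoG attacks the 5' splice site. The upstream exon is
    # released with a free 3'-OH, and exoG is joined to the intron 5' end.
    free_upstream = upstream_exon
    g_intron_exon = exo_g + intron + downstream_exon

    # Step 2: the 3'-OH of the upstream exon attacks the 3' splice site
    # (just after omega G), ligating the exons and releasing the intron.
    boundary = len(exo_g) + len(intron)
    excised_intron = g_intron_exon[:boundary]
    ligated_exons = free_upstream + g_intron_exon[boundary:]
    return ligated_exons, excised_intron


ligated, excised = self_splice("AAU", "CCCG", "UUA")
```

Note that the excised intron is one nucleotide longer than the encoded intron, because it carries the exoG added in the first step.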
Following its excision from the pre-rRNA, the group I intron undergoes an intramolecular cyclization reaction. This reaction is also self-catalyzed by transesterification, with the 3'-terminal G-OH of the RNA attacking a phosphorus atom located near the 5' end of the molecule. Both the 5' and 3' splice-site phosphodiester bonds of a group I intron precursor are unusually susceptible to slow hydrolysis, producing 5'-phosphate and 3'-hydroxyl termini. Site-specific hydrolysis is thought to reflect the ability of the folded RNA structure to activate the splice-site phosphates or the incoming nucleophile, using a catalytic mechanism similar to that of self-splicing.
The two-metal-ion mechanism seen in protein polymerases and
phosphatases was proposed to be used by group I and group II
introns to carry out the phosphoryl transfer reactions (Steitz
and Steitz, 1993), and this was unambiguously confirmed by a
recently resolved high-resolution structure of the Azoarcus
group I intron (Stahley and Strobel, 2006).
Since the early 1990s, the community has studied how the group I intron
achieves its native structure in vitro, and some mechanisms of RNA
folding have been elucidated thus far. It is agreed that the
tertiary structure folds after the formation of the secondary
structure (Brion and Westhof, 1997). During folding, RNA molecules
rapidly partition into different folding intermediates. The
intermediates containing native interactions fold further into
the native structure through a fast folding pathway, while those
containing non-native interactions are trapped in metastable or stable
non-native conformations, from which conversion to the
native structure occurs very slowly (Thirumalai et al., 2001). It is
evident that group I introns differing in their sets of peripheral
elements differ in their potential to enter the fast folding
pathway. Meanwhile, cooperative assembly of the tertiary structure
is important for folding of the native structure (Treiber and
Williamson, 2001; Xiao et al., 2003; Rangan et al., 2003; Chauhan et
al., 2005). Nevertheless, folding of group I introns in vitro
encounters both thermodynamic and kinetic challenges (Treiber and
Williamson, 1999; 2001). A few RNA-binding proteins and chaperones
have been shown to promote the folding of group I introns in vitro
and in bacteria by stabilizing the native intermediates or
structure, and by destabilizing the non-native structures,
respectively (reviewed by Schroeder et al., 2004).
|Distribution, phylogeny and mobility
Group I introns are distributed in bacteria, lower eukaryotes and higher plants. However, their occurrence in bacteria seems to be more sporadic than in lower eukaryotes, and they become prevalent in higher plants. The genes that group I introns interrupt differ significantly: they interrupt rRNA, mRNA and tRNA genes in bacterial genomes, as well as in the mitochondrial and chloroplast genomes of lower eukaryotes, but invade only rRNA genes in the nuclear genomes of lower eukaryotes. In higher plants, these introns seem to be restricted to a few tRNA and mRNA genes of the chloroplasts and mitochondria.
Both the intron-early and intron-late theories have found supporting
evidence in explaining the origin of group I introns (Haugen et al.,
2005). Some group I introns carry a homing endonuclease gene (HEG),
whose product catalyzes intron mobility. It is proposed that HEGs move
the intron from one location to another and from one organism to
another, and thus account for the wide spread of the selfish group I
introns. Indeed, no biological role has been identified for group I
introns thus far, other than splicing themselves from the precursor to
prevent the death of the host on which they depend. A small number of
group I introns are also found to encode a class of proteins called
maturases, which facilitate intron splicing (Lambowitz et al., 1999).
Adams P.L., Stahley M.R., Kosek A.B., Wang J., Strobel S.A. (2004) Crystal
structure of a self-splicing group I intron with both
exons. Nature, 430, 45-50.
Brion P., Westhof E. (1997) Hierarchy and dynamics of RNA folding. Annu.
Rev. Biophys. Biomol. Struct., 26, 113-37.
Burke, J.M., Belfort, M., Cech, T.R., Davies, R.W.,
Schweyen, R.J., Shub, D.A., Szostak, J.W., Tabak, H.F. (1987)
Structural conventions for group I introns. Nucleic Acids Res., 15,
Cannone, J.J., Subramanian, S., Schnare, M.N., Collett, J.R., D'Souza,
L.M., Du, Y., Feng, B., Lin, N., Madabusi, L.V., Muller, K.M., Pande
N., Shang Z., Yu N., Gutell R.R. (2002) The comparative RNA web (CRW) site: an online database of
comparative sequence and structure information for ribosomal,
intron, and other RNAs. BMC Bioinformatics, 3, 2.
Cech, T.R. (1990) Self-splicing of group I introns.
Annu. Rev. Biochem., 59, 543-568.
Chauhan S., Caliskan G., Briber R.M., Perez-Salas U., Rangan P., Thirumalai
D., Woodson S.A. (2005) RNA tertiary interactions mediate native
collapse of a bacterial group I ribozyme. J. Mol. Biol.,
Doherty E.A., Doudna J.A. (2001) Ribozyme structures and
mechanisms. Annu. Rev. Biophys. Biomol. Struct., 30, 457-475.
Eddy S.R. (2002) A memory-efficient dynamic programming algorithm for optimal alignment of a sequence
to an RNA secondary structure. BMC Bioinformatics, 3, 18.
Golden B.L., Gooding A.R., Podell E.R., Cech T.R. (1998) A preorganized
active site in the crystal structure of the Tetrahymena
ribozyme. Science., 282, 259-264.
Golden B.L., Kim H., Chase E. (2005) Crystal structure of a phage Twort
group I ribozyme-product complex. Nat. Struct. Mol. Biol.,
Griffiths-Jones S., Bateman A., Marshall M., Khanna A., Eddy S.R. (2003)
Rfam: an RNA family database. Nucleic Acids Res., 31, 439-441.
Griffiths-Jones S., Moxon S., Marshall M., Khanna A., Eddy S.R., Bateman A. (2005)
Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res., 33, D121-124.
Guo F., Gooding A.R., Cech T.R. (2004) Structure of the Tetrahymena
ribozyme: base triple sandwich and metal ion at the active
site. Mol. Cell., 16, 351-362.
Haugen P., Simon D.M., Bhattacharya D. (2005) The natural
history of group I introns. Trends Genet., 21, 111-119.
Johansen S. and Haugen P. (2001) A new nomenclature of group I introns
in ribosomal DNA. RNA, 7, 935-936.
Lambowitz A.M., Caprara M.G., Zimmerly S., Perlman P.S. (1999) Group I and
II ribozymes as RNPs: Clues from the past and guides to the
future. In: Gesteland R, Cech TR, Atkins J, eds. The RNA World (2nd
ed.). Cold Spring Harbor Laboratory Press. pp. 451-485.
Lang B.F., Laforest M.J., Burger G. (2007) Mitochondrial introns: a critical view. Trends Genet., 23, 119-125.
Li, Z.J. and Zhang Y. (2005) Predicting the secondary structures and
tertiary interactions of 211 group I introns in IE subgroup. Nucleic
Acids Res., 33, 2118-2128.
Lisacek F., Diaz Y., Michel F. (1994)
Automatic identification of group I intron cores in genomic DNA sequences. J. Mol. Biol.,
Michel, F. and Westhof, E. (1990) Modelling of the three-dimensional
architecture of group I catalytic introns based on comparative
sequence analysis. J. Mol. Biol., 216, 585-610.
Rangan, P., Masquida, B., Westhof E., Woodson S.A. (2003) Assembly of
core helices and rapid tertiary folding of a small bacterial group I
ribozyme. Proc Natl Acad Sci USA., 100, 1574-1579.
Schroeder R., Barta A., Semrad K. (2004) Strategies for RNA folding and
assembly. Nat. Rev. Mol. Cell Biol., 5, 908-919.
Stahley M.R., Strobel S.A. (2006). RNA splicing: group I intron crystal
structures reveal the basis of splice site selection and metal ion
catalysis. Curr Opin Struct Biol., 16, 319-326.
Stahley M.R., Strobel S.A. (2005) Structural evidence for a two-metal-ion
mechanism of group I intron splicing. Science.,
Steitz T.A., Steitz J.A. (1993) A general two-metal-ion mechanism for
catalytic RNA. Proc. Natl. Acad. Sci. USA., 90, 6498-6502.
Tanner M.A., Cech T.R. (1997) Joining the two domains of a group I
ribozyme to form the catalytic core. Science., 275, 847-849.
Tanner M.A., Anderson E.M., Gutell R.R., Cech T.R. (1997) Mutagenesis and
comparative sequence analysis of a base triple joining the two
domains of group I ribozymes. RNA., 3, 1037-1051.
Thirumalai D., Lee N., Woodson S.A., Klimov D. (2001) Early events in RNA
folding. Annu. Rev. Phys. Chem., 52, 751-762.
Treiber D.K., Williamson J.R. (2001) Beyond kinetic traps in RNA
folding. Curr. Opin. Struct. Biol., 11, 309-314.
Treiber D.K., Williamson J.R. (1999) Exposing the kinetic traps in RNA
folding. Curr. Opin. Struct. Biol., 9, 339-345.
Vicens Q., Cech T.R. (2006) Atomic level architecture of group I introns
revealed. Trends Biochem Sci., 31, 41-51.
Woodson S.A. (2005) Metal ions and RNA folding: a highly charged topic
with a dynamic future. Curr Opin Chem Biol., 9, 104-109.
Xiao M., Leibowitz M.J., Zhang Y. (2003) Concerted folding of a Candida
ribozyme into the catalytically active structure posterior to a
rapid RNA compaction. Nucleic Acids Res., 31, 3901-3908.
Xiao M., Li T., Yuan X., Shang Y., Wang F., Chen S., Zhang Y. (2005) A
peripheral element assembles the compact core structure essential
for group I intron self-splicing. Nucleic Acids Res., 33, 4602-4611.
|Labs and links: