In Silico Biology 4, 0017 (2004); ©2004, Bioinformation Systems e.V.  



Efficient prediction of alternative splice forms using protein domain homology

Michael Hiller1*, Rolf Backofen1*, Stephan Heymann2*, Anke Busch1, Timo Mika Gläßer2 and Johann-Christoph Freytag2

1 Friedrich-Schiller-Universität Jena, Institute of Computer Science, Chair for Bioinformatics, Ernst-Abbe-Platz 1-4, D-07743 Jena, Germany
  Email: hiller@inf.uni-jena.de, backofen@inf.uni-jena.de, busch@inf.uni-jena.de

2 Humboldt-Universität zu Berlin, Institute of Computer Science, Unter den Linden 6, D-10099 Berlin, Germany
  Email: heymann@dbis.informatik.hu-berlin.de, glaesser@dbis.informatik.hu-berlin.de, freytag@dbis.informatik.hu-berlin.de

*These authors contributed equally to this work.


Edited by E. Wingender; received November 06, 2003; revised and accepted January 29, 2004; published March 24, 2004


Abstract

Alternative splicing can yield many different mature mRNAs from a single precursor. Recent findings indicate that alternative splicing occurs much more often than previously assumed. A major goal of functional genomics is to elucidate and characterize the entire spectrum of alternative splice forms.

Existing approaches such as EST alignments focus only on the mRNA sequence to detect alternative splice forms. They do not consider the function and characteristics of the resulting proteins. One important example of such a functional characterization is homology to a known protein domain family. Profile hidden Markov models (HMMs), as stored in the Pfam database, provide a powerful description of protein domains.

In this paper we address the problem of identifying the splice form with the highest similarity to a protein domain family. To do so, we take all possible splice forms into consideration. As demonstrated here for a number of genes, this homology-based approach can be used successfully for predicting partial gene structures. Furthermore, we present some novel splice form predictions with high-scoring protein domain homology and point out that the detection of splice-form-specific protein domains helps to answer questions concerning hereditary diseases.

Simple approaches based on a BLASTP search cannot be applied here, because the number of possible splice forms increases exponentially with the number of exons. We have therefore developed an efficient polynomial-time algorithm called ASFPred (Alternative Splice Form Prediction), which needs only a set of exons as input.
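
The contrast between exponential enumeration and a polynomial-time search can be illustrated with a toy model. The sketch below is not ASFPred itself. It assumes that exons are already given as amino-acid strings, that every ordered, non-empty subset of exons counts as a splice form, and that a domain is described by a simplified gapless position-specific score table instead of a full profile HMM. All function names and the mismatch score of -1.0 are hypothetical choices made for illustration.

```python
from itertools import combinations

NEG_INF = float("-inf")

def count_splice_forms(n_exons):
    """Ordered, non-empty exon subsets (assumed definition of a splice form)."""
    return 2 ** n_exons - 1

def segment_score(seq, profile, offset):
    """Gapless score of seq placed on profile columns offset, offset+1, ...
    Residues missing from a column get an assumed mismatch score of -1.0."""
    return sum(profile[offset + k].get(aa, -1.0) for k, aa in enumerate(seq))

def best_by_enumeration(exons, profile):
    """Exponential baseline: score every splice form that fits the profile."""
    best = NEG_INF
    for r in range(1, len(exons) + 1):
        for subset in combinations(exons, r):  # keeps genomic exon order
            seq = "".join(subset)
            if len(seq) <= len(profile):
                best = max(best, segment_score(seq, profile, 0))
    return best

def best_by_dp(exons, profile):
    """Polynomial-time alternative under the same toy scoring model.
    reach[j] is the best score of a splice form, built from the exons seen
    so far, that covers exactly profile columns 0..j-1."""
    m = len(profile)
    reach = [NEG_INF] * (m + 1)
    reach[0] = 0.0  # empty prefix: no exon chosen yet
    best = NEG_INF
    for exon in exons:
        new_reach = reach[:]  # option 1: skip this exon
        for j in range(m - len(exon) + 1):
            if reach[j] == NEG_INF:
                continue
            # option 2: append this exon starting at profile column j
            score = reach[j] + segment_score(exon, profile, j)
            end = j + len(exon)
            new_reach[end] = max(new_reach[end], score)
            best = max(best, score)
        reach = new_reach
    return best

# Toy data: four exons and a five-column "domain" favouring M, K, L, V, A.
exons = ["MK", "QQ", "LV", "A"]
profile = [{"M": 2.0}, {"K": 2.0}, {"L": 2.0}, {"V": 2.0}, {"A": 2.0}]
```

Under these assumptions both functions return the same optimum for the toy data, but the enumeration visits up to 2^n - 1 exon subsets, whereas the dynamic program performs one pass over the exons and the profile columns. A profile HMM additionally models insertions and deletions, which this sketch deliberately ignores.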

Key words: alternative splicing, novel splice forms, Pfam, protein domain, Viterbi algorithm, profile HMM, gene prediction