The prediction of gene models in a eukaryotic genome can be improved by using local similarity information between the genomic sequence and homologous proteins or ESTs in combination with the identification of potential splice sites.
We developed a program, SPLICE, that models genes using similarity data. SPLICE uses the results of BLAST searches against protein or nucleic acid databases to verify pattern-based putative splice sites. SPLICE is designed to tolerate, and to point out, probable pseudogenes or sequencing errors that would cause frameshifts in potential exons. Moreover, SPLICE lets the user intervene interactively in gene prediction by replacing computer-predicted splice sites with manually selected ones.
The quality of SPLICE-predicted gene models can be increased further by integrating the output of other gene prediction programs. This function is implemented in a second program that we developed, MODEL. The MIPS group applied SPLICE and MODEL during the annotation of the sequences produced by the European Arabidopsis thaliana sequencing project.
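
The idea of verifying pattern-based putative splice sites against similarity data can be illustrated with a minimal sketch. It is not the SPLICE implementation: the function names, the restriction to canonical GT donor and AG acceptor dinucleotides, the half-open hit coordinates, and the `tolerance` parameter are all assumptions made for illustration. A candidate site is kept only if it lies close to the boundary of a similarity hit, such as a BLAST high-scoring segment pair mapped onto the genomic sequence.

```python
from typing import List, Tuple

Hit = Tuple[int, int]  # hypothetical (start, end) of a similarity hit, 0-based, half-open


def find_pattern_sites(seq: str) -> Tuple[List[int], List[int]]:
    """Return 0-based start positions of GT (donor) and AG (acceptor) dinucleotides."""
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return donors, acceptors


def supported_sites(seq: str, hits: List[Hit], tolerance: int = 3) -> Tuple[List[int], List[int]]:
    """Keep only pattern-based sites that coincide with similarity hit boundaries.

    A donor is kept if its GT starts within `tolerance` bases of a hit end
    (the intron begins where the similarity stops). An acceptor is kept if
    its AG ends within `tolerance` bases of a hit start.
    """
    donors, acceptors = find_pattern_sites(seq)
    good_donors = sorted(
        {d for d in donors for (_, end) in hits if abs(d - end) <= tolerance}
    )
    good_acceptors = sorted(
        {a for a in acceptors for (start, _) in hits if abs((a + 2) - start) <= tolerance}
    )
    return good_donors, good_acceptors


if __name__ == "__main__":
    # Toy sequence: exon (9 nt), intron (15 nt, GT...AG), exon (6 nt).
    toy = "ATGGCCAAA" + "GTAAG" + "T" * 7 + "CAG" + "GGCTGA"
    toy_hits = [(0, 9), (24, 30)]  # assumed similarity hits covering the two exons
    print(supported_sites(toy, toy_hits))
```

In this toy case the pattern search proposes several GT and AG positions, and the similarity boundaries single out the one donor and one acceptor that flank the intron. A real tool would also have to handle reading frame, hit overlap, and the frameshifts mentioned above, which this sketch ignores.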