In Silico Biology 9, 0002 (2008); ©2008, Bioinformation Systems e.V.  



The SiteSeeker motif discovery tool

Klaus Ecker*, Jens Lichtenberg and Lonnie Welch¹

Russ College of Engineering and Technology, Ohio University, Athens, Ohio, USA
¹ Bioinformatics Laboratory, School of Electrical Engineering and Computer Science, Ohio University, Biomolecular Engineering Program, Ohio University, and Molecular and Cellular Biology Program, Ohio University, Athens, Ohio

* Corresponding author
   Email: ecker@ohio.edu


Edited by H. Michael; received April 21, 2008; revised November 03, 2008; accepted November 05, 2008; published December 30, 2008


Abstract

In this paper we describe conditions of use for SiteSeeker, a recently published tool that offers two basic functions for the classical problem of discovering motifs in a set of promoter sequences. The first function, CHECKPROMOTER, covers the case in which not all of the sequences necessarily possess a common motif of given length l. In this case it allows an exact identification of maximal subsets of related promoters; its purpose is to recognize putatively co-regulated genes. The second function, CHECKMOTIF, checks whether the given promoters have a common motif. It uses a fast approximation algorithm for which we were able to derive non-trivial, low performance bounds (defined as the ratio of the Hamming distance of the obtained solution to that of a theoretically best solution) for the computed outputs. Both programs use a novel weighted Hamming distance paradigm for evaluating the similarity of sets of l-mers, and performance bounds can be computed for the proposed motifs. A set of Arabidopsis thaliana (At) promoters was used as a benchmark in a comparative test against five known tools, and SiteSeeker significantly outperformed these tools in that test.


Keywords: weighted Hamming distance, motif discovery, co-regulation, Arabidopsis thaliana
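
The two quantities named in the abstract, a weighted Hamming distance on l-mers and a performance ratio relative to a best solution, can be illustrated with a minimal Python sketch. It is not SiteSeeker's implementation. The per-position weights, the choice to score a set of l-mers by summing over all pairs, and every function name below are assumptions made only for illustration. The exhaustive search stands in for a "theoretically best solution" on toy inputs and is not the approximation algorithm used by CHECKMOTIF.

```python
from itertools import combinations, product


def weighted_hamming(a, b, weights):
    """Sum of the weights at positions where a and b differ.

    Assumption: one non-negative weight per position of the l-mer.
    """
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("l-mers and weights must have equal length")
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def set_distance(lmers, weights):
    """Assumed score of a set of l-mers: sum over all unordered pairs."""
    return sum(weighted_hamming(a, b, weights)
               for a, b in combinations(lmers, 2))


def lmers_of(sequence, l):
    """All substrings of length l, in order of occurrence."""
    return [sequence[i:i + l] for i in range(len(sequence) - l + 1)]


def best_selection(promoters, weights):
    """Exhaustively pick one l-mer per promoter with minimal set_distance.

    Exponential in the number of promoters; usable on toy inputs only.
    """
    l = len(weights)
    candidates = [lmers_of(p, l) for p in promoters]
    return min(product(*candidates),
               key=lambda selection: set_distance(selection, weights))


def performance_ratio(obtained, best, weights):
    """Distance of the obtained selection divided by that of the best one."""
    d_obtained = set_distance(obtained, weights)
    d_best = set_distance(best, weights)
    if d_best == 0:
        # Degenerate case: the best selection is a perfect match.
        return 1.0 if d_obtained == 0 else float("inf")
    return d_obtained / d_best


if __name__ == "__main__":
    weights = [1, 2, 1]  # hypothetical: the middle position counts double
    promoters = ["ACGTAC", "TTACGA", "GACGTT"]  # toy sequences, not real data
    print(best_selection(promoters, weights))
    print(performance_ratio(["ACG", "ATG", "ACT"],
                            ["ACG", "ACG", "ACT"], weights))
```

With equal weights of 1, weighted_hamming reduces to the ordinary Hamming distance. A ratio close to 1 means the obtained selection is nearly as good as the best one under this assumed scoring.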