In Silico Biology 8, 0043 (2008); ©2008, Bioinformation Systems e.V.  



Extracting signature motifs from promoter sets of differentially expressed genes

Ieva Mitašiūnaitė1, Christophe Rigotti1, Stéphane Schicklin2, Laurène Meyniel1, Jean-François Boulicaut1 and Olivier Gandrillon2*

Université de Lyon, CNRS
1 INSA-Lyon, LIRIS, UMR5205, F-69621, Villeurbanne, France
2 Université Lyon 1, Centre de Génétique Moléculaire et Cellulaire, UMR5534, F-69622, Villeurbanne, France

* Corresponding author
   Email: gandrillon@cgmc.univ-lyon1.fr


Edited by E. Wingender; received July 22, 2008; revised October 23, 2008; accepted October 25, 2008; published December 13, 2008


Abstract

There is a critical need for new and efficient computational methods for discovering putative transcription factor binding sites (TFBSs) in promoter sequences. Existing methods fall into two families: statistical or stochastic approaches, and combinatorial approaches. Here we focus on a complete approach that combines exhaustive combinatorial motif extraction with a statistical Twilight Zone Indicator (TZI), applied to two datasets, a positive set and a negative one, which together represent the result of a classical differential expression experiment. Our approach relies on prior biological information in the form of two sets of promoters of differentially expressed genes. We describe the complete procedure used for extracting either exact or degenerate motifs, ranking these motifs, and finding the known TFBSs related to them. We exemplify this approach using two different sets of promoters. The first set consists of promoters of genes that are either repressed or not repressed by the transforming form of the v-erbA oncogene. The second set consists of genes whose expression varies between self-renewing and differentiating progenitors. The biological meaning of the TFBSs found is discussed and, for one TF, its biological involvement is demonstrated. This study therefore illustrates the power of using relevant biological information, in the form of a set of differentially expressed genes, which is a classical outcome of most transcriptomics studies. This makes it possible to greatly reduce the search space and to design an adapted statistical indicator. Taken together, this allows the biologist to concentrate on a small number of putatively interesting TFs.


Keywords: promoter, differential expression, complete pattern extraction, transcription factor, transcription factor binding site, twilight zone, extraction parameter tuning, exact matching pattern, soft matching pattern