In Silico Biology 7, 0033 (2007); ©2007, Bioinformation Systems e.V.  



Clustering formal concepts to discover biologically relevant knowledge from gene expression data

Sylvain Blachon1,2, Ruggero G. Pensa2, Jérémy Besson2, Céline Robardet2, Jean-François Boulicaut2 and Olivier Gandrillon1*

1 Equipe "Bases Moléculaires de l'Autorenouvellement et de ses Altérations", Université de Lyon, Lyon, F-69003, France; Université Lyon 1, Villeurbanne, F-69622, France; CNRS UMR5534, CGMC, Villeurbanne, F-69622, France

2 INSA-Lyon, LIRIS CNRS UMR5205, Bâtiment Blaise Pascal, F-69621 Villeurbanne cedex, France


* Corresponding author
   Email: Gandrillon@cgmc.univ-lyon1.fr


Edited by E. Wingender; received October 27, 2006; revised February 01 and May 29, 2007; accepted June 16, 2007; published July 16, 2007


Abstract

The production of high-throughput gene expression data has created a pressing need for bioinformatics tools that generate biologically interesting hypotheses. Whereas many tools are available for extracting global patterns, local pattern discovery has received less attention. We propose here an original way to discover knowledge from gene expression data by means of so-called formal concepts, which hold in Boolean datasets derived from gene expression data. We first encoded the over-expression properties of genes in human cells using human SAGE data. This yielded a Boolean matrix from which we extracted the complete collection of formal concepts, i.e., all maximal sets of over-expressed genes, each associated with the maximal set of biological situations in which their over-expression is observed. Complete collections of such patterns tend to be huge. Since their interpretation is a time-consuming task, we propose a new method to rapidly visualize clusters of formal concepts. This method designates a reasonable number of Quasi-Synexpression-Groups (QSGs) for further analysis. The interest of our approach is illustrated using human SAGE data and by interpreting one of the extracted QSGs. The assessment of its biological relevance leads to the formulation of both previously proposed and new biological hypotheses.


Keywords: transcriptome, SAGE, pattern discovery, formal concepts, closed sets, clustering
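
To make the notion of a formal concept used in the abstract concrete, the following minimal sketch enumerates every formal concept of a tiny Boolean matrix. It is an illustration only, not the extraction algorithm used in this work: the data, the gene and situation names (g1, s1, ...) and the function names are all hypothetical, and the brute-force enumeration over subsets of situations is only practical for toy matrices.

```python
from itertools import combinations

# Hypothetical toy data (not from the article): each biological situation
# maps to the set of genes encoded as over-expressed in it, i.e. the
# True entries of one row of the Boolean matrix.
MATRIX = {
    "s1": {"g1", "g2", "g3"},
    "s2": {"g1", "g2"},
    "s3": {"g2", "g3"},
    "s4": {"g4"},
}
ALL_GENES = set().union(*MATRIX.values())


def common_genes(situations):
    """Genes over-expressed in every situation of the given set."""
    genes = set(ALL_GENES)
    for s in situations:
        genes &= MATRIX[s]
    return genes


def supporting_situations(genes):
    """Situations in which every gene of the given set is over-expressed."""
    return {s for s, expressed in MATRIX.items() if genes <= expressed}


def formal_concepts():
    """Return all (genes, situations) pairs that are maximal in both directions.

    Brute force: close every subset of situations, then deduplicate.
    """
    concepts = set()
    names = sorted(MATRIX)
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            genes = common_genes(subset)
            situations = supporting_situations(genes)
            concepts.add((frozenset(genes), frozenset(situations)))
    return concepts


if __name__ == "__main__":
    # Print concepts from the largest gene set to the smallest.
    for genes, situations in sorted(
        formal_concepts(), key=lambda c: (-len(c[0]), sorted(c[0]))
    ):
        print(sorted(genes), "<->", sorted(situations))
```

In this toy matrix, the pair ({g1, g2}, {s1, s2}) is a formal concept: no further gene is over-expressed in both s1 and s2, and no further situation over-expresses both g1 and g2. Even four situations yield several concepts, which gives a sense of why complete collections extracted from real SAGE data become too large to interpret one by one.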