DNA microarray hybridization [DeRisi et al., 1997] and similar techniques [Weiler et al., 1997] are used to measure genome-wide expression patterns. For every gene of an organism, the expression rates are determined under well-defined experimental conditions at several successive time points. The result is a time course of expression rates for each gene.
A first step towards the rapid and comprehensive interpretation of the data is the clustering of the genes with respect to their expression patterns [Eisen et al., 1998]: a clustering algorithm sorts the individual genes into groups (clusters). Here, we present a method that maps clusters of genes onto dynamically constructed metabolic pathways.
The clustering is achieved using Kohonen's Self-Organizing Map (SOM), which is well suited to the analysis of multidimensional data. The method, first used for the clustering of gene expression patterns by Tamayo et al., has distinct advantages [Tamayo et al., 1999] over the hierarchical clustering employed by Eisen et al. [Eisen et al., 1998]. The SOM is a neural network that provides a mapping from the multidimensional data space into a discrete two-dimensional space. The method is robust, scalable, flexible, and reasonably fast. Additionally, the clusters are ordered according to the regular, discrete, two-dimensional topology of the map. Thus, neighbouring clusters are quite similar, while more distant clusters become increasingly dissimilar [Kohonen, 1995].
Figure 1: Two of the 6 × 9 clusters resulting from the diauxic shift data set. Abscissa: seven time points; ordinate: logarithm of the ratio of the expression rates. Each line in the diagrams represents a single gene. Right: mean and standard deviation of the expression rates of the clusters.
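The SOM clustering step can be illustrated with a minimal sketch in plain Python. It is not the implementation used for the results reported here: the function names (`train_som`, `best_matching_unit`, `assign_clusters`), the linearly decaying learning rate, the Gaussian neighbourhood, and the initialisation from randomly chosen profiles are all assumptions made for illustration. Only the 6 × 9 grid and the seven-point profiles are taken from the text.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid node whose weight vector is closest to x (squared Euclidean)."""
    return min(
        weights,
        key=lambda node: sum((wi - xi) ** 2 for wi, xi in zip(weights[node], x)),
    )


def train_som(profiles, rows=6, cols=9, epochs=20, seed=0):
    """Train a small self-organizing map on expression profiles.

    profiles: list of equal-length lists, one log-ratio time course per gene.
    Returns a dict mapping each grid node (row, col) to its weight vector.
    """
    rng = random.Random(seed)
    dim = len(profiles[0])
    # Assumption: initialise every node with a copy of a randomly chosen profile.
    weights = {
        (r, c): list(rng.choice(profiles))
        for r in range(rows)
        for c in range(cols)
    }
    total_steps = epochs * len(profiles)
    sigma0 = max(rows, cols) / 2.0  # assumed initial neighbourhood radius
    step = 0
    for _ in range(epochs):
        order = profiles[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = 0.5 * (1.0 - frac)                  # assumed learning-rate schedule
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                # Pull the node towards x, weighted by its grid distance to the winner.
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


def assign_clusters(weights, profiles):
    """Group gene indices by the grid node that best matches their profile."""
    clusters = {}
    for gene_index, x in enumerate(profiles):
        clusters.setdefault(best_matching_unit(weights, x), []).append(gene_index)
    return clusters
```

Because neighbouring nodes are updated together, nodes that are close on the grid end up with similar weight vectors, which is the topological ordering of clusters described above. Each grid node corresponds to one cluster, so a 6 × 9 map yields at most 54 clusters.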
The resulting gene clusters are used for the dynamic construction of metabolic pathways. If a gene is known to code for an enzyme, it is mapped onto the reaction catalyzed by that enzyme. We employ an algorithm that constructs the set of all qualitatively feasible metabolic pathways from a set of biochemical reactions and a set of constraints [Mavrovouniotis, 1993]. The set of reactions and the constraints are determined heuristically from the gene clustering (Fig. 2). The generated hypothetical pathways allow a comprehensive analysis of genomic data in the context of metabolism.
Figure 2: Schematic representation of the method presented. It consists of the clustering of gene expression profiles, the gene-enzyme mapping, the generation of constraints and the assembly of metabolic pathways.
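The gene-enzyme mapping and the assembly of pathways can be sketched as follows. This is a deliberately simplified stand-in, not the algorithm of [Mavrovouniotis, 1993]: it enumerates only cycle-free linear chains of reactions between two metabolites and treats "the enzyme's gene is in the selected clusters" as the only constraint. The function names, the gene identifiers (`geneA`, `geneB`, ...) and the one-substrate, one-product reaction format are hypothetical.

```python
def reactions_for_cluster(cluster_genes, gene_to_reaction):
    """Gene-enzyme mapping: keep the reactions whose enzyme gene is in the cluster."""
    return sorted({gene_to_reaction[g] for g in cluster_genes if g in gene_to_reaction})


def enumerate_routes(reactions, allowed, source, target):
    """List every cycle-free chain of allowed reactions leading from source to target.

    reactions: dict mapping a reaction name to a (substrate, product) pair.
    allowed:   reaction names permitted by the clustering-derived constraint.
    """
    routes = []

    def extend(metabolite, path, seen):
        if metabolite == target:
            routes.append(list(path))
            return
        for name in allowed:
            substrate, product = reactions[name]
            if substrate == metabolite and product not in seen:
                path.append(name)
                seen.add(product)
                extend(product, path, seen)
                seen.remove(product)
                path.pop()

    extend(source, [], {source})
    return routes


# Toy example with hypothetical gene identifiers.
reactions = {
    "aconitase": ("citrate", "isocitrate"),
    "isocitrate_dehydrogenase": ("isocitrate", "2-oxoglutarate"),
    "oxoglutarate_dehydrogenase": ("2-oxoglutarate", "succinyl-CoA"),
}
gene_to_reaction = {
    "geneA": "aconitase",
    "geneB": "isocitrate_dehydrogenase",
    "geneC": "oxoglutarate_dehydrogenase",
}
allowed = reactions_for_cluster(["geneA", "geneB", "geneX"], gene_to_reaction)
routes = enumerate_routes(reactions, allowed, "citrate", "2-oxoglutarate")
```

In this toy example `geneX` has no known enzyme function and is ignored, and `geneC` is not in the cluster, so no route reaches succinyl-CoA. A faithful implementation would additionally have to handle reaction stoichiometry, reversibility and cofactors.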
We applied the method to the data set published by DeRisi et al. [DeRisi et al., 1997], which contains the expression rates of the yeast genes measured at seven successive time points during the diauxic shift. The data have been analyzed extensively by the authors, which allows the SOM clustering to be compared with the results described in the original paper. Figure 3 shows a pathway that was constructed by the algorithm. We selected five of the 54 clusters; together they contained 440 genes whose expression rate increased during the experiment. The algorithm constructed 1694 pathways from the reactions corresponding to these genes.
Figure 3: One of the 1694 pathways generated from the reactions corresponding to the genes of five clusters. The selected clusters contain genes that are upregulated during the diauxic shift. This example shows a part of the TCA cycle, a result that is consistent with the manual interpretation of the data set by DeRisi et al. [DeRisi et al., 1997].
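The text does not state how the upregulated clusters were selected. One possible criterion, shown here purely as an assumption, is to summarise each cluster by its per-time-point mean and standard deviation (as in the right-hand panels of Figure 1) and to keep the clusters whose mean log ratio rises by at least a chosen amount between the first and the last time point. The function names and the `min_rise` threshold are hypothetical.

```python
from statistics import mean, pstdev


def cluster_summary(profiles):
    """Per-time-point mean and population standard deviation of a cluster's log ratios."""
    columns = list(zip(*profiles))
    return [mean(col) for col in columns], [pstdev(col) for col in columns]


def upregulated_clusters(clusters, min_rise=1.0):
    """Return labels of clusters whose mean log ratio rises by at least min_rise.

    clusters: dict mapping a cluster label to a list of gene profiles.
    min_rise: assumed threshold on (last mean - first mean); not taken from the text.
    """
    selected = []
    for label, profiles in clusters.items():
        means, _ = cluster_summary(profiles)
        if means[-1] - means[0] >= min_rise:
            selected.append(label)
    return selected
```

The population standard deviation (`pstdev`) is used so that clusters containing a single gene are still summarised without error. The genes of the selected clusters would then be passed on to the gene-enzyme mapping.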