For analyzing large metabolic networks, such as the metabolisms of species with completely sequenced genomes, special tools are needed to handle the problems caused by the high complexity of those networks [Bork et al., 1999]. For programs analyzing the structural properties of networks (e.g. METATOOL, Pfeiffer et al., 1999), running time and memory usage increase rapidly with network size. For example, the number of elementary modes (simple biochemical routes of a reaction network; cf. Schuster et al., 1999) can grow exponentially with an increasing number of reactions.
The structural analysis of complex reaction networks is essential for understanding functional
features such as metabolic and genetic regulation. Moreover, it is instrumental in the
reconstruction of bacterial metabolisms. Since the number of species with completely
sequenced genomes increases rapidly, that kind of analysis has important future prospects
[Schilling et al., 1999].
To construct reaction networks, a number of internet sources can be used. According to our
experience, the best source for biochemical data is the KEGG server
(http://www.genome.ad.jp/kegg/metabolism.html). For several completely sequenced
genomes, files containing the EC numbers are available. Furthermore, there are files
containing information on all enzymes (LIGAND database) and metabolites (COMPOUND
database), but for parsing, better quality and consistency of the data would be desirable.
Very often there are synonyms for one metabolite (e.g. pyrophosphate and diphosphate), and
the enzyme entries are mostly less specific than the enzymes themselves (e.g. for ADH: Alcohol
+ NAD+ = Aldehyde or Ketone + NADH).
To avoid the problems caused by the complexity of large networks, we chose the following strategy: first divide the network into subsystems, then analyze the subsystems with conventional methods, and finally study the interactions of the subsystems and try to reassemble them into the original system.
KEGG suggests grouping the enzymes into functional groups. For our work, that is not the best way, since such functional groups can vary from species to species (the reactions of the TCA cycle, for example, can also be found in photosynthetic bacteria, where they serve carbon fixation) and there is considerable overlap between different functional groups: both metabolites and enzymes can occur in more than one group, which makes it almost impossible to put the complete network together. To find a good strategy for grouping the enzymes, one must consider how networks are analyzed in general.
A network is defined by metabolites and the reactions linking them. A reaction can be reversible or irreversible. That information is needed to determine, for example, whether or not a pathway is reversible. Interestingly, most pathways are irreversible, and synthesis and degradation can differ significantly (e.g. synthesis and degradation of amino acids). A metabolite can be internal or external. The concentration of an internal metabolite should neither increase nor decrease over time (which is the case for most metabolites). In steady state, the following equation holds:
Nv = dc/dt = 0
(N -- stoichiometric matrix, v -- velocity vector, c -- vector of concentrations of internal
metabolites).
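The steady-state condition can be checked numerically for a given flux vector. The following is a minimal sketch, assuming a hypothetical three-reaction chain (external source -> X -> Y -> external sink); the matrix N, the flux vectors, and the function names are illustrative and not taken from any real network.

```python
from fractions import Fraction

# Hypothetical toy network: A_ext -> X (v1), X -> Y (v2), Y -> B_ext (v3).
# Rows are the internal metabolites X and Y; columns are the reactions v1..v3.
# External metabolites do not appear as rows.
N = [
    [1, -1, 0],   # X: produced by v1, consumed by v2
    [0, 1, -1],   # Y: produced by v2, consumed by v3
]

def rate_of_change(N, v):
    """Return dc/dt = N v for stoichiometric matrix N and flux vector v."""
    return [sum(Fraction(n) * Fraction(x) for n, x in zip(row, v)) for row in N]

def is_steady_state(N, v):
    """True if no internal metabolite accumulates or is depleted."""
    return all(d == 0 for d in rate_of_change(N, v))

balanced = [2, 2, 2]      # the same flux through every step
unbalanced = [2, 1, 1]    # X is produced faster than it is consumed

print(is_steady_state(N, balanced))     # True
print(is_steady_state(N, unbalanced))   # False
```

Exact fractions are used instead of floating-point numbers so that the comparison with zero is not affected by rounding.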
That is not the case for external metabolites. A metabolite is external if it is well buffered (as can be assumed for water or sources from the environment) or if it is a storage metabolite (e.g. glycogen). Very often, several coenzymes (ATP, NADH) are treated as external in smaller systems, since those substances are buffered by processes in other subsystems of the cell.
As a result of a topological analysis of a subsystem, we obtain the nullspace of the stoichiometric matrix (a set of linearly independent flux distributions), the conservation relations, the convex basis spanning the region of admissible fluxes, the elementary modes (simple metabolic routes), and the enzyme subsets (groups of enzymes always acting together).
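Two of these quantities, the nullspace and the conservation relations, follow from plain linear algebra: the nullspace solves N v = 0, and conservation relations are vectors in the nullspace of the transposed matrix. The sketch below is an illustration under stated assumptions, not the METATOOL implementation. It uses exact Gauss-Jordan elimination on a hypothetical network with metabolites X, ATP and ADP; the helper names nullspace and transpose are introduced here for illustration only. The convex basis and the elementary modes require dedicated algorithms and are not covered.

```python
from fractions import Fraction

def transpose(matrix):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*matrix)]

def nullspace(matrix):
    """Return a basis of {v : matrix v = 0} by exact Gauss-Jordan elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0])
    pivots = []                      # pivot column of each reduced row
    r = 0                            # index of the next row to reduce
    for c in range(n_cols):
        if r == len(rows):
            break
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue                 # no pivot here: c is a free column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        pv = rows[r][c]
        rows[r] = [x / pv for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    basis = []
    for f in (c for c in range(n_cols) if c not in pivots):
        vec = [Fraction(0)] * n_cols
        vec[f] = Fraction(1)         # set one free variable to 1
        for row_idx, p in enumerate(pivots):
            vec[p] = -rows[row_idx][f]
        basis.append(vec)
    return basis

# Hypothetical network: v1: -> X, v2: X + ATP -> ADP, v3: ADP -> ATP.
N = [
    [1, -1, 0],    # X
    [0, -1, 1],    # ATP
    [0, 1, -1],    # ADP
]

flux_basis = nullspace(N)                # steady-state flux distributions
conservation = nullspace(transpose(N))   # conserved metabolite sums
```

For this toy network, flux_basis contains the single vector (1, 1, 1), meaning all three reactions carry the same flux, and conservation contains (0, 1, 1), which expresses that the sum of ATP and ADP is constant.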
If we put two subsystems together, we just have to combine the shared external metabolites: if one subsystem contains two different pathways A -> B, and another subsystem contains three possibilities for B -> C, the resulting system has 2 x 3 = 6 possible pathways for producing C from A. Such a simple combination rule for pathways is only possible if the two subsystems do not share a reaction or an internal metabolite (the intersection of their internal metabolites and reactions must be empty). If a metabolite occurs in two different subsystems, it is external by definition.
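The combination rule amounts to a Cartesian product of the routes of both subsystems, guarded by the disjointness condition. A minimal sketch, assuming a hypothetical dictionary layout for a subsystem (keys "reactions", "internal", "routes") and made-up reaction and metabolite names:

```python
from itertools import product

# Hypothetical subsystems; all names are for illustration only.
subsystem_1 = {
    "reactions": {"r1", "r2", "r3", "r4"},
    "internal": {"X1", "X2"},
    "routes": [("r1", "r2"), ("r3", "r4")],    # two routes A -> B
}
subsystem_2 = {
    "reactions": {"r5", "r6", "r7"},
    "internal": set(),
    "routes": [("r5",), ("r6",), ("r7",)],     # three routes B -> C
}

def can_combine(s1, s2):
    """True if the subsystems share no reaction and no internal metabolite."""
    return (not (s1["reactions"] & s2["reactions"])
            and not (s1["internal"] & s2["internal"]))

def combine_routes(s1, s2):
    """Concatenate every route of s1 with every route of s2."""
    if not can_combine(s1, s2):
        raise ValueError("subsystems overlap; the simple rule does not apply")
    return [a + b for a, b in product(s1["routes"], s2["routes"])]

combined = combine_routes(subsystem_1, subsystem_2)
print(len(combined))   # 6
```

Each combined route is simply the tuple of reactions of a route A -> B followed by those of a route B -> C.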
For generating the subsystems, the following conditions have to be fulfilled: no reaction and no internal metabolite may occur in more than one subsystem, and the subsystems are linked only by external metabolites. In that way, combining subsystems is equivalent to making the shared external metabolites internal. (Metabolites that are external for the complete system should be kept external, of course.) Conversely, if an internal metabolite is made external, the system may fall apart into two (or more) disconnected subnetworks. That is exactly the way to divide a system into subsystems: choose a number of internal metabolites and make them external. With a good choice, the system is divided into a number of subsystems that can easily be analyzed with METATOOL. A good heuristic is to choose metabolites with a high number of links (for example coenzymes).
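The selection heuristic can be sketched as follows: count the links of every metabolite, declare those above a threshold external, and remove them from the internal linkage. The reaction data, the function names, and the threshold min_links are assumptions made for this illustration; the text does not prescribe a particular cutoff.

```python
# Hypothetical linkage data: reaction -> metabolites it involves.
reactions = {
    "r1": {"A", "ATP", "B", "ADP"},
    "r2": {"B", "C"},
    "r3": {"D", "ATP", "E", "ADP"},
    "r4": {"E", "F"},
    "r5": {"F", "ATP", "G", "ADP"},
}

def link_counts(reactions):
    """Count in how many reactions each metabolite takes part."""
    counts = {}
    for metabolites in reactions.values():
        for m in metabolites:
            counts[m] = counts.get(m, 0) + 1
    return counts

def choose_external(reactions, min_links):
    """Return the metabolites with at least min_links links (assumed cutoff)."""
    return {m for m, k in link_counts(reactions).items() if k >= min_links}

def make_external(reactions, external):
    """Drop the chosen metabolites from the internal linkage of every reaction."""
    return {r: mets - external for r, mets in reactions.items()}

external = choose_external(reactions, min_links=3)
internal_links = make_external(reactions, external)
print(sorted(external))                # ['ADP', 'ATP']
print(sorted(internal_links["r1"]))    # ['A', 'B']
```

After ATP and ADP are made external, reactions r1 and r2 no longer share an internal metabolite with r3, r4 and r5, so the toy network falls apart into two subsystems.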
The algorithm for detecting all disconnected subnetworks is known from graph theory. The simplest way is to block-diagonalize the stoichiometric matrix; each block forms a disconnected subnetwork. There are also other algorithms with much better performance that use linked lists. For those algorithms, it is not necessary to know the complete stoichiometry; only the linkage matrix must be known (which metabolite is involved in which reaction). The links are easily obtainable from the available data sets: in COMPOUND, each metabolite entry lists the reactions using that metabolite. The stoichiometry can be taken into account later. The subnetworks can be saved in the file format used by METATOOL. Once the properties of all subnetworks are known, one can analyze the properties of the complete system.
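One list-based way to find the disconnected subnetworks is a breadth-first search over the linkage lists, which needs only the information of which metabolite takes part in which reaction. The sketch below is one possible realization and not necessarily the algorithm used by the authors; the input layout (metabolite -> list of reactions, mirroring the per-metabolite reaction lists mentioned above) and all names are hypothetical.

```python
from collections import deque

def subnetworks(metabolite_links, external=()):
    """Group reactions into disconnected subnetworks.

    metabolite_links maps each metabolite to the reactions that use it.
    Metabolites listed in `external` do not connect reactions.
    """
    external = set(external)
    # Invert the lists: reaction -> internal metabolites it involves.
    reaction_links = {}
    for met, rxns in metabolite_links.items():
        for r in rxns:
            reaction_links.setdefault(r, set())
            if met not in external:
                reaction_links[r].add(met)
    seen = set()
    components = []
    for start in sorted(reaction_links):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            r = queue.popleft()
            component.add(r)
            # Follow every internal metabolite to the other reactions using it.
            for met in reaction_links[r]:
                for nxt in metabolite_links[met]:
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
        components.append(component)
    return components

# Hypothetical linkage lists: metabolite -> reactions using it.
metabolite_links = {
    "A": ["r1"], "B": ["r1", "r2"], "C": ["r2"],
    "D": ["r3"], "E": ["r3", "r4"], "F": ["r4"],
    "ATP": ["r1", "r3"], "ADP": ["r1", "r3"],
}

whole = subnetworks(metabolite_links)
split = subnetworks(metabolite_links, external={"ATP", "ADP"})
```

With all metabolites internal, the toy network forms a single component; once ATP and ADP are treated as external, it separates into the reaction sets {r1, r2} and {r3, r4}. Each reaction and each link is visited a bounded number of times, so the search scales with the size of the linkage lists rather than with the full matrix.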
The algorithms and ideas will be demonstrated on the metabolism of Mycoplasma pneumoniae, which is a parasite of the human respiratory tract.