In Silico Biology 5, 0012 (2004); ©2004, Bioinformation Systems e.V.  
Dagstuhl Seminar "Integrative Bioinformatics"

Metabolites and pathway flexibility

Thomas Dandekar1,2,* and Steffen Schmidt3




1 Dept. of Bioinformatics, Biocenter, Am Hubland, D-97074 Würzburg, Germany,
2 European Molecular Biology Laboratory, Postfach 102209, D-69012 Heidelberg, Germany
3 Brigham and Women's Hospital, Division of Genetics, New Research Building, 77 Ave Louis Pasteur, Boston, MA 02115, USA; Email: sschmidt@rics.bwh.harvard.edu



*  Corresponding author
   Phone: +49-931-888 4551; Fax: +49-931-888 4552;    Email: dandekar@biozentrum.uni-wuerzburg.de





Edited by E. Wingender; received September 30, 2004; revised and accepted October 27, 2004; published November 09, 2004



Abstract

The flexibility of metabolites and enzymes is investigated (i) at the level of the individual molecule, (ii) at the pathway level and (iii) in its combined effects at the systems and network level. Tools and results from our current research are summarized, including data from our metabolite-enzyme database. Our latest census shows that frequently used metabolites stimulate evolutionary flexibility in specific enzyme superfamilies. Furthermore, simultaneous changes of reactions and metabolites are observed in these flexible enzyme superfamilies. Both effects provide a strong source of resistance in parasites and pathogens. Specific adaptation scenarios and some counter-strategies are discussed.

Key words: flexibility, metabolites, enzyme family, resistance, adaptation, pathway, network



Introduction

The flexibility of life allows it to survive under the harshest circumstances. Striking adaptations allow, for instance, archaebacteria to survive at very high temperatures or in highly saline environments. However, the dark side of this adaptation potential is the unexpected ability of parasites and pathogens to adapt to antibiotic treatment and to the defence reactions of the host organism. This is one motivation for the present paper. Another is the aim of bioinformatics to measure and model biological phenomena not only in extreme or medical circumstances but in all organisms: enzyme and metabolite flexibility are excellent indicators of evolution and adaptation. In the following we give an overview of our current results and methods for analyzing metabolite and pathway flexibility, including the latest data from our metabolite-enzyme database.

For our purpose we have to consider three different levels at which these two ingredients of adaptation and evolution act: (i) the flexibility of individual metabolites and enzymes in reaction chemistry and/or substrate specificity; (ii) the resulting alternative possibilities for an individual pathway; and (iii) the combined effect of both at the system level, which allows a wide and often underestimated network flexibility.



Analyzing enzyme and metabolite flexibility

Though we are at the dawn of metabolomics, in most circumstances it is easier to collect and measure data on enzyme flexibility. A direct investigation of metabolite flexibility is important in order to understand which other metabolites can be processed by an individual enzyme [Pollack et al., 2002], but this is inherently difficult. In contrast, sequence information, either from genome projects or from direct proteomics data, is easy to collect. A first step is therefore the alignment of related sequences to reveal the consensus sequence as well as the variation within specific protein families.
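As an illustration of this first step, the following minimal sketch derives a consensus and a per-column conservation score from an already aligned set of sequences. The function name, the gap symbol and the toy alignment are assumptions made for this example only; they are not taken from the tools used in this work.

```python
from collections import Counter

def consensus_and_conservation(aligned_seqs, gap="-"):
    """Return the consensus string and, per column, the fraction of
    sequences that carry the most common (non-gap) residue."""
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")

    consensus, conservation = [], []
    for column in zip(*aligned_seqs):
        residues = [r for r in column if r != gap]
        if not residues:  # column consisting of gaps only
            consensus.append(gap)
            conservation.append(0.0)
            continue
        residue, count = Counter(residues).most_common(1)[0]
        consensus.append(residue)
        conservation.append(count / len(column))
    return "".join(consensus), conservation

# Toy alignment (made-up sequences, for illustration only).
toy_alignment = ["GAGDG", "GSGDG", "GAG-G", "GTGEG"]
cons, conserv = consensus_and_conservation(toy_alignment)
print(cons)
print(conserv)
```

Columns with a score close to 1 correspond to the conserved part of the family, while low scores mark the positions where the family varies.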

Based on this information, more detailed studies are possible. In particular, regarding enzyme flexibility it is important to look at how the flexibility translates into protein structure. The structure of a domain is more conserved than its sequence. As soon as at least homology models of the enzyme structure are available, we can start to ask more specific questions to understand the protein function of the flexible enzyme in question. The first parameters to determine are what is specific and what is general in the enzyme structure. Particularly for the design of leads in antibiotic research or pharmacology, the conserved parts of the protein structure are critical.

Besides the flexibility of enzyme structure there is the flexibility of enzyme function: the same enzyme can be used in different pathway contexts. In particular, simple molecular activities such as ATPase activity (kinases, phosphatases etc.) or redox protection (e. g. glutathione reductase) occur in several pathways. An easy way to identify the potential for this level of functional enzyme flexibility is detailed domain analysis, e. g. using tools such as SMART [Letunic et al., 2004] and the conserved domain server [Marchler-Bauer and Bryant, 2004]. Additional domains hint at the regulatory functions acquired on top of the basic molecular function of the enzyme in the individual pathway in question. Furthermore, a multidomain architecture allows an enzyme to participate in several pathways, in part regulatory networks, in higher eukaryotes, e. g. adenylate kinase 3 (occurring e. g. in mammals).

In addition, a third approach elucidates enzyme flexibility in a comparative way: differential genome analysis. Here lists of orthologous enzymes are compared between different organisms; Venn diagrams separate enzymes shared by two or more organisms from enzymes specific to an individual organism [Dandekar and Sauerborn, 2002]. For our present task, the enzymes shared by most organisms indicate a core set of enzymes that are essential for survival and must be present in this organism group, whereas the organism-specific enzymes show the flexibility of enzyme content in a particular species. Differential genome analysis thus allows this form of flexibility to be quantified in exact numbers. Furthermore, the two groups (organism-specific enzymes and enzymes shared by the organism group) lead to two distinct strategies for drug design.
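The Venn-diagram logic of differential genome analysis reduces to set operations on enzyme lists. The sketch below is a simplified illustration, assuming that orthologous enzymes have already been mapped to common identifiers (here EC numbers); the organism names and enzyme assignments are invented for the example.

```python
def differential_genome_analysis(enzymes_by_organism):
    """Split enzyme sets into the core shared by all organisms and the
    enzymes found in exactly one organism."""
    sets = {org: set(enzymes) for org, enzymes in enzymes_by_organism.items()}
    core = set.intersection(*sets.values())
    specific = {}
    for org, enzymes in sets.items():
        # Union of the enzyme sets of all other organisms.
        others = set().union(*(s for o, s in sets.items() if o != org))
        specific[org] = enzymes - others
    return core, specific

# Hypothetical enzyme content, keyed by EC number (illustration only).
toy_genomes = {
    "organism_A": {"2.7.1.1", "4.1.2.13", "1.8.1.7"},
    "organism_B": {"2.7.1.1", "4.1.2.13", "2.7.1.11"},
    "organism_C": {"2.7.1.1", "4.1.2.13"},
}
core, specific = differential_genome_analysis(toy_genomes)
print("core set:", sorted(core))
for org in sorted(specific):
    print(org, "specific:", sorted(specific[org]))
```

The sizes of the returned sets are the "exact numbers" referred to above: the core set points to broadly conserved targets, the organism-specific sets to selective ones.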

Dotplots of orthologous genes are another interesting tool to analyse flexibility at the genome-encoded level. They are powerful in closely related organism groups, both to differentiate between organisms and to identify specific enzyme sets present in all of them (e. g. specific kinases shared by all mollicutes are encoded in a conserved genomic island). However, with increasing genome distance all conserved regions decay rapidly.



Pathway flexibility

To measure flexibility at the pathway level, different tools and databases are available. Well known are, for instance, the pathway charts from KEGG. However, to examine the flexibility of a specific pathway in detail, comparative analysis and the construction of a pathway alignment are necessary. Following the philosophy of sequence alignment, individual gaps and insertions of enzymes in the pathway in question are compared [Schmidt and Dandekar, 2002].
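The idea of a pathway alignment can be sketched in a few lines. The example below is a deliberately simplified illustration and not the published procedure: it assumes a fixed reference order of pathway steps, marks missing steps as gaps and reports enzymes outside the reference as insertions. The function name, organism names and enzyme assignments are hypothetical.

```python
def align_pathway(reference_steps, enzymes_by_organism, gap="-"):
    """Place each organism's enzymes on the reference step order.
    Missing steps become gaps; extra enzymes are listed as insertions."""
    reference = set(reference_steps)
    alignment, insertions = {}, {}
    for organism, enzymes in enzymes_by_organism.items():
        present = set(enzymes)
        alignment[organism] = [step if step in present else gap
                               for step in reference_steps]
        insertions[organism] = sorted(present - reference)
    return alignment, insertions

# Hypothetical example: upper glycolysis as reference, two made-up organisms.
reference = ["hexokinase", "glucose-6-phosphate isomerase",
             "6-phosphofructokinase", "fructose-bisphosphate aldolase"]
organisms = {
    "organism_A": reference,
    "organism_B": ["hexokinase", "glucose-6-phosphate isomerase",
                   "fructose-bisphosphate aldolase",
                   "PPi-dependent phosphofructokinase"],
}
alignment, insertions = align_pathway(reference, organisms)
for organism, row in alignment.items():
    print(organism, row, "insertions:", insertions[organism])
```

A gap paired with an insertion in the same organism, as for organism_B here, is the kind of pattern that hints at an alternative route for the same pathway step.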

What do such analyses tell us about pathway flexibility? Interestingly, even for central pathways such as glycolysis the flexibility is surprisingly high. Plasticity in central pathways is not only often present; it even seems to be actively selected by evolution (see below). Accordingly, including just five or more organisms in such an analysis already reveals possibilities for alternative routes, for example in vitamin B synthesis [Morett et al., 2003]. From a medical point of view, the alternative routes are promising targets for drug design.



Enzyme and metabolite flexibility means pathway evolution

Besides these comparative tools to detect enzyme flexibility, there are sequence and genome analysis options that allow function and, to a certain extent, functional plasticity to be predicted from an individual sequence and enzyme (rather than from an alignment), e. g. detection of domains using tools such as AnDom [Schmidt et al., 2002] and SMART [Letunic et al., 2004].

It should be noted that there is a further level of flexibility in enzyme function, namely the regulation of gene expression. In this respect, iron-responsive elements in higher eukaryotes have been studied for a long time [Dandekar et al., 1991; Dandekar et al., 1998]. However, there seems to be a plethora of regulatory elements that regulate enzyme expression at the mRNA level [Dandekar, 2002; Dandekar and Sharma, 1998]. Furthermore, for many of these elements it is evident that, although the enzyme function does not change between species, the individual regulation of enzyme expression does. This type of flexibility has definitely been underestimated in prokaryotes, as the emerging topic and surprisingly rich variability of riboswitches indicates [Bengert and Dandekar, 2004; Mandal et al., 2003].



Network flexibility

Next we want to turn to metabolite flexibility in the network context. As mentioned above, this is more difficult to measure directly. Helpful is the use of large scale databanks of metabolites and enzyme families [Schmidt and Dandekar, 2002]. These allow us to answer key questions such as which enzymes are there in a specific genome, in a specific enzyme class, in a specific organism group? Given these enzyme families, how flexible are the metabolites implicated here?

Applying such a database, we can conclude that there are typical scenarios for pathway evolution [Schmidt et al., 2003]. In this short communication we present a compilation from our most recent census (Swiss-Prot version 44, SCOP version 1.65). Besides the specific enzyme families (Table 1) and metabolites (Table 2) compiled, several interesting conclusions on enzyme and metabolite flexibility seem to be general:

Table 1: Variability of enzyme families. Enzymes can vary in two ways: a) through changes in their reaction chemistry or b) through changes in their specificity for metabolites. Interestingly, many of the listed families occur in both lists, which suggests that these families are extremely flexible. Enzyme families are defined as proteins containing the same catalytic domain, which is itself defined as a SCOP superfamily (see also Schmidt et al., 2003).

a) Enzymatic reactions per superfamily
SCOP      No. of reactions    Description
c.2.1     96                  NAD(P)-binding Rossmann-fold domains
c.80.1    39                  PLP-dependent transferases
c.83.1    38                  alpha/beta-Hydrolases
c.3.1     33                  FAD/NAD(P)-binding domain
b.59.1    30                  Trypsin-like serine proteases
c.1.8     29                  NAD(P)-linked oxidoreductase
c.46.1    24                  P-Loop containing nucleotide triphosphate hydrolases
a.131.1   22                  Cytochrome P450
c.79.1    21                  S-Adenosyl-L-methionine-dependent methyltransferases
d.122.1   21                  Metalloproteases ("zincins")

b) Number of metabolites per superfamily
SCOP      No. of metabolites  Description
c.2.1     62                  NAD(P)-binding Rossmann-fold domains
c.83.1    25                  alpha/beta-Hydrolases
c.79.1    23                  S-Adenosyl-L-methionine-dependent methyltransferases
a.131.1   21                  Cytochrome P450
d.122.1   18                  Metalloproteases ("zincins")
c.3.1     17                  FAD/NAD(P)-binding domain
d.136.1   14                  Class II aaRS and biotin synthetases
c.108.1   12                  Cobalt precorrin-4 methyltransferase CbiF
c.1.8     12                  NAD(P)-linked oxidoreductase
d.182.1   11                  Protein kinase-like (PK-like)
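The overlap between the two parts of Table 1 can be checked directly. The short sketch below simply copies the superfamily identifiers and counts from the table as listed and intersects the two lists; the variable names are chosen for this example.

```python
# Counts copied from Table 1a (reactions) and Table 1b (metabolites).
reactions_per_superfamily = {
    "c.2.1": 96, "c.80.1": 39, "c.83.1": 38, "c.3.1": 33, "b.59.1": 30,
    "c.1.8": 29, "c.46.1": 24, "a.131.1": 22, "c.79.1": 21, "d.122.1": 21,
}
metabolites_per_superfamily = {
    "c.2.1": 62, "c.83.1": 25, "c.79.1": 23, "a.131.1": 21, "d.122.1": 18,
    "c.3.1": 17, "d.136.1": 14, "c.108.1": 12, "c.1.8": 12, "d.182.1": 11,
}

# Superfamilies that rank in the top ten of both lists.
shared = sorted(set(reactions_per_superfamily) & set(metabolites_per_superfamily))
for scop_id in shared:
    print(scop_id,
          reactions_per_superfamily[scop_id],
          metabolites_per_superfamily[scop_id])
print(len(shared), "of 10 superfamilies occur in both lists")
```

Seven of the ten superfamilies appear in both lists, which is the overlap the table legend refers to.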


Table 2: The most abundant metabolites in enzymatic reactions. Based on the dataset already used in Table 1 in combination with the LIGAND database, the number of reactions in which each metabolite is involved was calculated.

No. of enzymatic reactions  Metabolite
203                         H2O
90                          ATP
65                          NAD+
63                          ADP
63                          NADH
60                          O2
54                          CO2
54                          NADP+
54                          NADPH
51                          Phosphate
51                          Bisphosphate
41                          Pyridoxal phosphate
40                          Zinc
37                          CoA
34                          NH3
31                          AMP
28                          S-Adenosyl-L-methionine
27                          FAD
26                          Pyruvate
25                          S-adenosyl-L-homocysteine
24                          L-Glutamate
21                          H2O2
20                          NAD(P)H
20                          2-Oxoglutarate
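The kind of count reported in Table 2 can be sketched as follows. This is only an illustration of the counting step, assuming that reactions are already available as substrate and product lists; parsing of the LIGAND database is not shown, and the three toy reactions and the function name are chosen for this example.

```python
from collections import Counter

def count_metabolite_usage(reactions):
    """Count, for every metabolite, the number of reactions it takes part in.
    A metabolite is counted once per reaction, whichever side it is on."""
    usage = Counter()
    for substrates, products in reactions.values():
        usage.update(set(substrates) | set(products))
    return usage

# Toy reaction list (illustration only, not the dataset behind Table 2).
toy_reactions = {
    "hexokinase": (["D-glucose", "ATP"], ["D-glucose 6-phosphate", "ADP"]),
    "6-phosphofructokinase": (["D-fructose 6-phosphate", "ATP"],
                              ["D-fructose 1,6-bisphosphate", "ADP"]),
    "glutathione reductase": (["glutathione disulfide", "NADPH"],
                              ["glutathione", "NADP+"]),
}
usage = count_metabolite_usage(toy_reactions)
for metabolite, n_reactions in usage.most_common():
    print(n_reactions, metabolite)
```

Even in this tiny example the cofactor pair ATP/ADP stands out with two reactions each, while every pathway-specific metabolite occurs only once.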


In a typical genome, some enzyme superfamilies are used frequently, e. g. enzymes with a TIM barrel fold. At the other end of the distribution there are rare folds used only once for one enzyme and never used again. A log-log plot using the latest data confirms that this is a scale-free distribution (Fig. 1).
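A scale-free (power-law) distribution, N(k) proportional to k^(-gamma), appears as a straight line with slope -gamma in a log-log plot. The sketch below shows a plain least-squares fit of that line. The data points are synthetic and constructed to follow an exact power law; they are not the values behind Fig. 1, and the function name is an assumption of this example.

```python
import math

def loglog_fit(xs, ys):
    """Least-squares line through (log10 x, log10 y).
    Returns (slope, intercept); for a power law the slope is -gamma."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic data: number of superfamilies (y) catalyzing k reactions (x),
# generated from y = 100 * x**-2 for illustration only.
reactions_per_family = [1, 2, 4, 5, 10]
number_of_families = [100, 25, 6.25, 4, 1]
slope, intercept = loglog_fit(reactions_per_family, number_of_families)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

For real counts the points scatter around the line, so the quality of the fit should be inspected before a power law is assumed.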



Figure 1: Log-Log plot of enzyme superfamilies in specific genomes.
This double logarithmic plot compares the observed number of superfamilies in yeast (red) or E. coli (black) that catalyze a certain number of enzyme reactions. Many superfamilies are observed only once (top left), while some enzyme superfamilies are very flexible and catalyze many reactions in the genome. The lines in the log-log plot indicate that this is a scale-free distribution. This dataset is based on the identification of SCOP superfamilies (version 1.65) in protein sequences of enzymes (Swiss-Prot version 44) for each individual proteome. The sequences were analyzed using AnDom [Schmidt et al., 2002]: only significant hits (E < 10^-3) to SCOP domains were analyzed. Overlapping hits of the same superfamily were combined and counted as one domain. Dubious hits of overlapping superfamilies were ignored except when a superfamily was inserted in a larger domain.
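Two of the counting rules in this legend, the E-value cutoff and the merging of overlapping hits of the same superfamily, can be sketched as follows. The tuple layout of a hit and the function name are assumptions of this example and do not reflect the actual AnDom output format; the rule for dubious hits of different overlapping superfamilies is not implemented here.

```python
def count_domains(hits, e_cutoff=1e-3):
    """Count domains per superfamily in one protein.
    hits: iterable of (superfamily, start, end, evalue) tuples.
    Hits with evalue >= e_cutoff are dropped; overlapping hits of the
    same superfamily are merged and counted as a single domain."""
    intervals_by_family = {}
    for family, start, end, evalue in hits:
        if evalue < e_cutoff:
            intervals_by_family.setdefault(family, []).append((start, end))

    counts = {}
    for family, intervals in intervals_by_family.items():
        intervals.sort()
        domains = 0
        current_end = None
        for start, end in intervals:
            if current_end is None or start > current_end:
                domains += 1          # a new, non-overlapping domain begins
                current_end = end
            else:
                current_end = max(current_end, end)  # extend merged domain
        counts[family] = domains
    return counts

# Made-up hits for a single hypothetical protein.
toy_hits = [
    ("c.2.1", 5, 120, 1e-20),
    ("c.2.1", 100, 180, 1e-8),   # overlaps the first hit, so it is merged
    ("c.2.1", 300, 420, 1e-5),   # separate second domain
    ("c.3.1", 10, 90, 0.5),      # not significant, so it is dropped
]
domain_counts = count_domains(toy_hits)
print(domain_counts)
```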


On the level of the enzyme families (Table 1) two patterns can be observed. If classified according to their folds, about 20% of all enzyme families are quite variable, while 80% are comparatively conservative. These variable families provide many and quite different enzyme reactions.

How is this reflected at the metabolite level? Interestingly, variable enzyme superfamilies change not only their reaction type, in some cases to the extent of changing their EC main class, but often their metabolites too. That means that there is an additional metabolite flexibility in these families. A possible explanation for such variability in these enzyme families is their broader sequence divergence compared to less flexible families.

Metabolites connect reactions and enzymes. Are there, then, particularly well-suited "connecting metabolites"? The enzyme-metabolite databank shows (Table 2) that this is the case: key metabolites are adapted and used in many different enzyme superfamilies and folds. They help to establish new pathways, as they are kept while new metabolites are connected to the network. These key metabolites occur even more often than expected from an exponential distribution. They include in part so-called "energy currency" metabolites and nucleotide cofactors; in general, however, small hydrophilic metabolites are selected for this specific class of highly used "connecting" or "hub" metabolites.

Regarding metabolic networks, widespread recruitment is the most frequently observed pattern of pathway evolution when metabolic pathways are compared across many bacteria and eukaryotes, for example according to the census in Swiss-Prot. The flexibility of pathways by extensive recruitment was investigated in detail in E. coli [Rison et al., 2002; Teichmann et al., 2001]. However, does this evolutionary principle also hold true in specific organism groups where pathway flexibility is a danger for us, i. e. in parasites? Here an overview shows [Zientz et al., 2004]: de novo invention in endosymbionts is rare, and in parasites the picture looks similar (at least according to our own data). De novo invention happened only in early phases of their evolution, before they became parasites. Instead of retro-evolution from the last step of a specific pathway, an inverse process takes place in parasites and endosymbionts, involving partial or complete deletion of early pathway steps. Similarly, the specialisation of enzymes is modified in the opposite direction, to preserve or gain multi-functionality.



Fighting network and enzyme flexibility

A study example is the redox metabolism around glutathione reductase. For the dissection of such a complex network two new algorithms are available. Pathway duplication happens in endosymbionts and parasites in specific pathways, notably in host interaction factors and pathogenic features. Enzyme recruitment happens, as in other organisms, in many household pathways and, importantly, for antibiotic defence in parasites. New potential drug targets are revealed by identifying specific differences between parasite and host metabolism. Examples include plasmoredoxin and the large glyoxalase I.



Drug strategies to fight parasite network flexibility

The large adaptation potential provided by the above mechanisms is a challenge. To fight it we combine pharmacogenomics, biochemistry and bioinformatics [Ziebuhr et al., 2004]. In the above example, this leads to the identification of new targets and pathways, for example "BlueCQ", a combination therapy of methylene blue and chloroquine which simultaneously attacks the redox pathways around glutathione reductase and hemoglobin ingestion in the food vacuole of the parasite. Genetic exchange and specific adaptations allow fast development in malaria plasmodia and their vector, the Anopheles mosquito. In general, resistance arises through a block of antibiotic uptake, or through inactivation, secretion and detoxification of drugs.

Combination therapies, combined attacks on several pathways and bi-headed drugs are new strategies to lower the risk of resistance development and to cope with the flexibility of parasite pathways.



Discussion

This short overview of our current efforts to understand the different levels of flexibility in enzymes and metabolites includes the latest data from our metabolite-enzyme database as well as references to further methods developed by us to examine enzyme and metabolite flexibility. However, understanding the mechanisms of pathway and network evolution is a fascinating research area involving strong efforts from a number of other laboratories as well, for example ongoing and previous studies by Iyer et al., 2004, Luscombe et al., 2004, Teichmann et al., 2001, Templeton et al., 2004, as well as work from the Barabasi group [Almaas et al., 2004; Jeong et al., 2000], to name just a few well-known studies. The important finding of the latter group of scale-free networks in enzyme-metabolite networks [Jeong et al., 2000] has been challenged by some studies [Arita, 2004; Ma and Zeng, 2003]. Nevertheless we (e. g. this paper and Schmidt et al., 2003) and others (e. g. Hegyi et al., 2002; Rison et al., 2002) have in the meantime collected well-founded data on the scale-free distribution of enzyme folds in individual genomes. The examination of the specific folds (Table 1) and specific metabolites (Table 2) promoting pathway and network flexibility is an ongoing study in our laboratory, and the updated list of enzymes and metabolites presented here should allow further studies to confirm, extend or modify these observations. Certainly further factors are involved in pathway flexibility, including different evolutionary mechanisms (see above). Furthermore, in pathogens, genetic mechanisms such as plasmids and other routes of horizontal transfer increase the challenge of fighting network flexibility and, in particular, the resistance mentioned above (e. g. Striepen et al., 2004).
Luckily, the promising strategies including biheaded drugs [Davioud-Charvet et al., 2001] and combined pathway attack [Schirmer et al., 2003] recommended here can be extended by further antibiotic strategies detailed elsewhere [Ziebuhr et al., 2004].



Conclusion

Enzyme flexibility allows specific protein structures, the roughly 20% flexible SCOP enzyme families, to adapt to new reactions and metabolites. Hub metabolites provide a helping hand. Together with the variable enzyme superfamilies they lead to widespread recruitment into new pathways. This allows high flexibility even in central pathways. Differential genome analysis allows the detection both of adaptation potential in new genomes and of promising drug targets in the form of organism-specific enzymes. Combined intervention strategies are becoming the method of choice to fight resistance development in parasites and infectious agents.




References