| In Silico Biology 5, 0014 (2004); ©2004, Bioinformation Systems e.V. |
| Dagstuhl Seminar "Integrative Bioinformatics" |
1Seestr. 64, 13347 Berlin, Germany
Phone: ++49 (0)30 4504 3972, Fax: ++49 (0)30 4504 3959
Email: ina.koch@tfh-berlin.de, markus.schueler@tfh-berlin.de
2Brandenburg University of Technology at Cottbus, Dept. Computer Science,
03013 Cottbus, Germany
Phone: ++49 (0)355 69 3884, Fax: ++49 (0)355 69 3830
Email: monika.heiner@informatik.tu-cottbus.de
* Corresponding author
Edited by E. Wingender; received October 04, 2004; revised and accepted December 03, 2004; published January 22, 2005
To understand biochemical processes caused by, e. g., mutations or deletions in the genome, the knowledge of possible alternative paths between two arbitrary chemical compounds is of increasing interest for biotechnology, pharmacology, medicine, and drug design. With the steadily increasing amount of data from high-throughput experiments new biochemical networks can be constructed and existing ones can be extended, which results in many large metabolic, signal transduction, and gene regulatory networks. The search for alternative paths within these complex and large networks can provide a huge amount of solutions, which can not be handled manually. Moreover, not all of the alternative paths are generally of interest. Therefore, we have developed and implemented a method, which allows us to define constraints to reduce the set of all structurally possible paths to the truly interesting path set. The paper describes the search algorithm and the constraints definition language. We give examples for path searches using this dedicated special language for a Petri net model of the sucrose-to-starch breakdown in the potato tuber.
Availability: http://sanaga.tfh-berlin.de/~stepp/
Keywords: Petri nets, systems biology, biochemical networks, metabolic networks, signal transduction networks, path search with constraints, graph theory, sucrose-to-starch breakdown, potato tuber
Petri net theory provides a definite description formalism and various analysis techniques for systems with concurrent processes. Since more than ten years Petri nets are also applied to model and analyse biochemical systems qualitatively [Reddy et al., 1993; Reddy et al., 1996; Voss et al., 2003; Heiner et al., 2004] as well as quantitatively [Hofestädt, 1994; Hofestädt and Thelen, 1998; Matsuno et al., 2003]. The results show that Petri net theory forms a useful theoretical basis for qualitative model validation and analysis [Heiner and Koch, 2004].
The search for alternative paths and their analysis are a crucial point in the understanding of biochemical networks. Changes in the system behaviour, caused by mutations in the genomic sequence or by absence of special enzymes, are more likely to be comprehensible with the knowledge and deep understanding of alternative paths. The next step would be to answer the question, under which conditions which paths could be active.
Methods for path searches in graphs are well-established techniques in computer science. There exist several algorithms to search for paths in undirected and directed arbitrary graphs, such as breadth-first search (BFS) and depth-first search (DFS), and other specialised variants as the Bellman-Ford algorithm, the Dijkstra algorithm, the Floyd-Warshall algorithm, and others; for an overview see Cormen et al., 2001. The direct application of these methods to biochemical networks could result in an enormous amount of possible paths. Typically, the complete amount of paths is not necessary. Even worse, its manual management is hardly possible. To reduce the solution set to the in general much smaller subset of interest, we suggest the definition of special constraints, which have to be fulfilled by all paths in the solution set.
Existing Petri net tools, e. g., INA [Starke and Roch, 1999], search for paths in the reachability graph, i. e. for paths from one state to another state of the system, to answer the question whether a special system state is reachable from another given system state. In KEGG [Kanehisa and Goto, 2000] and other network databases it can be searched for special pathways, but not for all paths between two arbitrary chemical compounds, satisfying additional constraints.
For these reasons, we have implemented a stand-alone basic tool for path search between two arbitrary vertices in the net using constraints in order to get those paths we are interested in. In this paper we explain the search algorithm and the language definition to formulate constraints. In the result section we show how to use the language for constraint definition and give several examples for specific path searches for the sucrose-to starch breakdown in the potato tuber.
A classical place/transition Petri net (P/T net) forms the basis for our considerations. The places represent the metabolites or chemical compounds, while transitions stand for enzymes catalysing the underlying chemical reactions. For further information on Petri nets and their application to biochemical systems see Murata, 1982, or Heiner and Koch, 2004, respectively.
For the path search we specify two arbitrary vertices, which can be transitions or places, as start vertex and as end vertex, for which all possible paths in between have to be computed. A path of length k is defined as a sequence of k vertices in the graph, which are connected by edges. The vertices are enumerated along the path, beginning with the start vertex as number 1. The position of a vertex in a path corresponds to the vertex number. No vertex occurs twice in a path, which is necessary to avoid loops. Forward and backward reactions are considered to be the same vertex to circumvent loops between them.
The method consists of two main parts: firstly, the exhaustive path search by traversing the graph representing our Petri net, secondly, the reduction of the paths, resulting from the exhaustive search by evaluation of constraints.
Exhaustive path search
A first step is the creation of a special graph structure, which serves as basis for a fast path search. Petri nets are converted using BFS as a fast algorithm to visit all vertices, which takes O(V + E) time, whereby V is the number of vertices and E the number of edges.
The second step is the path search. For a correct and fast implementation we perform an exhaustive search by traversing our graph structure. Starting with the start vertex we visit recursively all vertices until the end vertex is reached, see Figure 1. In this way a vertex can be visited more than ones. This ensures us to find all paths, but leads to an exponential running time in the worst-case. Despite the fact that the running time for our case study is still acceptable (much less than a second), an improved implementation is under development.
Because of the density and the large amount of reversible reactions in biochemical networks, the exhaustive search, as introduced above, results generally in a huge amount of possible loop-free paths; see the result section for examples. Using dedicated constraints the search becomes more purposeful in order to get only the paths of interest.
|
Figure 1: Path search from vertex A to vertex I in a graph. When reaching vertex I, the corresponding path will be read out. |
The constraint concept
The definition of a constraint language is the main contribution of the approach presented in this paper. The use of constraints gives us the possibility to characterise biological requirements in metabolic paths search. Constraints could be used to model situations such as loss-of-function mutations of enzymes or absence of substrates, and to perform specific path searches fulfilling these requirements. Furthermore, a concentration on special paths of interest, e. g., paths through specific vertices, paths with a given length, etc., should be supported.
A basic objective of our constraint definition language is high user acceptance by an intuitive, but powerful user interface. To reach these aims, we reduced the language elements to a small number of basic constraints (such as "should include a specific vertex", "should be smaller than ten vertices", etc). Starting from these simple atomic constraints, more powerful composite constraints of any complexity can be constructed using logical operators as AND, OR, and NOT. In the following, we explain our constraint definition language. Constraints are defined as Strings ("") using brackets and logical operators.
Brackets:
{ } stand for the specification of constraints, e. g., i { ATP },
and ( ), [ ] for logically combined constraints, e. g., [ (i { ATP }) | (i { ADP }) ].
These two types of brackets are introduced to improve the readability.
Logical operators:
& stands for the AND operator (conjunction),
| stands for the OR operator (disjunction), and
~ stands for the NOT operator (negation).
Constraint definition
Possible constraints are classified as vertex-dependent (include, exclude, position), path-dependent (length, score), and result-dependent constraints (shortest, longest, lowest, highest). The names of vertices are not case sensitive.
Vertex-dependent constraints:
Include (i) ensures the occurrence of a given vertex in the path. This means in biological sense that the path has to include a special enzyme or metabolite.
Exclude (e) ensures the absence of a given vertex in the path. Special enzymes or metabolites are excluded explicitly, e. g., for knockout simulations.
Position (p) ensures the occurrence of a given vertex at a specified position in the path. This constraint is a much stronger than the include constraint.
Path-dependent constraints:
These constraints refer to the whole path length. Length (k) compares the length of the computed path with the previously defined length k using the relations greater than, smaller than, and equals to. Score (s) compares the score s, defined as the sum of the edge weights along the path, with the previously defined score s using the relations greater than, smaller than, and equals to.
These vertex- and path-dependent constraints can be combined by the given logical operators (AND, OR, and NOT). Because of the possibility of combining constraints by these operators and of using brackets, each other logical standard operator, e. g., NAND, NOR, XOR, and others can be described.
Result-dependent constraints:
These constraints are not evaluated on one path, but on the set of all computed paths. This type of constraints can not be connected by logical operators. Each defined constraint string can contain only one result-dependent constraint. Using these constraints optimal paths can be calculated. If there are several optimal paths, all optimal paths will be provided.
Shortest or longest gives the paths of shortest or longest length, respectively.
Lowest or highest gives the paths with lowest or highest score, respectively.
User-defined constraints:
The user has the possibility to define own constraints and save them in a separate file. They can also be defined as standard constraints, but must be named. In the current implementation there exist the following pre-defined constraints.
Implementation
The software is implemented in Java version 1.4.2. Thus, the program runs under Windows and Linux/Unix.
To demonstrate the functionality of the program we consider the Petri model of the main carbon metabolism in potato tubers. It describes the sucrose-to-starch breakdown in the potato tuber. The Petri net model is not very large, but, because of the many reversible reactions and the graph density, complicated enough to show the usefulness of constraint definitions to search for alternative paths. For a more detailed description and explanation of the modelling and analysis approach see Koch et al., 2004. The net is depicted in Figure 2.
Abbreviations for enzymes, metabolites, and reaction types are compiled in the following:
| Table 1: | Abbreviations used for enzymes, metabolites, summarised reactions, and environment reactions. |
|
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
In the following we give examples for path searches. For this purpose, we have to give the start and end vertex, followed by the constraint string.
Search without constraints:
eSuc-> starch "": The search for all possible paths from eSuc to starch without constraints yields 171 paths. This large number of paths arises because of the dense net structure containing many reversible reactions.
Search with vertex-dependent constraints:
eSuc-> starch "i{ UTP }": The search for all paths from eSuc to starch including UTP yields 32 possible paths.
eSuc-> starch "e{ ATP } & i{ UTP }": The search for all paths from eSuc to starch excluding ATP and including UTP yields one possible path.
eSuc-> starch "i{ Glc } & e{ HK }": There is no path from eSuc to starch including Glc and excluding HK. This is obvious, because Glc can only be converted into G6P through HK.
eSuc -> starch "l{ <, 11 }" searches for all paths with at most eleven vertices and finds only sixteen possible paths, see Table 2.
| Table 2: | The results of searching for all paths from eSuc to starch with at most eleven vertices. |
| 1 | SucTrans, SuSy, UGPase, NDPkin_rev, StaSy(b) |
| 2 | SucTrans, SuSy, UGPase, PGM_rev, StaSy(b) |
| 3 | SucTrans, SuSy, FK, Glyc(b), StaSy(b) |
| 4 | SucTrans, SuSy, FK, NDPkin_rev, StaSy(b) |
| 5 | SucTrans, SuSy, FK, Adk_rev, StaSy(b) |
| 6 | SucTrans, SuSy, FK, Glyc(b), StaSy(b) |
| 7 | SucTrans, SuSy, FK, PGI_rev, StaSy(b) |
| 8 | SucTrans, Inv, FK, Glyc(b), StaSy(b) |
| 9 | SucTrans, Inv, FK, NDPkin_rev, StaSy(b) |
| 10 | SucTrans, Inv, FK, Adk_rev, StaSy(b) |
| 11 | SucTrans, Inv, FK, Glyc(b), StaSy(b) |
| 12 | SucTrans, Inv, FK, PGI_rev, StaSy(b) |
| 13 | SucTrans, Inv, HK, Glyc(b), StaSy(b) |
| 14 | SucTrans, Inv, HK, NDPkin_rev, StaSy(b) |
| 15 | SucTrans, Inv, HK, Adk_rev, StaSy(b) |
| 16 | SucTrans, Inv, HK, StaSy(b) |
| In the results the transitions are given. |
The search for the shortest paths: eSuc -> starch "shortest" yields one result: SucTrans, Inv, HK, StaSy(b).
Biologically motivated path search
Mutation of sucrose synthase
The question is to get all alternative paths, when sucrose synthase is not available, because, e. g., of a loss-of-function mutation. Start and end vertices are sucrose and starch. As mentioned earlier, 171 paths exist. Now, we have to exclude sucrose synthase by the constraint "e{ SuSy }". The result consists of 51 paths, which are far too much to be evaluated manually. One possibility for path reduction is the exclusion of all PP/Pi bridges, formulated as constraint by "e{ SuSy } & no_ppi", which results into 39 paths. To restrict the path length, because many very long paths occur, we add the constraint "e{ SuSy } & no_ppi & lmax{ 11 }", whereby the length 11 is chosen heuristically. Using length nine, we get only one path, using length thirteen, we get fourteen paths, while for the length eleven we get the following nine paths:
| 1: | Suc, Inv, Frc, FK, ADP, Glyc(b), ATP, StaSy, starch, |
| 2: | Suc, Inv, Frc, FK, ADP, NDPkin_rev, ATP, StaSy, starch, |
| 3: | Suc, Inv, Frc, FK, ADP, AdK_rev, ATP, StaSy, starch, |
| 4: | Suc, Inv, Frc, FK, F6P, Glyc(b), ATP, StaSy, starch, |
| 5: | Suc, Inv, Frc, FK, F6P, PGI_rev, G6P, StaSy, starch, |
| 6: | Suc, Inv, Frc, Glc, HK, ADP, Glyc(b), ATP, StaSy, starch, |
| 7: | Suc, Inv, Frc, Glc, HK, ADP, NDPkin_rev, ATP, StaSy, starch, |
| 8: | Suc, Inv, Frc, Glc, HK, ADP, AdK_rev, ATP, StaSy, starch, |
| 9: | Suc, Inv, Frc, Glc, HK, G6P, StaSy, starch. |
Considering the paths, it is conspicuous that a large number of paths go through ADP and ATP in the same order. The ADP, which is formed by fructokinase by consuming fructose and ATP or by hexokinase by consuming glucose and ATP, is converted again into ATP by glycolysis, NDPkin_rev, and adenylate kinase. This ATP is used for starch synthesis. Another path goes from F6P through glycolysis to starch synthesis. But, these paths are not of interest for our question, because during the starch synthesis process G6P is converted into starch, whereas ATP gives the necessary energy. Thus, we extend our constraints by excluding ATP "e{ SuSy } & no_ppi & lmax{ 11 } & e{ ATP }" or "e{ SuSy, ATP } & no_ppi & lmax{ 11 }". ADP must not be excluded explicitly, because ADP and ATP occur together on each path. Now, the path search yields the following two paths with a sensible biological interpretation:
| 1: | eSuc, SucTrans, Suc, Inv, Frc, FK, F6P, PGI_rev,G6P, StaSy, starch, |
| 2: | eSuc, SucTrans, Suc, Inv, Glc, HK, G6P, StaSy, starch. |
These two paths are the known paths through invertase.
Sucrose-(re)-synthesis
Here, all paths to re-synthesise sucrose should be found. One search starts from UDP-glucose and the other from glucose-6-phosphate. The search UDPglc->Suc "e{ ADP, ATP } & e{ UDP, UTP }" yields the three known correct paths:
| 1: | UGPase, PGM_rev, PGI, SPS, SPP, |
| 2: | SPS, SPP, |
| 3: | SuSy_rev. |
The search G6P->Suc "e{ ADP, ATP } & e{ UDP, UTP }" yields the following paths:
| 1: | PGM, UGPase_rev, SPS, SPP, |
| 2: | PGM, UGPase_rev, SuSy_rev, |
| 3: | PGI, SPS, SPP, |
We have introduced a new tool to search for paths with constraints in biochemical networks. We have explained the algorithm consisting of an exhaustive path search and the application of additional constraints defined by a special language. This language is described in the paper using various examples. We provided a case study for the sucrose-to-starch breakdown in potato tubers to show the correctness and usefulness of the tool. Biological aspects related to special pathways of our case study are also discussed. The presented tool has been designed for application in path searches in biochemical systems to find alternative paths in the system. We have implemented a Web interface, which includes a Graphical User Interface and supports the definition of constraints. So far, there exists an interface to Petri nets, which can be extended easily to other formats. In order to enhance the performance we will implement an improved algorithm, which considers constraints during the path search.
We would like to thank the anonymous referee for helpful advices.This work was partly supported by the Federal Ministry of Education and Research of Germany (BMBF), BCB project 0312705D.