Modeling the Architecture of Regulatory Networks

Anatolij Potapov1 and Edgar Wingender2




German Research Centre for Biotechnology,
Mascheroder Weg 1,
D-38124 Braunschweig, Germany.
1E-mail: apo@gbf.de
2E-mail: ewi@gbf.de







ABSTRACT

We focus on modeling the architecture of regulatory networks as a necessary prerequisite for the subsequent quantitative analysis of processes within such an architecture. We suggest a new approach to modeling the architecture of regulatory networks and their modules. The formalism developed for this purpose treats processes of different complexity as multiple conditional events and is based on a variant of Boolean logic. The approach enables a qualitative estimation of the role of individual elements in complex pathways and their networks. This might be particularly useful for modeling the effects of pathologically relevant mutations in distinct components of transcription, signal transduction and metabolic systems. The proposed approach might also be useful for protein target finding.



INTRODUCTION

Understanding the integral activity of large regulatory networks is a major challenge for modern molecular biology, which is shifting its attention from individual molecules to large assemblies of interacting molecules. Complex systems are known to exhibit behaviour that is often not predictable from the properties of their component parts: the whole is more than the sum of its parts. There is therefore a great need for comprehensive tools for the in silico analysis of regulatory networks as whole entities, and for methods that can handle the available data on such networks in a global fashion and make them amenable to analysis [1, 2, 3]. Here, we focus on modeling the architecture of regulatory networks, which represents the causal connections between network elements and forms the logical skeleton of regulatory systems.



MODELING APPROACH

Regulatory networks can be represented (visualized) in graphical form, as a sketch with the names of the participating objects linked by arrows. Graphical representation, i.e. visualization, is usually considered a prerequisite for regulatory network analysis. It is illustrative and good enough for relatively simple systems. For complex networks with numerous elements and complicated links between them, however, its benefit is limited: the more complex a system is, the less readable and useful the corresponding sketch becomes. Moreover, such a visualization is oriented towards a human reader, not a computer. In this form the data are not suitable for computational treatment: to be used for analytical purposes, they need to be reformulated.

To provide a useful framework for integrating data and gaining insight into the properties of complex biological networks, other types of modeling abstractions, e.g. mathematical abstractions, can be used. The problem of modeling the architecture of regulatory networks can be solved on the basis of either a relatively simple (string) or a more refined (matrix) description of the networks. Here, we suggest a modeling approach based on the first variant, the string description. It provides an appropriate platform for representing regulatory pathways and networks. Combined with a variant of the Boolean formalism, it enables an estimation of the contribution of each network component to the integrity of the network and of the qualitative influence of each component on the regulatory output.

Regulatory networks are specific combinations of numerous components and reactions between them. Although they might be very different, some common rules of their organization can be found. To clarify the hierarchy of network organization, some preliminary formalization should be done.

We introduce the term "step", meaning a combination of several components with a reaction between them, and use the step as the elementary functional unit of any network. We then define paths, pathways and networks of pathways as different types of combinations of steps. A path is a linear combination of several subsequent steps, i.e. a sequence of steps. It takes the direction of the steps into account and uses no step more than once. Each path begins with one step (the entry) and ends with another (the exit), so its input-output relationship is of the 1:1 type. A pathway is a combination of several paths that start at the same entry and may form a multiply branched structure. A pathway has one entry and may have several exits; accordingly, its input-output relationship is of the 1:n type. Finally, a network of pathways is a collection of several pathways with several entries and several exits, so its input-output relationship is of the m:n type. The term "network of pathways" is rather specific and differs from the much more general term "network".
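The hierarchy of steps, paths, pathways and networks of pathways can be sketched as a set of small data structures. The following Python sketch is only an illustration: the class names, field names and validation rules are assumptions introduced here, not part of the original formalism.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Step:
    """Elementary unit: several components joined by one reaction."""
    components: Tuple[str, ...]
    reaction: str


@dataclass(frozen=True)
class Path:
    """Linear sequence of steps; input-output relationship 1:1."""
    steps: Tuple[Step, ...]

    def __post_init__(self):
        if not self.steps:
            raise ValueError("a path needs at least one step")
        if len(set(self.steps)) != len(self.steps):
            raise ValueError("a path must not use the same step twice")

    @property
    def entry(self) -> Step:
        return self.steps[0]

    @property
    def exit(self) -> Step:
        return self.steps[-1]


@dataclass(frozen=True)
class Pathway:
    """Several paths sharing one entry; input-output relationship 1:n."""
    paths: Tuple[Path, ...]

    def __post_init__(self):
        if len({p.entry for p in self.paths}) != 1:
            raise ValueError("all paths of a pathway must share one entry")

    @property
    def exits(self):
        return {p.exit for p in self.paths}


@dataclass(frozen=True)
class NetworkOfPathways:
    """Collection of pathways; input-output relationship m:n."""
    pathways: Tuple[Pathway, ...]

    @property
    def entries(self):
        return {p.entry for pw in self.pathways for p in pw.paths}

    @property
    def exits(self):
        return {e for pw in self.pathways for e in pw.exits}


# Hypothetical example: one entry step branching to two exit steps.
s1 = Step(("A", "B"), "r1")
s2 = Step(("B", "C"), "r2")
s3 = Step(("B", "D"), "r3")
pathway = Pathway((Path((s1, s2)), Path((s1, s3))))
```

The component and reaction names (A, B, r1, ...) are placeholders. Making the classes immutable lets steps be compared and collected in sets, which keeps the "no step used twice" and "same entry" checks short.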

There are two main types of step junction: without branching, i.e. a linear junction, and with branching, i.e. a diverged or converged junction. We describe them with the logic operators AND and OR, respectively. The principal difference between the two operators is the following. Steps linked by AND form a linear path within which all steps are mutually dependent: the path as a combination exists only if all of its steps are available. The operator OR links two or more different linear paths (or fragments of linear paths), and these paths are mutually independent.
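The difference between the two operators can be illustrated with a minimal expression tree. This is a sketch under stated assumptions: the names `And`, `Or` and `exists` are hypothetical, a bare string stands for a step, and the dictionary `available` records whether each step is present.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class And:
    """Linear junction: every term is required."""
    terms: Tuple["Expr", ...]


@dataclass(frozen=True)
class Or:
    """Branching junction: the terms are independent alternatives."""
    terms: Tuple["Expr", ...]


Expr = Union[str, And, Or]  # a bare string names a step


def exists(expr: Expr, available: Dict[str, bool]) -> bool:
    """Return True if the combination described by expr is intact."""
    if isinstance(expr, str):
        return available[expr]
    if isinstance(expr, And):
        return all(exists(t, available) for t in expr.terms)
    return any(exists(t, available) for t in expr.terms)


# Hypothetical diverged junction: s1 is followed by either s2 or s3.
diverged = And(("s1", Or(("s2", "s3"))))
```

With this structure, losing `s2` alone leaves the combination intact because the branch through `s3` is independent of it, whereas losing `s1` removes both branches.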

We treat steps, paths, pathways and networks of pathways as multiple conditional events. For this reason we introduce the parameters "expression attribute of a component" (α) and "expression attribute of a reaction" (β), whose value is one if the corresponding component or reaction is available and null otherwise. To evaluate the integrity of paths and pathways, we introduce a status (σ) of steps, paths and pathways and represent it as a function of a given set of expression attributes (see below).

To combine a set of steps into a path, only the operator AND is needed. A path consisting of n subsequent steps can be represented as a string

$$\mathrm{path} = \mathrm{step}_1 \ \mathrm{AND} \ \mathrm{step}_2 \ \mathrm{AND} \ \dots \ \mathrm{AND} \ \mathrm{step}_n$$

This is a semi-algebraic representation of the architecture of a path in which AND corresponds to the operation of multiplication: the sequence of preceding steps is multiplied by each subsequent step.

To formalize a pathway, both operators AND and OR are necessary. The operator AND represents the linear fragments of the pathway, while OR represents its branching points. Algebraically, OR corresponds to the operation of addition, so that a pathway is a sum of several paths. Following this logic, the architecture of a pathway consisting of m paths might be represented in a semi-algebraic way such as

$$\mathrm{pathway} = \mathrm{path}_1 \ \mathrm{OR} \ \mathrm{path}_2 \ \mathrm{OR} \ \dots \ \mathrm{OR} \ \mathrm{path}_m$$

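We say that the status of a path, σ_path, which is in fact the integrity of the path, is one if the status of each of its steps, σ_step_i, is one:

$$\sigma_{\mathrm{path}} = \sigma_{\mathrm{step}_1} \cdot \sigma_{\mathrm{step}_2} \cdot \ldots \cdot \sigma_{\mathrm{step}_n} = \prod_{i=1}^{n} \sigma_{\mathrm{step}_i}, \qquad \sigma_{\mathrm{step}_i} \in \{0, 1\}$$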

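When the status of at least one step is set to null, the status of the whole path immediately drops to null as well. All steps contribute equally to the status (integrity) of a path. Finally, the status of a path is a function of the set of expression attributes of all components (α) and reactions (β) involved in the steps that constitute the path:

$$\sigma_{\mathrm{step}_i} = f(\alpha_{i1}, \dots, \alpha_{ik_i}, \beta_i)$$

$$\sigma_{\mathrm{path}} = F(\alpha_{11}, \dots, \alpha_{nk_n}; \beta_1, \dots, \beta_n)$$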

By using this formalism, the topology of regulatory pathways and their networks can be expressed in an algebraic form suitable for storage and computer-assisted analysis. It requires no simplifying assumptions or speculations.
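As an illustration of how such an algebraic form might be stored and read back, the sketch below writes a pathway as a plain string and parses it again. The concrete syntax (parentheses around each path, the literal separators " AND " and " OR ") and the function names are assumptions made for this example; step identifiers are assumed not to contain these separators or parentheses.

```python
from typing import List


def pathway_to_string(paths: List[List[str]]) -> str:
    """Join steps with AND inside each path and join paths with OR."""
    return " OR ".join("(" + " AND ".join(path) + ")" for path in paths)


def pathway_from_string(text: str) -> List[List[str]]:
    """Inverse of pathway_to_string for the same restricted syntax."""
    paths = []
    for term in text.split(" OR "):
        term = term.strip()
        if term.startswith("(") and term.endswith(")"):
            term = term[1:-1]
        paths.append([step.strip() for step in term.split(" AND ")])
    return paths


# Hypothetical pathway: two paths that share their first two steps.
example = [["s1", "s2", "s3"], ["s1", "s2", "s4"]]
encoded = pathway_to_string(example)
# encoded == "(s1 AND s2 AND s3) OR (s1 AND s2 AND s4)"
```

The string lists every path in full. A factored form that writes shared steps only once would be more compact, but it would need a real parser for nested parentheses.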

Finally, the suggested approach might be useful for representing the infrastructure of networks. Moreover, it enables an estimation of the architectural role of individual elements in complex pathways and their networks. This can be done by setting the expression attribute of a component (α) or reaction (β) to null, recalculating the status (σ) of the modules that involve the corresponding component or reaction, and estimating the resulting changes in the integrity of the network.
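This procedure can be sketched in a few lines of Python. The sketch assumes that the status of a step is one only if the expression attributes of all its components and of its reaction are one; the text states only that the status is a function of these attributes, so this particular choice, like all function and variable names below, is an assumption.

```python
from typing import Dict, List, Tuple

# A step is (component ids, reaction id); a path is a list of steps.
Step = Tuple[Tuple[str, ...], str]


def step_status(step: Step, attr: Dict[str, int]) -> int:
    """Assumed rule: 1 only if every component and the reaction are available."""
    components, reaction = step
    return int(all(attr[c] for c in components) and attr[reaction] == 1)


def path_status(path: List[Step], attr: Dict[str, int]) -> int:
    """Product of the step statuses (AND corresponds to multiplication)."""
    status = 1
    for step in path:
        status *= step_status(step, attr)
    return status


def pathway_status(paths: List[List[Step]], attr: Dict[str, int]) -> int:
    """1 if at least one path is intact (OR over independent paths)."""
    return int(any(path_status(p, attr) for p in paths))


def knockout_scan(paths: List[List[Step]], attr: Dict[str, int]) -> Dict[str, int]:
    """Set each attribute to null in turn and count the paths left intact."""
    result = {}
    for element in attr:
        mutated = {**attr, element: 0}
        result[element] = sum(path_status(p, mutated) for p in paths)
    return result


# Hypothetical pathway with two paths that share the first step.
s1 = (("A", "B"), "r1")
s2 = (("B", "C"), "r2")
s3 = (("B", "D"), "r3")
paths = [[s1, s2], [s1, s3]]
attr = {name: 1 for name in ("A", "B", "C", "D", "r1", "r2", "r3")}
scan = knockout_scan(paths, attr)
```

In this toy example the elements of the shared step (A, B, r1) leave no intact path when removed, while each of the other elements leaves one, which gives a simple qualitative ranking of their architectural roles.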

This approach might be useful for the formal representation and analysis of the architecture of different modules of intracellular and intercellular networks. We focus particularly on three regulatory networks that communicate with each other: the signal transduction network, the transcription network and the metabolic network. Each network is large and complex and has its own peculiarities. Nevertheless, despite many differences, they are formally rather similar in their architecture. Some examples of such a representation are presented.

To demonstrate how the suggested approach can be used, we apply the formalism to a network of two NF-κB (nuclear factor κB) core pathways that start at TNF (tumor necrosis factor) and IL-1 (interleukin-1) (Figure 1).


Figure 1: Simplified representation of the NF-κB network.

The NF-κB family of transcription factors plays a critical role in immune, inflammatory and apoptotic responses. In most cell types, NF-κB is sequestered in the cytoplasm, bound to one of a number of inhibitory proteins such as IκBα, IκBβ, p105 or p100. Activation of the NF-κB signaling cascade results in phosphorylation and degradation of IκB, allowing nuclear translocation of the NF-κB complexes. The structure of the NF-κB network is taken from TRANSPATH, an information system on gene-regulatory pathways and a module of the TRANSFAC database system [4, 5]. Using the suggested approach, we represent this network as an ordered set of the accession numbers of molecules (MO…) and reactions (XN…) derived from TRANSPATH [4, 5]. This makes it possible to represent the architecture of the NF-κB network in a compact algebraic form suitable for storage and analysis (Figure 2).

This form helps to reveal the hierarchy of the NF-κB network structure, i.e. how the different modules and sub-modules of the network are organized and linked to each other. By setting the expression attributes of different components and/or reactions to null, we can simulate the absence of activity of any element, which could be due to a pathologically relevant mutation or a pharmacological intervention, and then calculate the structure of the part of the network that would still be active. The results of such in silico experiments help to find elements and/or groups of elements that play a critical role in sustaining the integrity of the network. Such elements might be potential targets for pharmacological agents.
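Such an in silico experiment can be sketched as follows. The example does not use the NF-κB topology of Figure 1; the network below is a hypothetical toy with placeholder identifiers, each path is flattened to the list of components and reactions it involves, and a network is taken to be intact when every pathway keeps at least one path. These choices and all function names are assumptions.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

# Each pathway (keyed by a label) is a list of paths; each path lists
# every component and reaction id it involves.
Network = Dict[str, List[Tuple[str, ...]]]


def active_part(network: Network, knocked_out: Iterable[str]) -> Network:
    """Keep only the paths that contain none of the knocked-out elements."""
    lost = set(knocked_out)
    return {name: [p for p in paths if lost.isdisjoint(p)]
            for name, paths in network.items()}


def is_intact(network: Network, knocked_out: Iterable[str]) -> bool:
    """Assumed criterion: every pathway retains at least one active path."""
    return all(active_part(network, knocked_out).values())


def critical_groups(network: Network, size: int) -> List[Tuple[str, ...]]:
    """Groups of the given size that break the network although no
    smaller subset of the group does."""
    elements = sorted({e for paths in network.values() for p in paths for e in p})
    groups = []
    for group in combinations(elements, size):
        if is_intact(network, group):
            continue
        if all(is_intact(network, sub)
               for k in range(1, size)
               for sub in combinations(group, k)):
            groups.append(group)
    return groups


# Hypothetical network: two entries (E1, E2) converging on one target T.
toy = {
    "from_E1": [("E1", "r1", "M", "r3", "T"), ("E1", "r2", "N", "r4", "T")],
    "from_E2": [("E2", "r5", "N", "r4", "T")],
}
single = critical_groups(toy, 1)
pairs = critical_groups(toy, 2)
```

In the toy network, `single` lists the elements whose individual loss leaves some pathway without any active path, and `pairs` lists combinations such as M together with r2 that are only critical jointly. The exhaustive search over combinations grows quickly with group size, so for large networks it would have to be restricted to small groups.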


Figure 2: Formal representation of the NF-κB network architecture. The molecule names are transformed into TRANSPATH accession numbers:
TNF - MO00024; TNFR1 - MO00206; TRADD - MO00207; RIP - MO00208; TRAF2 - MO00209; NIK - MO00203; IKKα - MO00210; IKKβ - MO00211; IL-1 - MO16589; IL-1R1 - MO16595; IL1RAcP - MO16594; MyD88 - MO16573; IRAK - MO00213; TRAF6 - MO00212; ECSIT - MO00139; MEKK1 - MO00047; TAK1 - MO16574; IKK - MO16599; I-κB - MO00215; NF-κB:I-κB - MO00254; Ub - MO00216; 26S proteasome - MO00218; NF-κB - MO00058.




SUMMARY

We have suggested a new approach to modeling the architecture of regulatory networks and their modules. The formalism developed for this purpose treats processes of different complexity as multiple conditional events and is based on a variant of Boolean logic. By using this formalism, regulatory pathways and their networks can be expressed in an algebraic form suitable for storage and computer-assisted analysis. The approach enables a qualitative estimation of the role of individual elements in complex pathways and their networks. This might be particularly useful for modeling the effects of pathologically relevant mutations in distinct components of transcription, signal transduction and metabolic systems. The proposed approach might also be useful for protein target finding.


ACKNOWLEDGEMENTS

This work has been supported by grants of the German Ministry of Education, Science, Research and Technology (BMBF; 01 KW 9629/7 and 01 KW 9906/1).


REFERENCES

  1. Kauffman, S. A. (1993). The Origins of Order: Self-Organization and Selection in Evolution (Oxford University Press, N.Y.).
  2. Heinrich, R. and Schuster, S. (1996). The Regulation of Cellular Systems (Chapman & Hall, N.Y.).
  3. D'Haeseleer, P., Liang, S. and Somogyi, R. (2000). Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16, 707-726.
  4. Wingender, E., Chen, X., Fricke, E., Geffers, R., Hehl, R., Liebich, I., Krull, M., Matys, V., Michael, H., Ohnhäuser, R., Prüß, M., Schacherer, F., Thiele, S. and Urbach, S. (2001). The TRANSFAC system on gene expression regulation. Nucleic Acids Res. 29, 281-283.
  5. Schacherer, F., Choi, C., Götze, U., Krull, M., Pistor, S. and Wingender, E. (2001). The TRANSPATH signal transduction database: a knowledge base on signal transduction networks. Bioinformatics, in press.