| In Silico Biology 4, 0024 (2004); ©2004, Bioinformation Systems e.V. |
Bioinformatics Institute
Singapore, 119613
Email:
cheemeng@bii.a-star.edu.sg,
ssomani@bii.a-star.edu.sg,
pk@bii.a-star.edu.sg
*Corresponding author
Edited by H. Michael; received November 05, 2003; revised and accepted March 26, 2004; published April 16, 2004
Mathematical modeling is a powerful approach for understanding the complexity of biological systems. Recently, several successful attempts have been made for simulating complex biological processes like metabolic pathways, gene regulatory networks and cell signaling pathways. The pathway models have not only generated experimentally verifiable hypothesis but have also provided valuable insights into the behavior of complex biological systems. Many recent studies have confirmed the phenotypic variability of organisms to an inherent stochasticity that operates at a basal level of gene expression. Due to this reason, development of novel mathematical representations and simulations algorithms are critical for successful modeling efforts in biological systems. The key is to find a biologically relevant representation for each representation. Although mathematically rigorous and physically consistent, stochastic algorithms are computationally expensive, they have been successfully used to model probabilistic events in the cell. This paper offers an overview of various mathematical and computational approaches for modeling stochastic phenomena in cellular systems.
Key words: in silico biology, noise, gene regulation, signal transduction, Gillespie algorithms, stochastic resonance, stochastic focusing, pathway modeling
Over the last couple of decades remarkable advancements in the field of molecular biology have led to a better understanding of biological systems at the cellular level. High throughput technologies have enabled sequencing of the whole genomes of organisms, gathering data on temporal gene expression patterns and uncovering new relationships in metabolic and signal transduction pathways. Traditionally biology has been dominated by a reductionist approach that focuses on specific cellular components like a single gene or single protein. Such low-throughput studies have generated a wealth of information but in a confined cellular space. Due to emergence of powerful sequencing technologies, we now have an extensive parts-list of the cell and a general idea of the interactions among genes, proteins, RNA and small molecules. Metabolic and gene regulatory pathways playing crucial roles in processes like cell division, response to external stimuli and development have been identified. However, the high degree of cross-talk between these pathways has resulted in an increase of informational complexity of the system. The grand challenge is to assemble all the 'pieces' together to create an integrated view of whole cell transactions.
Due to the complexity of the pathway interactions and large number of components involved, it is almost impossible to intuitively understand the behavior of cellular networks. Mathematical modeling and computer simulation techniques have proved useful for understanding the topology and dynamics of such networks. Application of these techniques on cellular systems have given rise to a new branch of in silico biology called systems biology. In silico biology has an edge over conventional experimental biology in terms of cost, ease and speed. Also experiments that are infeasible in vivo can be conducted in silico, e. g. it is possible to knock out many vital genes from the cells and monitor their individual and collective impact on cellular metabolism. Evidently such experiments cannot be done in vivo because the cell may not survive. Development of predictive in silico models offer opportunities for unprecedented control over the system. In contrast to physicists, biologists still do not understand the fundamental laws of biology. Modeling can provide valuable insights into the working and general principles of organization of biological systems. Also it can suggest novel experiments for testing hypotheses, based on the modeling experiences.
An important aspect of modeling cellular networks is the occurrence of stochastic or random events. The focus of this paper is on the various aspects of stochasticity in biological systems and is broadly divided into two sections. The first section discusses the evidence of stochasticity in biological systems, the physical basis of randomness and the relevance and impact of the inherent randomness. The second section discusses the modeling approaches, mathematical formalisms and simulation algorithms.
The occurrence of stochastic phenomena in a variety of physical systems like turbulent fluid flow, is well established. In the recent past attention has shifted to stochasticity, noise and its impact on biological systems.
Evidence of stochasticity in biological systems
Many studies have reported occurrence of stochastic fluctuations and noise in living systems. Observation of gene expression in individual cells has clearly established the stochastic nature of transcription and translation [Abkowitz et al., 1996; Ozbudak et al., 2002]. Elowitz and co-workers [Elowitz et al., 2002] constructed strains of Escherichia coli for detecting noise and discriminating between the intrinsic and extrinsic contributors of noise. Other studies showed that the messenger RNA production is quantal [Hume, 2000] and is produced in random pulses [Ross et al., 1994); Walters et al., 1995]. The current view is that protein production occurs in short "bursts" at random time intervals rather than in a continuous manner [Yarchuk et al., 1992; Chapon, 1982]. Identical initial conditions, such as concentrations of chemical species, temperature, pressure etc., have been shown to produce qualitatively different outcomes in the temporal evolution of a regulatory network. A classic example is the lysis/lysogenic switch of bacteriophage λ infecting E. coli. Due to noise the network may randomly evolve into one of the two bistable states [Hasty et al., 2000; Hasty and Issacs, 2001]. The role of the noise has also been seen in bacterial chemotaxis [Levin, 2003; Morton-Firth and Bray, 1998] and cellular selection [Till et al., 1964].
Origin of stochasticity
At the microscopic level of functionality, the interactions between molecules, e. g. DNA, mRNA, protein, small molecules, follow the laws of physics. A fundamental derivation of theoretical statistical physics is the famous
law [Schrödinger, 1992], which says that randomness or fluctuations in a system are inversely proportional to the square root of the number of particles, an indicator of the system size. Thus, a lower number of particles (low concentration) results in high fluctuation, the origin of which is largely due to thermal oscillations. Biochemical species participating in processes such as gene transcription>, gene regulation and signal transduction often occur in low copy numbers, e. g. as a single DNA template with small number of promoter sites, few tens of mRNA molecules and transcription factors numbering around few hundreds. As a result elementary reactions, such as polymerase binding or complex formation, take place with widely distributed reaction times. Such stochastic effects arising due to the inherent nature of biochemical interactions are often termed as intrinsic noise.
In the context of gene expression there exists an extrinsic component of noise too arising from random fluctuations in other factors, e. g. the number of ribosomes, the stage of the cell cycle, mRNA degradation, and the cellular environment. These are due to the external environmental conditions. For example a transcription factor for a particular gene is mostly the protein product of another gene and thus its production is also probabilistic. In these situations, a protein product arising out of a stochastic activation of a gene, leads to a cascade of downstream stochastic events. The timings of such triggers can result in vastly different outcomes [McAdams and Arkin, 1997; Arkin et al., 1998; Blake et al., 2003]. It is still unclear if the gene expression mechanism has an exact triggering threshold for every gene or whether there is a global threshold with distinct local sub-thresholds.
As explained above stochasticity becomes more pronounced with the decrease in the number of molecules. As the concentrations of the reacting species increase the fluctuations become less prominent and tend towards the deterministic solution. This fact is important from the computational complexity point of view as stochastic simulations are more expensive to run compared to deterministic Ordinary Differential Equations (ODE)-based simulations. We illustrate this fact by using an example from Michaelis-Menten enzyme catalysis. The catalysis mechanism is given by:
| E + S | (1) |
| ES | (2) |
| ES | (3) |
Fig. 1 shows the results of stochastic simulation for 10 seconds with initial enzyme (E) and substrate (S) concentration being 100 and 1000 particles, respectively. The figure shows the results of twenty independent but graphically superimposed runs. The increase in randomness or fluctuation with reduction in number of molecules are clearly evident.
|
Figure 1: Stochastic fluctuation in Michaelis-Menten enzyme catalysis. Enzyme and substrate initial concentration: (a) E0 = S0 = 100 particles; (b) E0 = S0 1000 particles. |
Manifestation of noise and its importance
The most important implication of noise in critical cellular processes is that in spite of identical initial conditions, with time, different cells may evolve along distinct pathways. Population measurements typically show that the expression level of the same gene varies significantly across the cells sharing the same genetic material. The origin of such variability among isogenic population is largely attributed to stochastic phenomena [Hasty and Collins, 2002].
Cellular pathways generally exhibit non-linear behavior because of the complex underlying mechanisms of interactions such as enzymatic actions, gene expression and various feedback loops. Due to the high non-linearity the networks often exhibit multiple stable states and bifurcations. As a result stochastic effects may drive the system randomly into distinct pathways. Experimental and computational studies on phage λ-infected E. coli clearly demonstrate this phenomenon [Arkin et al., 1998]. In this context, the presence of noise raises a different set of questions concerning robustness, reliability and sensitivity of cells. For example, how do cells accurately respond to environmental changes even though signal molecules behave in a probabilistic manner? How do we explain the large scale determinism and control needed for survival and growth of an organism when at the microscopic level the processes are inherently random?
These are complex questions, answers to which are being sought through non-linear dynamics. Noise can be constructive in helping a cell to respond in totally different ways depending upon external signals, i. e. random fluctuations at the molecular level can reproduce phenotypic diversity. Pathogenic organisms utilize random fluctuations on the surface to evade host responses [van de Putte and Goosen, 1992]. In case of bi-stability of the phage λ pathway, only a stochastic system can explore either of the two stable states in a dynamic environment. In a deterministic setting an initial condition would always guide a cell through a particular trajectory with no scope for flexibility of response. To avoid this, the gene expression variability and the accompanying noise make the cell flexible and help it to adapt to varying environmental conditions. It has been postulated that through the mechanism of regulatory feedback loops, networks can impose certain mean behavior even in the presence of randomness [Barkai and Leibler, 2000; Thattai and van Oudenaarden, 2001]. The simulation of a molecular model of the circadian rhythms of the PER protein in Drosophila and of the FRQ protein in Neurospora illustrates that robust circadian oscillations can occur even with a limited number of mRNA and protein molecules, i. e. in the range of tens and hundreds, respectively [Gonze et al., 2002].
For studying the cell sensitivity to noise, it is important to understand Stochastic Resonance (SR) and Stochastic Focusing (SF). Both arise due to the interaction of noise with a non-linear cell system. Stochastic resonance has been an area of intense research recently, particularly in the field of physics. SR is a "cooperative effect in which a small periodic influence entrains extrinsic random noise" [Wiesenfeld and Jaramillo, 1998]. Using SR, a low amplitude periodic signal (difficult to measure) can be detected by utilizing the relatively high amplitude noise. There has been conclusive evidence of SR in biological systems, particularly in sensory systems at tissue and subcellular levels [Douglass et al., 1993; Bezrukov et al., 1994; Levin and Miller, 1996]. The mechanoreceptor hair cells of the crayfish Procambarus clarkii detect coherent water motions using SR [Douglass et al., 1993]. At the subcellular level SR has been observed in voltage dependent ion channels formed by the peptide alamethicin [Bezrukov et al., 1994].
Stochastic focusing refers to the phenomenon where by cells utilize the noise to tune a gradual response, resembling somewhat to a threshold driven mechanism [Paulsson et al., 2000]. In addition to the regulatory control through feedback loops, stochastic focusing plays an important role in effecting precision control and imposing checkpointing in critical cellular processes. Evidently, stochastic resonance and focusing can together produce determinism through higher regulatory control, though experimental evidence to validate this concept is yet unavailable.
Having highlighted the importance and role of stochastic phenomena in the previous sections we now summarize some of the modeling approaches.
Early attempts in model building of cellular systems focused primarily on metabolic pathways as they were well characterized both qualitatively and quantitatively. Emphasis was put mainly on simulating in vitro environments where due to the high concentrations of substances a stochastic approach was not needed. As such the modeling efforts took a macroscopic view of the biological networks. A macroscopic view deals with the aggregated quantity of substances and the rate of change of concentrations. It formulates the system based on chemical rate laws governing the species concentration. The substance-reaction relationship is expressed as a set of non-linear differential equations [Segel, 1976] or as S-system [Voit, 2000]. Over many years, powerful algorithms with sound mathematical basis have been developed for solving these equations. The tools based on these algorithms have widespread applications especially in the field of metabolic engineering [Cornish-Bowden and Hofmeyr, 1991; Schuster and Heinrich, 1995; Stephanopoulos et al., 1998; Voit, 2000]. Various mathematical tools have been added for sensitivity analysis, equilibrium analysis and bifurcation analysis to gather more information on the systems. In addition, useful tools like parameter estimation have also been developed [Voit, 2000]. At present a variety of software packages specifically developed for deterministic simulation and analysis of biochemical reaction systems are available, such as DBsolve [Goryanin et al., 1999], GEPASI [Mendes, 1997], KINSIM [Dang and Frieden, 1997], MIST [Ehlde and Zacchi, 1995], KINSOLVER [Aleman-Meza et al., 2002], PLAS [Voit, 2000], ECELL [Tomita et al., 1999] and Cellware [Dhar, 2003].
The fundamental feature of all the simulations based on the macroscopic view is that they are deterministic in nature. As a result, the system evolves along a fixed path from its initial state. As explained above such an approach cannot be taken for modeling stochastic processes such as gene regulation. Also in a deterministic simulation space, it is not possible to capture emergent phenomena that arise from inherent randomness.
Another limitation of the deterministic approach is the assumption that concentration varies continuously and continually over time. In other words the concentrations are real numbers in contrast to the discrete numbers in the stochastic simulations, and also the variation of concentrations shows no discontinuities or "jumps". This assumption is valid as long as the concentrations are high. However, in cells the concentrations could be very low (e. g. single DNA molecule, RNA and protein molecules ranging from tens to a few hundred). As a result, at low concentration the discrete nature of change needs to be captured.
For stochastic modeling, it is feasible to take a mesoscopic view of the system which can keep track of every molecule in the system. Concentrations changes of discrete number of molecules corresponding to certain reaction events. Also, in contrast to the deterministic system, the change is random or stochastic in nature. Assuming a well mixed compartment at thermodynamic equilibrium, many algorithms have been developed for simulating systems at the mesoscopic level. The following sections of this paper outline some of the representative algorithms along with a comparative study of strengths and limitations of each algorithm. Other candidate modeling approaches for gene regulatory and biochemical networks, which are complementary or over-lapping with the above approaches are also being investigated. Various kinds of network models like Bayesian network, Boolean networks and generalized logical networks try to decipher the static and dynamic structure of regulatory networks based on experimental data. Other models try to capture more complex phenomena like spatio-temporal variations, diffusions, etc. All these methods can be categorized as static or dynamic, discrete or continuous, qualitative or quantitative. Excellent reviews on these issues have appeared recently [de Jong, 2002; Smolen et al., 2000].
In the previous sections, we presented reasons in favour of biological simulation using stochastic algorithms. This section will further describe the origin and numerical implementations of stochastic algorithms. In addition, the advantages and disadvantages of each algorithm will be discussed, together with its application in real biological systems.
Biological systems represent dynamic environments exhibiting changes from one state to another. The exact nature of the dynamics is dictated by the form of the perturbation in the network. Stochasticity in the dynamics arises in one of the two ways. As discussed in previous section, intrinsic stochasticity is inherent to the system, arising due to the relatively small number of reactant molecules, whereas extrinsic stochasticity originates due to random variation of one or more environmental factors, e. g. temperature and concentrations of the reactant species.
Intrinsic stochasticity
The temporal behavior of a spatially homogeneous mixture of molecular species can be described by a chemical Master Equation [Gillespie, 1977]. Chemical Master Equation is a form of mathematical formalism that describes the transition of the system from one state to another state using probabilistic methods. Before introducing the Master Equation, we first define the following notations.
Given the notations, we can describe the evolution of p(X, t) in terms of the rates α and β as follows.
![]() | (4) |
The first term on the right hand side of Eqn. 4 represents the probability at which X remains its state, whereas the second term is the probability at which X undergoes one reaction in time (t, t+ Δ t). Reorganizing Eqn. 4, and taking the limit as Δ
, gives the final form of Master Equation Eqn. 5. Notice that the transition of the state of the system is described through changes of the probability of the system being in a certain state, p(X, t). Hence, the inherent stochasticity of the system is mathematically formalized in this context.
![]() | (5) |
The Master Equation approach tries to write a system of equations for every possible transition state and solve them simultaneously. Generating a single trajectory is significantly easier. However, when the dimensionality of problem increases, the possible trajectories explode combinatorially and the problem becomes intractable. In view of this limitation, Gillespie devised a better way of generating such trajectories [Gillespie, 1977]. Instead of writing the whole Master Equation explicitly, the Gillespie Algorithm generates trajectories by picking reactions and times according to the correct probability distributions so that the probability of generating a given trajectory with the simulation algorithm is exactly the same as the solution of the Master Equation [Gibson, 2000].
Two different classes of stochastic simulation algorithms exist for chemically reacting system, namely the Gillespie Algorithm and StochSim Algorithm. The following sections will describe these algorithms in detail, stating the strengths and limitations of each. Various improvement and enhancement schemes proposed by Gibson and Bruck, 2000, Gibson, 2000, Gillespie, 2001 and Haseltine and Rawlings, 2002, to the algorithms will also be described.
Extrinsic stochasticity
In the case of extrinsic stochasticity, the stochasticity is introduced by incorporating multiplicative or additive stochastic terms into the governing reaction equations. These terms, normally viewed as random perturbations to the deterministic system, are also known as stochastic differential equations. The general equation is:
![]() | (6) |
The definition of the additional term ξx differs according to the formalism adopted. In Langevin Equations [Gillespie, 2001], ξx is represented by Eqn. 7. Other studies [Hasty and Issacs, 2001] adopt a different definition where ξi(t) is a rapidly fluctuating term with zero mean ({ξi(t)} = 0). The statistics of ξi(t) are such that ({ξi(t) ξi(t')} = 0) = Dδij (t - t') to maintain independence of random fluctuations between different species (D is proportional to the strength of the fluctuation).
![]() | (7) |
where
Gillespie, 1977, proposed two exact stochastic simulation algorithms to solve the Chemical Master Equation based on the assumptions that the system is homogeneous and well mixed.
At each time step, the chemical system is exactly in one state. The idea is to directly simulate the time evolution of the system. Basically, the algorithm determines the nature and occurrence of the next reaction, given that the system is in state α at time t. Given a system with total number of reaction channels N and total number of species M, we then define the following notations.
Through Gillespie algorithm [Gillespie, 1977], it has been proved that:
![]() | (8) |
Eqn. 8 is referred to as the Reaction Probability Density Function. It can be further decoupled into two probability distributions so that the probability distribution for firing time of reaction, P(τ) and the probability distribution for the reaction to fire, P(μ) are independent. These probability distributions are given by Eqn. 9 and Eqn. 10.
![]() | (9) |
![]() | (10) |
Eqn. 9 gives the probability of a reaction channel μ occurring next while Eqn. 10 takes into account the duration τ at which the reactions occur. Eqn. 9 is a discrete probability distribution and Eqn. 10 an exponential probability distribution. The algorithms generate τ and μ from the probability distribution using the probability distribution inverse transform method. The derivation of the two probability distributions lead to Gillespie's Direct Method and Gillespie's First Reaction Method.
Although the Gillespie algorithm solves the Master Equation exactly, it requires substantial amount of computational effort to simulate a complex system. Three situations cause an increase of computational effort in the Gillespie algorithm. These conditions decrease the time step of each iteration thus forcing the algorithm to run for larger number of iterations to simulate a given experiment. The conditions are as follows:
These factors cause great disparities in the timescale of reaction channels in the system. This phenomenon is similar to the stiffness problems in ODE methods. In ODE methods, whenever the reaction rate between the various reaction channels displays large differences in magnitude, the computation slows down significantly and the phenomenon is referred to as the "stiffness" problem. In the stochastic algorithms, whenever the complexity of the system increases, either through the increase of reaction channel itself or through a numerical increase in the number of molecules in the system or an increase in the reaction rates, the algorithm adopts small τ in order to maintain the exactness of the simulation. In other words, the algorithm requires shorter timestep to capture the fast dynamics of the system. Thus, the difference in the time-scales between different reaction channels may cause substantial computational difficulties.
The Gillespie algorithm has been applied to many in silico biological simulations recently. Kastner et al., 2002, applied the algorithm for simulation of Hox cis-regulatory mechanisms. The simulation was successful in reproducing key features of the wild-type pattern of gene expression and in silico experiments yielded results similar to that of in vivo experiments. Besides that, Kierzek et al., 2001, applied the algorithm to model lacZ gene expression and discovered the influences of the frequencies of transcription and translation initiation on random fluctuations in gene expression. McAdams and Arkin, 1997, also studied the transcription initiation and translation mechanisms in the cellular regulatory network using Gillespie's algorithms and found several stochastic phenomena like the fluctuation in protein production and switching delay for genetically coupled links.
Improvements
Gibson and Bruck, 2000, proposed the Next Reaction Method as an enhancement of Gillespie First Reaction Method. The Next Reaction method manages to improve time performance of Gillespie algorithms substantially while maintaining exactness of the algorithm. As mentioned above, Gillespie algorithms require enormous computational time for a system with a large number of reactions. Gibson highlighted the following calculation that is repeated in every iteration of the computation causing additional computational overhead in Gillespie algorithms.
In view of the inherent weaknesses in the First Reaction Method, Gibson, 2000, introduced the Next Reaction Method which utilizes the following steps:
Two specific data structures are used to store α's and τ's. In this context, a Dependency Graph is used to store the relationship between changes in αi to the firing of a given reaction channel. An Indexed Priority Queues is used to store τi to facilitate updating of τi's and retrieving of the smallest τi.
Examining the algorithm closely, one can easily see that the Next Reaction Method uses only one random number per iteration as compared to the M random numbers in the First Reaction Method. The reduction in generation of random number and better book-keeping of the data has improved the performance of Gillespie Algorithms significantly. However, the maintenance of the dependency graph and indexed priority queues becomes the performance trade-off in this algorithm.
In 2001, Gillespie presented the Tau-Leap Method to produce significant gains in the computational speed with an acceptable loss in accuracy. Gillespie algorithms solve the Master Equation exactly and obtain the exact temporal behavior of the system by generating exact timing of the firing of each reaction channel. However, it is sometimes unnecessary to obtain so much of detail from the simulation. Instead of finding out which reaction happens at which time step, one may like to know, how many of each reaction channels are fired in a certain time interval. If the time interval is large enough for many reactions to happen, one can expect substantial gain in the computational speed. However, in order to maintain accuracy of the method, one has to select an appropriate time interval, τ to be small enough so that the change in propensity function, αi is acceptable.
A thorough analysis of the programming complexity and run time performance of the algorithms may be required. This is due to the reason that the algorithms require a generation of inverse Poisson distribution which becomes too costly when the propensity function αi increases significantly. Although one can approximate the poisson distribution with a normal distribution (as described in Gillespie, 2001) which is comparatively quicker to generate than the inverse Poisson distribution, the accuracy of the solutions is reduced. In addition, one has to be cautious because this may falsify the basic assumptions of a discrete and stochastic model.
Gillespie, 2001, detailed several mechanisms for selecting τ, which include the explicit method, estimated-midpoint method and k - α method. Unfortunately, these explicit schemes fail to ensure stability of the algorithms [Gillespie, 2001]. Further improvements to the control mechanism are the implicit schemes [Gillespie and Petzold, 2004] which solve the stability problems encountered in stiff systems.
In 1998, Morton-Firth, 1998, developed the Stochsim algorithm. The algorithm treats the biological components, for example, enzymes and proteins, as individual objects interacting according to probability distribution derived from experimental data. In every iteration, a pair of molecules is tested for reaction. Due to the probabilistic treatment of the interactions between the molecules, Stochsim is capable of reproducing realistic stochastic phenomena in the biological system. Both the Gillespie algorithm and the Stochsim algorithm are based on identical assumptions [Morton-Firth, 1998]. "Pseudo-molecules" are used as numerical treatment for maintaining accuracy of the algorithm. In addition, the number of "pseudo-molecules" can be optimized to overcome the stiffness problem.
In contrast to the variable time step in the Gillespie algorithm, the Stochsim algorithm uses a fixed time step that can be optimized to the desired accuracy. However, the convenience of this feature comes with an additional burden of having non-productive time steps. Many time-steps are wasted with no changes in the states of the system. A second limitation of the Gillespie algorithm is that it results in computational in-feasibility when the species contain multi-state molecules. For example a protein which has ten binding sites will have a total of 210 states and it requires the same number of reaction channels to simulate this multi-state protein in the Gillespie algorithms. Since the Gillespie algorithm scales with the number of reaction channels, it is impossible to conduct such a simulation [Shimizu, 2002]. The Stochsim algorithm can be modified easily to overcome this problem by associating the states to the molecules without introducing much computational difficulties. Under special circumstances when the number of reactions is small and the number of molecules is large, the Gillespie algorithm is more efficient than the Stochsim algorithm.
Stochsim has been employed successfully in several biological systems, e. g. for examination of the fluctuations of molecules in a chemotactic signaling pathway of coliform bacteria [Morton-Firth and Bray, 1998] and for studying the effects of the flexible tethers at the receptor C-termini that interact with the adaptation enzymes CheR and CheB [Shimizu, 2002].
Parallel algorithms
We have seen that stochastic algorithms require huge computational power and therefore it takes a long time to simulate a realistic biological system. One way to overcome the problem is to parallelize the algorithms and utilize cluster/grid technology to exploit huge resources of computational power.
Schwehm, 1996, attempted to parallelize the algorithms by spatially decomposing the entire volume into smaller volume so that each volume has smaller number of molecules. Each sub-volume is then an instance of the Gillespie algorithm. The idea is indeed straightforward, as can be seen from Eqn. 9, when the number of molecules decrease, reaction probabilities decrease and τ increases. In other words, the time steps of the algorithms are longer and therefore, save the overall computational cost.
In order to maintain the consistency of spatial decomposition, the stochastic kinetic constants have to be adjusted accordingly [Schwehm, 1996]. Schwehm reported that Parallel Gillespie Algorithms run at least two times faster than Gillespie algorithms in a simple cascade system with 1000 cascades, 100000 molecules and 9 subvolumes. Unfortunately, the consistency and stability of the algorithms were not considered in his studies. We have tested the algorithm with a simple bimolecular reaction as shown in Eqn. 11 with kinetic constants, k = 1, X0 = 100, Y0 = 100. The entire volume is divided into 2 sub-volumes and there is no synchronization between the sub-volumes during the simulation. At the end of the simulation, the results from the 2 sub-volumes are superimposed to form a single time series solution.
Figure 2 plots the probability density function of species X at time 0.1s for 100 runs. One can immediately see the differences between the two probability distribution functions in which the mean and standard deviation of the function are significantly different. The skewing of the probability density function illustrates the reduction in accuracy of the results by using Parallel Gillespie Algorithms. Therefore, more studies on algorithmic stability and accuracy are necessary to enhance the Parallel Gillespie Algorithm.
| X + Y = Z | (11) |
|
Figure 2: Probability density function of species X in reaction 11 at time 0.1s. Numerical parameters: kinetic constant = 1, X0 = 100, Y0 =100, No of subvolumes = 2, No of runs = 100. |
Previous sections cover the stochastic algorithms for modeling biological pathways with no spatial information. However, the real biological world consists of components which interact in a three dimensional space. Within a cell compartment, the intracellular material is not distributed homogeneously in space and molecular localization plays an important role, e. g. diffusion of ions and molecules across membranes and propagation of an action potential along a nerve fiber's axon. Thus, basic assumption of spatial homogeneity and large concentration diffusion is no longer valid in realistic biological systems [Elf et al., 2003]. In this context, stochastic spatio-temporal simulation of biological system is required.
The enhancement on the performance of Gillespie Algorithms has made the spatio-temporal simulation tractable. Stundzia and Lumsden, 1996, and Elf et al., 2003, extended the Gillespie Algorithms to model intracellular diffusion. They formalized the reaction-diffusion master equation and the diffusion probability density functions. The entire volume of a model was divided into multiple subvolumes and by treating diffusion processes as chemical reactions, the Gillespie Algorithm was applied without much modification. Stundzia has showcased the application of the algorithm on calcium wave propagation within living cells and has observed regional fluctuations and spatial correlations in the small particles limit. However, this approach requires detailed knowledge about the diffusion processes to be available, in order to estimate the probability density function for diffusion. Furthermore, the algorithms have only been applied to small systems with finite number of molecular species but require large amount of computational power.
Shimizu, 2002, also extended the Stochsim algorithm to include spatial effects of the system. In his approach, spatial information was added to the attributes of each molecular species and a simple two dimensional lattice was formed to enable interaction between neighboring nodes. The algorithm was applied to study the action of a complex of signaling proteins associated with the chemotactic receptors of coliform bacteria. He showed that the interactions among receptors could contribute to high sensitivity and wide dynamic range in the bacterial chemotaxis pathway.
Another way of simulating stochastic diffusion is to directly approximate the Brownian movements of the individual molecules (MCell [Bartol and Stiles, 2002]). In this case, the motion and direction of the molecules are determined by using random numbers during the simulation. Similarly, collisions with potential binding sites and surfaces are detected and handled by using only random numbers with a computed binding probability. MCell is capable of treating stochastic and a 3-dimensional biological model that involves a discrete number of molecules. Though MCell [Bartol and Stiles, 2002] incorporates 3D spatial partitioning and parallel computing to increase algorithmic efficiency, the simulation is limited to the microphysiological processes such as synaptic transmission due to high computational requirement.
Apart from the enhancements on various algorithms, the simulation of a spatio-stochastic biological system is still infeasible. Regardless of the fact that the knowledge is incomplete, it is still unclear how to extract diffusion coefficients from experimental results and to track 3-dimensional shapes or structural changes in the cells.
As has been pointed out in the section on Gillespie algorithms, stochastic algorithms suffer from the same "stiffness" problems as that of deterministic algorithms. In order to capture the fast dynamics of the system, entire simulation is slowed down significantly. Hence, the basic idea of hybrid algorithms aims to exploit the advantages of other algorithms to offset the disadvantages of the stochastic algorithms.
Several attempts have been made to illustrate the relevance and feasibility of hybrid algorithms. Bundschuh et al., 2003, Haseltine and Rawlings, 2002, and Puchalka and Kierzek, 2004, have used a similar approach to integrate ODE/Langevin with Gillespie algorithms. In both cases, the modeler has to identify methods and criteria to partition the system into fast dynamics and slow dynamics sub-systems. The fast dynamics subsystem can be handled by either ODE or Langevin Equations while the slow dynamics subsystem can be handled by Gillespie algorithms. In addition, numerical treatment such as the "slow variables" in Bundschuh et al., 2003, and the "probability of no reaction" in Haseltine and Rawlings, 2002, is required to maintain accuracy of the solutions. The algorithms show promising results and the results are consistent with those from Gillespie algorithms. Haseltine and Rawlings, 2002, showcased the applicability of hybrid algorithms by simulating the effect of stochasticity to the bi-modality of an intracellular viral infection model using the algorithm. Kiehl et al., 2004, also tested the algorithms on the λ phage model.
The relevance of hybrid algorithms has been pointed out in several papers [Alur et al., 2001; Matsuno et al., 2000; Bockmayr and Courtois, 2002]. Bockmayr and Courtois, 2002, used hybrid constraint programming methods to model an alternative splicing regulation model. This implementation is very useful under circumstances where detailed knowledge about the model is unavailable. Meanwhile, Alur et al., 2001, used CHARON, a formal description language of hybrid system which combines ODE with "mode switching" mechanism to model the quorum sensing phenomenon in Vibrio fischeri, a marine bacterium that involves the Lux regulon. A Hybrid Petri Net [Matsuno et al., 2000] approach has been employed to model a hybrid system using ODEs and discrete events. This method has been used to model the growth pathway control of λ phage.
Hybrid algorithms aim to close the gap between macroscopic and mesoscopic scales of the system. In particular, the relevance of hybrid modeling has been proved necessary to capture the behavior of a real biological system. Moreover, hybrid algorithms have substantially cut down the computational cost of large scale modeling and simulation. One major drawback here is that by introducing additional numerical treatment to the algorithms, more parameters have to be defined and the accuracy of the solutions is dependent on the accuracy of parameters. Mostly, the simulations result in solutions of highly tuned parameters. Although these hybrid approaches show significant improvements in the computational cost, there are still lots of computational issues to be resolved before it can be applied to a realistic problem. Some of the issues are:
Previous sections highlighted various forms of stochastic algorithms and their application to simulation of biological systems. We have seen the transition of basic stochastic algorithms to Parallel Algorithms, Hybrid Algorithms and Spatio-Temporal Algorithms. Each algorithm imposes a certain constraint on the computational power, knowledge of the system and input of numerical parameters. In addition, the algorithms provide different abstractions of the system and produce solutions of varying accuracy. Table 1 gives an overview of the requirement imposed by the algorithms.
| Table 1: | Comparison of various stochastic algorithms |
| Algorithms | Computational cost | Modeling knowledge | Speed | Accuracy |
|---|---|---|---|---|
| Gillespie algorithms | Very high | Medium | Slow | Very high |
| Gibson algorithms | High | Medium | Fast | Very high |
| Stochsim | High | Medium | Fast | High |
| Tau-Leap methods | Low | High | Very fast | Medium |
| Spatial-temporal algorithms | Extremely high | Very high | Very slow | High |
| Parallel algorithms | Low | Medium | Very fast | Low |
| Hybrid algorithms | Low | High | Very fast | Medium |
| Note: The comparison is qualitative and based on relative scale. |
Table 1 shows a feature comparison of stochastic algorithms. One can immediately see that both spatio-temporal algorithms and Gillespie algorithms impose the highest computational requirement and therefore are equally slow in simulation. However, they produce more exact solutions among all the algorithms. On the other hand, Tau-Leap methods, hybrid algorithms and parallel algorithms are among the fastest simulation algorithms. However, due to various numerical treatments to the algorithms, both Tau-Leap methods and hybrid algorithms require substantial modeling knowledge to ensure the accuracy of the solutions. Besides that, Stochsim and Gibson algorithms are efficient algorithms which increase the speed of simulation without sacrificing the accuracy of solutions.
The advances of stochastic algorithms have opened tremendous opportunities to uncover the hidden mechanism in a cell. Many tools have been developed to simulate a stochastic model. Each tool provides a modeling environment and embedded simulation engine (Table 2).
| Table 2: | Comparison of modeling and simulation software |
| Tool | Algorithm | GUI | Operating system | License |
|---|---|---|---|---|
| Stochsim* | Stochsim algorithms | Yes (Windows only) | L, W, M | GPL |
| ECell* | Gibson algorithms, hybrid algorithms | Yes | L | GPL |
| Cellware* | Gillespie algorithms, Gibson algorithms, Tau-Leap algorithms |
Yes | L, W, M | Freeware |
| MCell | Spatial-temporal algorithms | Yes | L | Freeware |
| Mbius | Stochastic Petri net algorithms | Yes | W, U, L | Freeware |
| CellIllustrator | Hybrid Petri net algorithms | Yes | W, L | Commercial |
| VirtualCell* | Spatio-temporal algorithms | Web interface | W, L | Freeware |
| SBW* | Gillespie, Gibson and Tau-Leap algorithms | Yes | W | GPL |
| * | Tools using System Biology Markup Language (SBML). |
Table 2 is not an exhaustive list of all the available modeling and simulation tools. Some of the tools marked with "*" have entered a collaboration with the SBW-SBML project to standardize the cell modeling language using the System Biology Markup Language (SBML). As we can see, all the tools included in the list provide a Graphic User Interface (GUI) either in Linux, Windows or Unix environments. Some of these tools provide extensive coverage of algorithms e. g. Ecell and Cellware while others focus on specific algorithms e. g. MCell, Stochsim, Mobius and CellIllustrator.
Cellware [Dhar et al., 2004] is an on-going project in the Bioinformatics Institute of Singapore which aims to provide intuitive modeling environment while exploiting the grid technologies to increase the feasibility of analysis and simulation of biological systems. Grid Cellware version 2.0 [Dhar, 2003] has been released in March 2004 and includes various simulation algorithms and parameter estimation algorithms.
In this paper, we have listed various kinds of stochastic modeling and simulation techniques applied to many biological systems. The relevance of stochastic simulation in cellular systems has been validated by a number of experimental observations [Yarchuk et al., 1992; Ross et al., 1994; Walters et al., 1995; Hume, 2000; Chapon, 1982]. The paper also provides a detailed description of the stochastic algorithms, together with a literature review of its application to the real biological simulation. Although the pathway simulation has a considerable history and recently stochastic algorithms have been developed, integrating the simulation models into a realistic whole cell simulation is still an emerging field.
Stochastic simulation has proved its relevance to the realistic behavior of the biological system. And we have seen its power in uncovering many hidden biological phenomena. Though much improvement is needed before applying it to whole cell simulation, the future of stochastic simulation is bright and we envision the applicability of stochastic simulation to realistic biological systems in the near future.
The authors would like to acknowledge the financial support from A*Star, Singapore.