| In Silico Biology 6, 0017 (2006); ©2006, Bioinformation Systems e.V. |
1 Bioinformatics and Computational Biology; University of Idaho, Moscow, ID 83843
2 School of Molecular Biosciences; Washington State University, Pullman, Washington 99164
Email: bank2192@uidaho.edu, heckendo@uidaho.edu
* Corresponding author
Edited by H. Michael; received December 02, 2005; revised and accepted March 21, 2006; published April 22, 2006
A G2/M genetic network simulation is trained with tumor incidence data from knockout experiments. The genetic network is implemented using a neural network; knockout genotypes are simulated by removing nodes in the neural network. Two analyses are used to interpret the resulting network weights. We use a novel approach of fixing the network topology that allows knockout TSG (tumor suppressor gene) data from multiple studies to overlap and indirectly inform one another. The trained simulation is validated by reproducing qualitative mammary cancer susceptibilities of ATM, BRCA1, and p53 TSGs. The work described is valuable because it allows TSG mammary cancer susceptibility to be quantified using genetic network topology and in vivo knockout data.
Keywords: genetic network, neural network, mammary cancer, susceptibility, ATM, BRCA1, p53, mouse model, G2/M, cell cycle, knockout, simulation, tumor suppressor gene, TSG, cancer modeling, regulatory pathway, signal transduction
Tumor suppressor genes (TSGs) inhibit progress through the cell cycle [Fairbanks and Andersen, 1999]. TSGs are integral in preventing human breast cancer and are potential diagnostic and therapeutic drug targets [Osborne et al., 2004]. Animal models have shown that mutations in TSGs result in increased susceptibility to various cancers [Hakem and Mak, 2001]. It has also been shown that endogenous mutations, caused by cellular reproduction and cellular metabolites, result in accumulating somatic mutations of TSGs--leading to sporadic cancers [Demant, 2003].
Mus musculus is the model organism of choice for studying human breast cancer. The mouse genome is well studied and known to have well conserved genes and genetic network pathways. Transgenic mice can be created with mutated or knockout TSGs to provide a measurable increase in susceptibility to mammary cancer. Unfortunately, knockout mice have only been studied since the mid-1980s and a limited number of knockout genotypes are available [Hennighausen, 2000]. Some TSG knockout genotypes, such as BRCA1 -/-, result in other aggressive cancers (thymic lymphoma) before mammary cancer susceptibility can be measured [Cressman et al., 1999]. Other knockout genotypes often result in embryonic death, such as p53 -/- [Hakem and Mak, 2001].
To address the problems listed above, we use a genetic network simulation to combine the mammary cancer incidence data from several TSG knockouts. A neural network is used to implement the genetic network [Welch et al., 2003; Gronlund, 2004; Vohradsky, 2001; Tian and Burrage, 2003]. The resulting genetic network simulation is informed by a genetic network topology and in vivo knockout mouse experiments. We present two analyses to interpret the trained simulation: I/O Contribution and Robustness. I/O Contribution is a metric for gene relevance to mammary cancer susceptibility. Robustness measures the ability of the gene to function despite accumulating mutations. Note the term 'gene' is used throughout the text to describe a gene or the gene product.
Hakem and Mak [Hakem and Mak, 2001] describe the importance of several TSGs that inhibit the G2/M phase of the cell cycle. Knockout mice with a p53+/- genotype are described as being "predisposed" to tumorigenesis and likely to become cancerous within the first 9 months. Knockout mice with an ATM+/- genotype are healthy but show increased sensitivity to ionizing radiation, whereas knockout mice with a BRCA1+/- genotype show no increase in tumor incidence. We hypothesize that the genetic network simulation will reproduce these susceptibility relationships: (1) p53 is more important than ATM and BRCA1, and (2) ATM is more important than BRCA1 (i.e., p53 > ATM > BRCA1).
We also hypothesize the trained genetic network simulation will provide a quantitative measure for TSG contribution to mammary cancer susceptibility. Although knockout genotypes have been used to quantify interactions by Winzeler et al. [Winzeler et al., 1999] and Kaufman et al. [Kaufman et al., 2004], this simulation is novel because it uses a neural network with a fixed topology based on the current understanding of the G2/M genetic network. This fixed topology is another form of data to inform the simulation; the actual in vivo G2/M genetic network topology exhibits structural weaknesses and strengths. Also, by fixing the network topology, training allows different TSG knockout susceptibilities to overlap and inform one another. As a result, we show that information from a variety of knockout experiments can be combined through innovative training of a neural network to quantify the influence of specific genes on mammary cancer susceptibility.
Decades of biochemical research have resulted in genetic network diagrams, such as Fig. 1, describing qualitative gene-to-gene, protein-to-gene, and protein-to-protein interactions. Genetic network diagrams show basic activating (arrows) and inhibitory (flat arrows) dependencies between genes [Kastan and Bartek, 2004; Ohi and Gould, 1999; Kanehisa and Goto, 2000].
Figure 1: Genetic Network Diagram of G2/M checkpoint for cell cycle control. Arrows represent activation. Flat arrows indicate inhibitory interactions. (Courtesy of Cell Signaling Technology.)
We model the genetic network controlling G2/M transition of the cell cycle for two primary reasons. First, dysfunctional TSGs, such as p53, BRCA1, and ATM, are known to result in carcinogenesis by failing to inhibit G2 & M phases of the cell cycle [Hakem and Mak, 2001; Hennighausen, 2000]. Second, by definition, carcinogenic behavior requires a population of abnormally reproducing cells; the cell cycle is critical to this pathological state [Tannock and Hill, 1998; Kastan and Bartek, 2004].
Traditionally, genetic network models attempt to reproduce molecular regulatory behavior of interactions between DNA, RNA, proteins, and small molecules [Jong, 2002]. Complexity challenges emerge when attempting to accurately simulate a genomic network consisting of approximately 30,000 nodes (genes) [Boguski, 2002]. Entire molecular biology labs may spend decades focusing on interactions involving one of these nodes; it is not possible to accurately and precisely reproduce in vivo complexity without abstraction [Endy and Brent, 2001; Nagiel, 2002]. Simpler models are required to model genetic networks. The genetic network simulation presented in this paper results from a systems biology [Kitano, 2002] abstraction of the G2/M genetic network. Interactions between genes are limited to the knockout TSGs ATM, BRCA1, and p53.
Mice and humans have well-conserved genetic networks and mice are an established model organism for studying human mammary carcinogenesis [Hakem and Mak, 2001]. Knockout mice have one or both germ line copies of a particular allele deleted. Mice with knockout TSGs often exhibit increased susceptibility to carcinogenesis [Holland, 2004; Hennighausen, 2000]. Cumulative tumor incidence graphs describe onset of carcinogenesis in a population over time and are often used to show cancerous susceptibility of knockout mice.
Fig. 2 shows data from two publications using TSG knockout genotypes to study mammary carcinogenesis. Umesako et al. used mice with knockouts involving p53 and ATM [Umesako et al., 2004]. Xu et al. used mice with knockouts involving p53 and BRCA1 [Xu et al., 2001]. Although there are experimental differences between the two publications (mouse strain, facilities, etc.), tumor incidence is markedly different for the p53 & ATM knockouts versus the p53 & BRCA1 knockouts. For example, the p53+/-BRCA1-/- (Xu et al.) knockout genotype results in a much earlier onset of mammary cancer than the p53+/-ATM+/- genotype (Umesako et al.). This difference in tumor incidence is consistent with literature findings [Hakem and Mak, 2001]. Also, both sets of data result from mouse strains using the same genetic network topology. Both papers presented data for the p53+/- genotype and differences were accounted for by combining tumor incidence data.
How much each of these TSG knockouts increases mammary cancer susceptibility is not obvious. For example, p53+/-ATM+/- mice have one deleted ATM allele, whereas p53+/-BRCA1-/- mice have two BRCA1 alleles deleted. Although the tumor incidence data in Fig. 2 show a pronounced difference in susceptibility, one dataset results from a one-allele knockout whereas the other results from a two-allele knockout. Because the genetic network simulation is trained with both data sets, the comparison between ATM and BRCA1 knockouts can be quantified.
Neural network implementation
A neural network design is used to simulate the G2/M genetic network for several reasons. Neural networks are an established computational paradigm [McCulloch and Pitts, 1943; Miller et al., 1995; Fausett, 1994]. Neural networks are functionally similar to genetic networks because they take some input, interpret the input, and produce an appropriate output. Neural networks are structurally similar to genetic networks as they consist of a group of nodes with weighted interactions.
Neural networks have been used to model protein/genetic networks. Jiri Vohradsky published a neural network model of genetic network control for lysis/lysogeny in λ bacteriophage. Vohradsky's model was able to reproduce experimental data for the six-gene network and elicit a realistic phage transition from a lysogenic state to a lytic state [Vohradsky, 2001]. Welch et al. used a neural network to model flowering time control in Arabidopsis thaliana. An eight-gene network was trained to realistically react to external environmental factors [Welch et al., 2003].
| $output_n = wt_{min} + \sum_i (input_i \times wt_i)$ | (1) |
For this implementation, each node output (output_n) results from the sum of input signals (input_i) multiplied by a weight (wt_i). Weights are positive for activation and negative for inhibition. A minimal weight (wt_min) provides node output when all input signals are zero. Resulting output becomes the input for following nodes in the network.
Fig. 3 shows a simple three node example. Node C's output (outputC) is dependent on inputs from nodes A and B. Note that C's input signals are dependent on the output of nodes A and B.
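The node update described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `node_output`, the signal values, the weights, and the value of `wt_min` are all hypothetical and chosen only to show one activating and one inhibiting input.

```python
def node_output(inputs, weights, wt_min):
    """Weighted sum of input signals plus a minimal weight, as the text describes Equation 1."""
    if len(inputs) != len(weights):
        raise ValueError("one weight per input signal is required")
    return wt_min + sum(x * w for x, w in zip(inputs, weights))

# Three-node example in the spirit of Fig. 3: node C reads the outputs of A and B.
output_a = 1.0            # hypothetical signal produced by node A
output_b = 1.0            # hypothetical signal produced by node B
weights_c = [0.6, -0.4]   # hypothetical: A activates C (positive), B inhibits C (negative)
output_c = node_output([output_a, output_b], weights_c, wt_min=0.1)
```

With both inputs switched off, `node_output` returns `wt_min` alone, which matches the role the text gives the minimal weight.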
Equation 1 is modified to handle diploid knockout genotypes and mutation. In Equation 2, a knockout variable (knock) is toggled from one to zero for allele deletion. A cumulative mutation variable (mut) decreasing from one to zero represents an increasing chance of genetic dysfunction over time. Two copies (j) of the mutation and knockout variables represent the two allele copies of the gene. Degradation (mut) and dual alleles (j) are not present in the Vohradsky and Welch et al. models. These extensions (mut and j) are novel to a neural network implementation of a genetic network.
![]() | (2) |
An activation function, f(x), is applied to each node resulting in non-linear sigmoidal behavior associated with neural networks. For computational tractability, we use a simple linear activation function over a bounded region to simulate a non-linear response.
![]() | (3) |
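A bounded linear activation of the kind described above can be sketched as a simple clipping function. The bounds of 0 and 1 are an assumption made here because the network's signals are described as off (0) and on (1); the source does not state the exact region, and the name `activation` is hypothetical.

```python
def activation(x, lower=0.0, upper=1.0):
    """Piecewise-linear activation: identity inside [lower, upper], clipped outside.

    The default bounds are an assumption, not values taken from the paper.
    """
    return max(lower, min(upper, x))
```

Inside the bounded region the response is linear, while values outside it saturate, which gives a coarse approximation of sigmoidal behavior at low computational cost.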
Knockout training
Without a TSG knockout, the mice remain tumor free, whereas a knockout genotype, such as p53+/-BRCA1-/-, results in increased tumor incidence. The difference is one copy of the p53 allele and both copies of the BRCA1 allele. All other TSGs are intact; the resulting susceptibility can be attributed to the remaining intact genome. Tumor incidence data from knockout mice provide a metric for the remaining genome's susceptibility to mammary cancer.
Each knockout produces data complementary to the specific knockout gene(s). Figs. 4 (B) and 4 (C) illustrate the overlap of data for three possible TSG knockout genotypes. Overlapping gene functionality is used to apply the knockout tumor incidence data to the G2/M genetic network.
Although genetic networks, such as G2/M, are the subject of ongoing research, here we focus on assigning quantitative values to the topology. We assume that the current understanding [Kastan and Bartek, 2004; Ohi and Gould, 1999; Kohn, 1999; Kanehisa and Goto, 2000] presented in Fig. 5 (left) is correct.
The network's topology is simplified to knockout gene(s) and the complement of the knockout gene(s) to avoid underspecification. Without this simplification, all of the genomic network would need to be simulated and there would be too many attributable genes; interactions would not be accurate or consistent. Network topology is limited to specific knockout genes and "other TSGs" remaining intact that inhibit the G2 & M phases of the cell cycle. As a result, the G2/M genetic network can be simplified to a network (Fig. 5 right) involving p53, ATM, BRCA1, and "other TSGs". Note that activation and inhibition signals shown in Fig. 5 (left) are conserved in Fig. 5 (right).
Training the simulated genetic network involves assigning the correct weights, wti, to each node. Weights of the simulated genetic network are trained using a Breeding Particle Swarm Algorithm (BPSA) described by Settles and Soule [Settles and Soule, 2005]. Candidate weights are given a score value (Eq. 4), so the BPSA can effectively search for optimal weights. A lower score indicates the weights create a better model of the training data. We construct a scoring function from two aspects: the basic functionality of the network, which we call Feasibility, and mouse knockout data, which we call Mouse Simulations.
| score = Feasibility + Mouse Simulations | (4) |
Feasibility
This measurement indicates correct interpretation of input signals. Input signals are classified as pro-growth and anti-growth since the G2/M phase of the cell cycle may be activated or inhibited. A pro-growth signal, such as growth factor signaling, should turn reproduction on and activate the G2 & M phases of the cell cycle. Anti-growth signals, resulting from DNA damage or apoptosis, should prevent a cell from reproducing by inhibiting the G2 & M phases of the cell cycle. The Feasibility measurement maintains basic realistic network functionality described in Tab. 1. At the G2 & M phase of the cell cycle, the cell is committed to division; without pro-growth or anti-growth signals reproduction still occurs [Fairbanks and Andersen, 1999].
Table 1: Feasibility scenarios used to judge basic network functionality. 0 = off signal, 1 = on signal.
| Scenario (s) | Pro-Growth (input_s) | Anti-Growth (input_s) | Network Output (Reproduction_s) |
| 1 | 0 | 0 | 1 |
| 2 | 0 | 1 | 0 |
| 3 | 1 | 0 | 1 |
| 4 | 1 | 1 | 0 |
Feasibility, used in Equation 4, is described below in Equation 5. The input signals for each scenario s, input_s, are applied to the network, network(), and the resulting network output, network(input_s), is subtracted from the expected network output for that scenario, Reproduction_s.
![]() | (5) |
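The Feasibility term can be illustrated with the four scenarios of Tab. 1. In the sketch below the differences are squared before summing; this is an assumption, since the text states only that the network output is subtracted from the expected output. The functions `ideal_network` and `always_on` are hypothetical stand-ins for a trained network.

```python
# Tab. 1: (pro_growth, anti_growth) -> expected network output (reproduction)
SCENARIOS = [
    ((0, 0), 1),
    ((0, 1), 0),
    ((1, 0), 1),
    ((1, 1), 0),
]

def feasibility(network):
    """Sum over scenarios of the squared difference between expected and actual output.

    Squaring is an assumption; lower values mean the network matches Tab. 1 more closely.
    """
    return sum((expected - network(pro, anti)) ** 2
               for (pro, anti), expected in SCENARIOS)

def ideal_network(pro_growth, anti_growth):
    """Hypothetical network: reproduction is on unless an anti-growth signal is present."""
    return 0.0 if anti_growth else 1.0

def always_on(pro_growth, anti_growth):
    """Hypothetical faulty network that ignores anti-growth signals."""
    return 1.0
```

A network that matches every row of Tab. 1 scores zero, while `always_on` is penalized in the two scenarios where an anti-growth signal should have switched reproduction off.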
Mouse simulations
To reflect mouse knockout data from the literature, TSGs in the network are disabled using knock variables (from Eq. 2). Cumulative degradation (mut) causes each gene (n) to have an increasing chance of misinterpreting inputs and becoming dysfunctional. A distribution with a non-zero mean is used because mutations are rarely beneficial and more often have a loss-of-function effect on the gene product [Fairbanks and Andersen, 1999]. The cumulative normal distribution in Equation 6 is used to model an increasing chance of a gene acquiring mutation(s) that cause it to dysfunction. Varying the mean and variance of the normal distribution from 0.02 and 0.01, respectively, still allows the simulation to be trained; these values are arbitrary.
| $mut_{n,j} = mut_{n,j} - \mathrm{Norm}(0.02, 0.01)$ | (6) |
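The update in Equation 6 can be sketched for a single allele as follows. The sketch treats 0.01 as a variance, as the text states, so the standard deviation passed to the sampler is its square root. Clamping mut to the interval from zero to one is an assumption based on the description of mut as decreasing from one to zero, and the default of 26 steps matches the number of time points used in the mouse simulations. The function name and the `seed` parameter are hypothetical.

```python
import math
import random

def accumulate_mutation(steps=26, mean=0.02, variance=0.01, seed=None):
    """Return the mut value of one allele after each step of Equation 6."""
    rng = random.Random(seed)
    std = math.sqrt(variance)      # random.gauss expects a standard deviation
    mut = 1.0                      # an unmutated allele starts fully functional
    trajectory = []
    for _ in range(steps):
        mut -= rng.gauss(mean, std)
        mut = max(0.0, min(1.0, mut))   # assumption: keep mut within [0, 1]
        trajectory.append(mut)
    return trajectory
```

Because the mean of the sampled decrement is positive, mut drifts downward on average even though individual draws may be negative.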
Each mouse simulation accumulates mutation at 26 discrete time points, analogous to somatic mutations accumulating over the 26 months the in vivo knockout mice were observed. The mutated network is in a cancerous state when its Network Output is greater than the normal Network Output defined in Tab. 1 for all scenarios (i.e., Reproduction is turned on when it should not be). If the mutated network's output indicates a cancerous state, the time point is recorded. This time point is analogous to time-to-tumor data of a laboratory knockout mouse.
![]() | (7) |
One hundred instances of the network simulate a population of 100 mice. Equation 7 shows how simulated (in silico) cumulative tumor incidence is compared to published in vivo cumulative incidence using the sum of squares difference at each time point, t. The BPSA optimizes network weights to minimize the difference between the simulated and in vivo tumor incidence for each of the 7 genotypes shown in Fig. 2.
Fig. 6 shows an example of a well-trained network fitting data from three knockout genotypes [Umesako et al., 2004; Xu et al., 2001]. The simulated and in vivo p53+/-BRCA1-/- knockout genotype results in earlier and more severe tumorous growth, whereas the simulated and in vivo p53+/-ATM+/- knockout genotype results in tumor onset in older mice and with lower frequency. Finally, the p53+/- knockout genotype results in the fewest tumorous mice.
A total of 360 networks were trained, each using the same knockout data. BPSA training lasted for 10,500 generations, taking approximately 850 minutes on a 108-node Beowulf cluster allowing 265 simultaneous runs. Since the BPSA is a stochastic training technique, score values varied, with an average of 0.05380. No zero weights were assigned in any of the solutions; all paths in the network were used. Removing the least fit third of the runs resulted in an average score of 0.04493 for the remaining 240 networks and prevented unrealistic networks from possibly corrupting the gene network analyses.
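The comparison between simulated and published incidence curves can be sketched as below. The helper names are hypothetical, and the absence of any normalization in the sum of squares is an assumption, since the source describes only a sum of squares difference at each time point.

```python
def cumulative_incidence(onset_times, n_mice, n_time_points=26):
    """Fraction of the population with a recorded tumor at or before each time point.

    onset_times holds one entry per mouse that became cancerous;
    mice that stayed tumor free are simply absent from the list.
    """
    return [sum(1 for t in onset_times if t <= point) / n_mice
            for point in range(1, n_time_points + 1)]

def sum_of_squares(simulated, observed):
    """Sum of squared differences between two incidence curves of equal length."""
    if len(simulated) != len(observed):
        raise ValueError("curves must cover the same time points")
    return sum((s - o) ** 2 for s, o in zip(simulated, observed))

# Hypothetical toy population: 4 mice, 3 of which develop tumors.
curve = cumulative_incidence([3, 3, 10], n_mice=4, n_time_points=12)
```

In this toy population the curve steps to 0.5 at time point 3 and to 0.75 at time point 10, and an identical simulated and observed curve yields a difference of zero.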
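Discarding the least fit third of the runs amounts to sorting by score and keeping the lowest two thirds, since a lower score indicates a better fit. A minimal sketch, with a hypothetical function name:

```python
def keep_best_runs(scores):
    """Drop the least fit third of the runs; lower scores are better."""
    n_remove = len(scores) // 3
    return sorted(scores)[:len(scores) - n_remove]
```

Applied to 360 scores, this keeps 240, matching the number of networks used in the analyses.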
I/O Contribution
Given the knockout mouse data used to train the genetic network simulation, which TSGs were most relevant to mammary cancer susceptibility? I/O Contribution measures a gene's importance to the network's output. I/O Contribution is measured by knocking out both copies of gene n (knockout_n) and comparing the network output to the output of the intact network (intact_n). Mutation (mut) is fixed to 0.01 so that the activation function (Equation 3) does not clip signals resulting from larger weights. Network input signals are set to allow each TSG to be potentially activated; pro-growth is turned off (0) and anti-growth turned on (1). The change in network output due to gene n is specified as io_n in Equation 8. Nodes with higher io_n scores have a higher contribution to the correct functionality of the network.
| $io_n = \lvert network(knockout_n) - network(intact_n) \rvert$ | (8) |
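The I/O Contribution measurement can be sketched with a toy stand-in for the trained network. Everything about `toy_network` is hypothetical, including the placeholder gene names and the per-allele inhibition values; they are illustrative numbers only, not trained weights or results. Only the final absolute difference follows Equation 8.

```python
# Hypothetical inhibition contributed by each intact allele (illustrative only).
INHIBITION_PER_ALLELE = {"gene_x": 0.25, "gene_y": 0.125}

def toy_network(allele_copies, anti_growth=1):
    """Hypothetical network output: reproduction signal left after TSG inhibition."""
    inhibition = anti_growth * sum(INHIBITION_PER_ALLELE[g] * n
                                   for g, n in allele_copies.items())
    return max(0.0, min(1.0, 1.0 - inhibition))

def io_contribution(network, gene, genes):
    """Equation 8: absolute change in output when both copies of one gene are removed."""
    intact = {g: 2 for g in genes}   # diploid: two intact alleles per gene
    knockout = dict(intact)
    knockout[gene] = 0               # both alleles deleted
    return abs(network(knockout) - network(intact))

genes = list(INHIBITION_PER_ALLELE)
io_x = io_contribution(toy_network, "gene_x", genes)
io_y = io_contribution(toy_network, "gene_y", genes)
```

In this toy setting the gene with the larger per-allele inhibition receives the larger score, which is the sense in which a higher score marks a gene as more relevant to the network output.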
Fig. 7 shows average io_n scores for 240 networks trained with Feasibility and Mouse Simulations and 240 control networks trained only with Feasibility. p53 has the greatest effect on genetic network output and BRCA1 has the least effect. Note the overlapping error bars of the io_n scores of ATM and p53 in the control networks. In genetic network simulations trained with knockout mouse data, however, the io_n scores differ significantly at the 95% confidence level. Signal from the knockout mouse simulation training results in an io_n score ordering of p53 > ATM > BRCA1. This ordering is consistent with our hypothesis mentioned in the introduction.
Robustness
Robustness, in this genetic network simulation, measures the ability of a gene (in the network) to function despite an accumulating chance of acquiring the dysfunctional mutations described in Equation 6. A score for robustness (robust) is obtained by adding together all input weights (wt_i) for node n. Equation 2 shows that as mut_n decreases with time, larger weights are necessary to prevent gene n from misreading input signals (input). Smaller weights are assigned to genes that are more sensitive to mut_n and more likely to dysfunction. A larger robust score indicates the gene is more resistant to dysfunction.
| $robust_n = \sum_i wt_i$ | (9) |
Fig. 8 shows average robust scores for 240 trained networks (left) and 240 control networks (right). The BRCA1 node is assigned larger input weights resulting in a higher average robust score; the genetic network simulation trained with knockout mouse data indicates that BRCA1 is more resilient to dysfunction. ATM is assigned a much lower average score, indicating that it is more likely to dysfunction and result in mammary carcinogenesis.
Note the overlap of error bars and differences in magnitude of the control networks trained without knockout data. Markedly different behavior between trained and control robustness scoring indicates that training with tumor incidence data is informing the genetic network simulation.
We have presented a genetic network simulation which is trained to reproduce in vivo knockout mouse tumor incidence data. The simulation reproduces qualitative mammary cancer susceptibility resulting from TSG knockouts. Since mammary cancer susceptibility of a knockout genotype results from the remaining intact genome, multiple TSG knockouts provide overlapping data regarding the remaining intact TSGs. We leverage this overlap to train our neural network similar to the way simultaneous equations inform the values of the unknowns. At the same time this allows us to generalize the results from multiple knockout experiments.
Trained with in vivo data, this genetic network simulation allows a comparison between TSG knockout genes: ATM, BRCA1 and p53. This is not trivial since knockout experiments often focus on a particular gene and are not directly comparable due to differences in the number of knockout alleles and background mutations (such as p53+/-). This model assumes experimental and strain susceptibility differences are negligible and we acknowledge that this may not always be the case. Even with these differences, the trained simulation is able to use the tumor incidence data from multiple experiments to produce validated results. The qualitative ordering of p53 > ATM > BRCA1 is consistent with mouse mammary cancer literature.
I/O Contribution analysis indicates that p53 is most relevant to inhibiting the cell cycle whereas BRCA1 is the least important to mouse mammary cancer susceptibility. This simulation provides a quantitative model for increased mammary cancer susceptibility due to TSG dysfunction.
Since the simulation training involves stochastic gene dysfunction, a measure for genetic robustness is presented. It is predicted that the BRCA1 TSG is most resistant to dysfunction over the lifetime of Mus musculus. ATM has the lowest robust score, indicating that it is more likely to dysfunction. We acknowledge that carcinogenesis may result from other forms of genetic dysfunction; our model simulates genetic susceptibility to carcinogenesis in the context of somatic mutations.
It is also necessary to mention that I/O Contribution and robust scores may be affected by network topology. An interesting future experiment would be to examine the effects different network topologies (pathways) have on these results. Certainly, biases caused by topology affect in vivo knockout behavior. If the assumed network topology is correct, then the effect of network topology on our in silico knockouts informs the simulation and the results.
Researchers need a preliminary understanding of which genes to target before investing valuable time and resources. This paper presents a general protocol for merging data from different experiments that is scalable to include other knockout genes and phenotype data. Although there is no substitute for molecular bioscientific investigation, this genetic network simulation can be used as a guide to focus resources regarding cancer genetics and susceptibility. It is also our hope that this genetic network simulation might be extended with data from knockout experiments involving other TSGs to quantitatively compare susceptibilities with p53, BRCA1, and ATM.
This work is supported by NIH COBRE Grant P20 RR15587, NIH INBRE Grant P20 RR16448-01, NIH R01 CA104470 and NSF Grant EPS80935.