In Silico Biology 2, 0013 (2002); ©2002, Bioinformation Systems e.V.  
Dagstuhl Seminar "Functional Genomics"



The GP problem: Quantifying gene-to-phenotype relationships

Mark Cooper1,2,*, Scott C. Chapman3, Dean W. Podlich1,2 and Graeme L. Hammer1,4



1 School of Land and Food Sciences, The University of Queensland,
Brisbane, Queensland 4072, Australia

2 Current Address: Pioneer Hi-Bred International Inc.,
7300 N.W. 62nd Avenue, P.O. Box 1004, Johnston, Iowa 50131, USA

3 CSIRO Plant Industry,
120 Meiers Road, Indooroopilly, Queensland 4068, Australia

4 Agricultural and Production Systems Research Unit (APSRU),
Queensland Department of Primary Industries, Tor Street, Toowoomba, Queensland, Australia

*Corresponding author:
Email: mark.cooper@pioneer.com





Edited by E. Wingender; received September 26, 2001; revised and accepted December 21, 2001; published January 24, 2002


Abstract

In this paper we refer to the gene-to-phenotype modeling challenge as the GP problem. Integrating information across levels of organization within a genotype-environment system is a major challenge in computational biology. However, resolving the GP problem is a fundamental requirement if we are to understand and predict phenotypes given knowledge of the genome and model dynamic properties of biological systems. Organisms are consequences of this integration, and it is a major property of biological systems that underlies the responses we observe. We discuss the E(NK) model as a framework for investigation of the GP problem and the prediction of system properties at different levels of organization. We apply this quantitative framework to an investigation of the processes involved in genetic improvement of plants for agriculture. In our analysis, N genes determine the genetic variation for a set of traits that are responsible for plant adaptation to E environment-types within a target population of environments. The N genes can interact in epistatic NK gene-networks through the way that they influence plant growth and development processes within a dynamic crop growth model. We use a sorghum crop growth model, available within the APSIM agricultural production systems simulation model, to integrate the gene-environment interactions that occur during growth and development and to predict genotype-to-phenotype relationships for a given E(NK) model. Directional selection is then applied to the population of genotypes, based on their predicted phenotypes, to simulate the dynamic aspects of genetic improvement by a plant-breeding program. The outcomes of the simulated breeding are evaluated across cycles of selection in terms of the changes in allele frequencies for the N genes and the genotypic and phenotypic values of the populations of genotypes.

Links: http://pig.ag.uq.edu.au/qu-gene/

http://www.apsru.gov.au/Products/apsim.htm

Keywords: E(NK) model, epistasis, genotype-by-environment interactions, plant, crop, target population of environments, genetic space


Introduction

Today, a major research focus in the field of genetics and computational biology is developing methods to predict properties of organisms and populations of organisms at the phenotypic level from knowledge of the structure, function and diversity of genomes. We refer to the problem of determining gene-to-phenotype relationships as the GP problem. Formulating a solution to this problem for a defined natural system requires integration of information across many levels of organization within biophysical systems. An iterative modeling approach combined with strategic experimentation provides a powerful framework for tackling the GP problem. The objective of this paper is to define what we mean by an iterative modeling approach. We commence by describing a general approach to modeling natural systems and then illustrate its application in the modeling of plant breeding programs in an agricultural context.

It has often been stated that a model is a simplification of the natural system under investigation and that the level of simplification must be balanced against the complexity of the properties of the system to be studied. Therefore, what is an appropriate modeling approach to tackle the GP problem? In describing modeling strategies in general, Rosen [1985] and Casti [1989, 1997] distinguished between the "Natural System" that we are attempting to understand and the "Formal System" that is a mathematical construction of how we understand the properties of the natural system (Fig. 1). This is a useful starting point for thinking about how we might model a genotype-environment system.
Figure 1: Concept map for modeling biophysical systems. The "Natural System" is the biophysical structure that is under investigation and the "Formal System" is the model based investigative strategy that is in use to construct "Knowledge Structures" that represent properties of the "Natural System".


Approaches to constructing a formal system that captures the important properties of a natural system can take many forms. Here we are interested in a mathematical framework that allows us to represent the key components of the natural system and define the key relationships between these components. We intend that the mathematical relationships that we construct within the formal system will ultimately be representative of the causal relationships that are properties of the natural system, so that we can investigate the implications of these causal relationships within the formal system. More detailed formalizations might be appropriate if the objective is to investigate relationships at lower levels within the system, rather than to understand the relationships among its components. An example of this is in the modeling of processes related to the productivity of agricultural plants (Fig. 2). The genetics of different species, and of varieties within species, determine how plants interact with the soil and aerial (radiation, temperature, rainfall) environments to develop structures to 'capture' radiation, CO2 and water. While the enzymes and biochemistry of the primary processes involved in photosynthesis are extensively studied, it is difficult to integrate the results of these processes (assimilation of CO2 into biomass) over various time scales, and to model the effect of this biomass as it is diverted to new tissues and organs to either store biomass or capture more resources. It has been considered that models can only simulate one or two levels of scale away from the level of their primary function. Further, at molecular and atomic scales, the requirement for information input to the system rapidly increases. By studying and experimenting within the natural system we attempt to gain knowledge about the biophysical structures and their causal relationships at appropriate scales. Some of these causal relationships may be functional, others pre-functional, and in many cases non-functional but a consequence of the ways in which the biophysical structures interact [Kauffman, 2000].
Figure 2: Hierarchy and scale in modeling processes within plant systems


In many cases the results of experiments in biology are summarized descriptively. Alternatively we can attempt to encode, within a formal mathematical framework, our understanding of the results of the experiments (Fig. 1). The collection of these formal mathematical structures that we create is a model of the system. In many cases we may commence an experiment with a prior model or hypothesis and use the results of our experimental program to update and improve our model of the natural system [e.g. Ideker et al., 2001]. As we refine and improve our model through iterative cycles of experimentation and modeling we will be able to study properties of the natural system within the properties of the formal system. This will give us a basis for determining the level of confidence we have in decoding the structures observed within the model and making predictions about the properties that we expect to see within the natural system. Additionally, model building through iteration will enable us to acquire and interpret data structures from experimental programs as a foundation for constructing knowledge structures and queries that apply to the properties of the natural system. As we improve the quality of our model we will increasingly improve our power to predict properties of the natural system across its levels of organization.



Integration by assumption or by folding out the detail?

If we attempt to construct an integrated model of a natural system without adequate attention to the ways that the components of the system interact across levels of organization (Fig. 2), then either we confine ourselves to working within a single level of organization or we construct a model that has limited power to provide insight into many of the properties of the natural system. In the absence of experimental evidence, integrating across levels of organization by assuming that interactions are unlikely to be important leaves the resulting model liable to deviate from the natural system whenever these interactions become important.

In classical quantitative genetics many of the complicating interactions that can impact on gene-to-phenotype relationships have been assumed to be unimportant, based on the expectation that their effects are small, and/or that their estimation is impractical. Two properties of genotype-environment systems that are often ignored are those of gene-to-gene interactions (epistasis) and gene-by-environment interactions [e.g. Clark, 2000]. For example, in defining the value of a genotype for a quantitative trait that is determined by multiple genes, the assumption that epistasis is zero implies that the effects of the alleles for the segregating genes are independent of the effects of the alleles at the other genes. In this case, for each gene, additive and dominance intra-gene effects can be defined in terms of contrasts between the homozygous and heterozygous genotypes. Hence, the value of the multi-gene genotype for an individual is then simply determined as the cumulative effects of the genes by summing the allele effects across the segregating genes. Similarly, gene-by-environment (GE) interactions have been assumed to be unimportant or a source of error that can be summed to zero by evaluating genotypes in adequately large samples of experimental environments representing the target population of environments.
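The no-epistasis assumption described above can be made concrete with a minimal sketch. It assumes the conventional midpoint-centred coding, in which the two homozygotes of a gene take values -a and +a and the heterozygote takes value d. The function name and the numerical effects are hypothetical and chosen only for illustration.

```python
def additive_genotype_value(plus_allele_counts, additive_effects, dominance_effects):
    """Genotype value when genes act independently (epistasis assumed to be zero).

    plus_allele_counts: number of + alleles (0, 1 or 2) carried at each gene.
    additive_effects:   'a' for each gene (half the difference between homozygotes).
    dominance_effects:  'd' for each gene (heterozygote deviation from the midpoint).
    """
    total = 0.0
    for count, a, d in zip(plus_allele_counts, additive_effects, dominance_effects):
        if count == 2:
            total += a      # ++ homozygote
        elif count == 1:
            total += d      # heterozygote
        elif count == 0:
            total -= a      # -- homozygote
        else:
            raise ValueError("allele count must be 0, 1 or 2")
    return total


# Illustrative (invented) effects for three genes.
example_value = additive_genotype_value([2, 1, 0], [1.0, 0.5, 2.0], [0.0, 0.25, 0.0])
```

Because each gene contributes a term that ignores the alleles at every other gene, changing one gene shifts the total by the same amount in every genetic background. That is exactly the property that epistasis violates.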

Where experimental evidence demonstrates that the interactions are important it is necessary to directly evaluate their implications within the formal system. Analyses of the genetic architecture of quantitative traits in model systems indicate important sources of genetic variation attributed to epistasis and GE interactions [Mackay, 2001]. The same can be expected of economically important traits in agricultural plant species. Therefore, in tackling the GP problem for quantitative traits we seek a modeling framework that enables investigation of the impact of gene-to-gene and gene-by-environment interactions.



Modeling a genotype-environment system

To progress from a general discussion of strategies for modeling natural systems to the specifics required to model genotype-environment systems it is necessary to define both the key properties and relationships that are important in the target natural system and the methods that are to be used in constructing the formal system. Figure 3 is a concept map, based on the modeling framework described in Figure 1, which focuses on the GP problem for a genotype-environment system. Our objective is to establish a formal representation of a genotype-environment system to enable modeling gene-to-phenotype relationships as a basis for evaluating the efficiency of plant breeding strategies [Cooper et al., 1999]. Therefore, here we emphasize the quantification of allelic variation at N genes and their potential interactions within NK gene networks [Kauffman, 1993] and with E environmental conditions [Podlich and Cooper, 1998] in determining the gene-to-phenotype relationships for the traits to be improved by plant breeding.
Figure 3: Concept map for modeling the key components of a genotype-environment system and the relationships to the components of the E(NK) model and the investigative strategies applied to quantify the value of alleles of genes within the genotype-environment system [Adapted from Cooper et al., 1999].


The scope for modeling plant and animal breeding strategies has been a long-term focus of applied quantitative genetics [e.g. Falconer and Mackay, 1996; Comstock, 1996]. The use of computer simulation approaches has increased as hardware and software capability and flexibility have improved. Adopting a simulation approach to study gene-to-phenotype relationships provides greater flexibility for investigating the influences of epistasis and GE interactions than is possible within the classical statistical modeling approach [Kempthorne, 1988; Podlich and Cooper, 1998]. Kauffman [1993] gave a comprehensive discussion of the NK model and its suitability for investigating the impact of epistasis in evolutionary processes. Podlich and Cooper [1998] defined the E(NK) model as an extension of Kauffman's NK model in order to accommodate the effects of gene-by-environment interactions. In the E(NK) model gene-by-environment interactions are possible where different forms of NK gene network models can be expressed in the different environmental conditions that are possible within a target population of environments.
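As a rough illustration of the structure just described, the following sketch builds one random NK landscape per environment-type and combines them with environment frequencies. It is a toy under stated assumptions and not the QU-GENE implementation. It uses haploid genotypes with two alleles per gene, uniform random contributions, and a frequency-weighted mean across environment-types. The names `NKLandscape` and `enk_value` are hypothetical.

```python
import random


class NKLandscape:
    """Toy NK model: the contribution of each gene depends on its own allele
    and on the alleles at K other genes (haploid, two alleles per gene)."""

    def __init__(self, n, k, seed):
        rng = random.Random(seed)
        self.n = n
        # Gene i interacts with itself plus k other genes chosen at random.
        self.inputs = [
            [i] + rng.sample([j for j in range(n) if j != i], k) for i in range(n)
        ]
        # One random contribution per combination of the k + 1 input alleles.
        self.tables = [
            [rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)
        ]

    def value(self, genotype):
        total = 0.0
        for i in range(self.n):
            index = 0
            for j in self.inputs[i]:
                index = index * 2 + genotype[j]  # encode input alleles as an integer
            total += self.tables[i][index]
        return total / self.n


def enk_value(landscapes, env_frequencies, genotype):
    """Mean genotype value across environment-types, weighted by their frequency."""
    return sum(f * land.value(genotype) for land, f in zip(landscapes, env_frequencies))


# Three environment-types, each expressing a different NK network (E=3, N=5, K=2).
landscapes = [NKLandscape(n=5, k=2, seed=env) for env in range(3)]
genotype = (0, 1, 1, 0, 1)
per_env_values = [land.value(genotype) for land in landscapes]
```

Because each environment-type has its own lookup tables, the same genotype can rank differently from one environment-type to another, which is how gene-by-environment interaction arises in this toy version.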

The relationships between the components of the E(NK) model and the biophysical components of a genotype-environment system are indicated in Figure 3. Some of the investigation strategies that can be used to provide the information necessary to build formal models of gene-to-phenotype relationships and quantify the value of allelic variation in terms of the components of the E(NK) framework are indicated. Key activities that are emphasized include: (i) environmental characterization as a basis for defining the target population of environments and causes of GE interactions, (ii) genetic analysis to study genetic variation for biochemical pathways, physiological processes and adaptive traits, (iii) genetic (recombination) and physical mapping of genes, (iv) functional genomics to study the regulation and expression of genes, and (v) crop growth models that define the relationships between genetic variation for traits, plant growth and development processes and variation in environmental resources within a target population of environments [e.g. Bidinger et al., 1996].



Sorghum breeding example: Problem and model definition

To examine the effectiveness of a breeding strategy we need to define two properties of a genotype-environment system: (1) the target population of environments, and (2) the target genotype for the gene-to-phenotype model. Within the target geographical area that a breeding program operates, new genotypes are developed over sequences of cycles of intermating parents, evaluation and selection of progeny to identify new genotypes that have high and stable yield performance across a wide range of environmental conditions. The occurrence of environmental conditions within the geographical area has both spatial and temporal dimensions and the different conditions can occur with different frequencies in both dimensions. This results in a complex mixture of different environmental conditions that is referred to here as the target population of environments. In the presence of GE interactions, understanding the environmental factors that influence genotype performance and cause these interactions is an important step in designing an effective testing strategy for measurement of trait phenotypes as part of a breeding program. The target genotype is then defined as the genotype that results in the best trait performance across the target population of environments for the specified gene-to-phenotype model. For complex genotype-environment systems there can be multiple genotype targets. As E(NK) models become more complex, with increasing levels of E, N and K, it becomes increasingly difficult to compute and identify a single target genotype. In these situations, where it is not possible to create and evaluate all potential genotypes for a gene-to-phenotype model, alternative evaluation strategies are used. In the example we consider here the genotype-environment system is of a size that definition of a single target genotype is possible.
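When every genotype can be evaluated, the definition of the target genotype given above reduces to a frequency-weighted comparison. The sketch below assumes that performance across the target population of environments is the mean yield over environment-types weighted by their frequency of occurrence. The genotype labels, yields and frequencies are invented for illustration and are not results from the study.

```python
def tpe_mean_yield(yields_by_env_type, env_type_frequencies):
    """Expected yield across the target population of environments (TPE)."""
    return sum(f * y for f, y in zip(env_type_frequencies, yields_by_env_type))


def find_target_genotype(yield_table, env_type_frequencies):
    """Return the genotype with the highest frequency-weighted mean yield."""
    return max(
        yield_table,
        key=lambda g: tpe_mean_yield(yield_table[g], env_type_frequencies),
    )


# Hypothetical yields (t/ha) in three environment-types, listed in the same
# order as the (equally hypothetical) frequencies.
frequencies = [0.3, 0.4, 0.3]
yield_table = {
    "g1": [5.0, 3.0, 1.0],
    "g2": [4.0, 3.5, 2.0],
    "g3": [6.0, 2.0, 0.5],
}
target = find_target_genotype(yield_table, frequencies)
```

In this invented table the genotype that is best in the first environment-type (g3) is not the target genotype, which shows why the environment frequencies form part of the definition.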

In this example we discuss some key results from a larger long-term study. This larger study is investigating the requirements (Fig. 3) for model development and simulation of sorghum (Sorghum bicolor (L.) Moench) adaptation and grain yield for the heterogeneous dryland agricultural system in northeastern Australia [Chapman et al., 2000a,b,c, 2002a,b].

First we provide some background and context to the complexity of this genotype-environment system. Sorghum is the major summer crop grown in the northeastern cropping region of Australia. Grain yield is the major economic product and is used mainly as animal feed. Sorghum grain yield is a complex quantitative trait and is the result of interactions and integration of many component traits that can themselves interact with variation in environmental conditions (rainfall, temperature and solar radiation) during a crop growth and developmental cycle of around 100 days. The environmental variable with the dominant influence on grain yield variation is water availability to the crop. Variation in water availability is a consequence of complex spatial and temporal variation in rainfall prior to and during the growth of the crop and also the spatial variation in the water holding capacity of the soil types across the geographical area. We have found that the environmental variation in incidence of drought can explain a significant component of the GE interactions for grain yield [Chapman et al., 2000a,b,c]. Research into the genetic and physiological bases of drought tolerance of sorghum has identified and examined the importance of the following four traits: (1) phenology, in particular the timing of flowering (PH) [Hammer et al., 1989], (2) stay-green (SG) [Borrell and Hammer, 2000], (3) transpiration efficiency (TE) [Hammer et al., 1997; Mortlock and Hammer, 1999], and (4) osmotic adjustment (OA) [Hammer et al., 1999]. In parallel research, genetic analysis and the construction of a molecular marker map for grain sorghum [Tao et al., 1998, 2000] have enabled trait dissection. This body of work provides working hypotheses of the number of genes or Quantitative Trait Loci (QTL) that may contribute to the genetic variation for these four traits [Chapman et al., 2000a,b].

With access to this experimental database we have used a simulation approach to investigate the efficiencies of plant breeding strategies used for genetic improvement of grain yield of sorghum under the dryland conditions in Australia. This required us to develop an interface between a genetic modeling platform (QU-GENE) [Podlich and Cooper, 1998; http://pig.ag.uq.edu.au/qu-gene/] and a cropping system model (APSIM) [McCown et al., 1996; http://www.apsru.gov.au/products/apsim.htm], which has a module for sorghum [Hammer and Muchow, 1994; Hammer et al., 2001]. This interface was constructed in a way that used information generated from our ability to characterize environments for their occurrence of drought, our understanding of the spatial and temporal distributions of drought in the target population of environments, and the data available from genetic and physiological analyses of traits considered to contribute to drought tolerance (Fig. 3). This provides a model architecture that links the alleles of genes and the plant growth and development processes that respond to variation in the environmental conditions to determine grain yield (Fig. 4). Thus, by developing an interface between the QU-GENE genetic model and the APSIM-Sorg model for sorghum there is a relationship between genes and phenotypes that enables investigation of the GP problem within a genotype-environment system context. These gene-to-phenotype relationships can be used to assess the value of genes in terms of an E(NK) model for grain yield in a target population of environments. Further, as additional experimental information becomes available it is possible to continually update the genetic and physiological models for the genotype-environment system, our assessment of the allelic variation we have identified, and any impact that this may have on the efficiency of the breeding strategies we are using for genetic improvement of sorghum.
Figure 4: Schematic of the modular structures and linkages between QU-GENE and APSIM. In this example S1 recurrent selection was used as the breeding strategy to improve grain yield of the sorghum population of genotypes. Other plant breeding strategies are indicated (e.g. pedigree selection). Genotypes are categorized into expression-states in QU-GENE and these expression-states map to trait values modeled in APSIM-Sorg for different combinations of soil and weather data. Output from APSIM is processed to define both the yield of all possible genotypes (expression-state combinations) and the frequency of drought environment types (ETs) encountered in the target population of environments (TPE).

The E(NK) model can be parameterized in a number of ways, including: (1) constructing Boolean gene networks and sampling genotype values for the components of the networks from underlying distributions of gene effects, a procedure pioneered by Kauffman [1993]; (2) defining inheritance models using empirical estimates for classical quantitative genetic parameters [Podlich and Cooper, 1998]; and (3) specifying gene networks to represent the properties of biochemical pathways. For the sorghum genotype-environment system in our example the resulting E(NK) model is a consequence of the number of genes specified to control variation for traits, the number of environment-types identified for the target population of environments and the physiological relationships that determine crop growth and development within the APSIM-Sorg sorghum model. This is a novel approach for determining the parameters for an E(NK) model and it is made feasible by developing the interface between QU-GENE and APSIM (Fig. 4). Here we consider an E(NK) model where the number of environment-types E=3 and the total number of genes N=15. Each of the 15 genes has two alleles segregating within a base population of genotypes. The level of epistasis for grain yield, as defined by the K parameter, is not explicitly defined here and is an emergent property of the extent of trait interconnectedness within the APSIM-Sorg crop growth model.

The three environment-types represent different levels of severity of drought: (1) mild terminal stress, (2) moderate terminal stress, and (3) severe terminal stress. These drought environment-types, together with their frequencies of occurrence in the target population of environments, were determined from an analysis of the timing and severity of water deficits during crop growth and development by running the APSIM-Sorg model for a standard genotype with approximately 100 years of weather data across a number of locations in northeastern Australia. The locations represented different soil types from the target geographical area. The APSIM-Sorg simulations were then summarized by cluster analysis to identify the three key drought environment-types (Fig. 4) [Chapman et al., 2000b,c]. While there are three environment-types in the target population of environments, to be concise we will mostly concentrate on only two of these in this paper: (1) the mild terminal stress environment-type, and (2) the severe terminal stress environment-type. The 15 genes determine the genetic variation for grain yield in the environment-types by specifying the extent of genetic variation for the four traits PH (3 genes), SG (5 genes), TE (5 genes) and OA (2 genes). Thus, the genetic variation for grain yield is an emergent property of the variation for the physiologically defined growth and development processes in the APSIM-Sorg model impacted by the four traits. The process we have used here to specify the genetic variation for grain yield differs from the classical quantitative genetics approach where effects of "yield-genes" are specified in ways that are unrelated to or unconstrained by the biophysical properties of plant growth and development processes. The resultant genetic variation for grain yield in the base population of genotypes is then subjected to a series of recurrent cycles of directional selection for increased levels of grain yield. The breeding strategy we evaluate in this example is S1 recurrent selection [Hallauer and Miranda, 1988] and selection is based on the yield phenotypes of genotypes when they are evaluated in samples of environments taken from the target population of environments.
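The general idea of recurrent cycles of directional selection, and of tracking gene frequencies across cycles, can be sketched with a deliberately simplified toy. It is not S1 recurrent selection and not the QU-GENE procedure. It assumes truncation selection on a phenotype, random mating among the selected parents, unlinked genes, and an additive stand-in phenotype (the count of + alleles) in place of a crop-model yield. All function names and parameter values are hypothetical.

```python
import random


def plus_allele_frequency(population):
    """Frequency of + alleles over all genes and individuals (diploid)."""
    n_alleles = sum(2 * len(individual) for individual in population)
    return sum(sum(individual) for individual in population) / n_alleles


def gamete(individual, rng):
    """One allele per gene: homozygotes always pass on their allele,
    heterozygotes pass on + or - with equal probability (unlinked genes)."""
    return [1 if c == 2 else 0 if c == 0 else rng.randint(0, 1) for c in individual]


def next_cycle(population, phenotype, selected_fraction, rng):
    """Truncation selection followed by random mating among selected parents."""
    ranked = sorted(population, key=phenotype, reverse=True)
    parents = ranked[: max(2, int(len(ranked) * selected_fraction))]
    offspring = []
    for _ in range(len(population)):
        p1, p2 = rng.sample(parents, 2)
        offspring.append([a + b for a, b in zip(gamete(p1, rng), gamete(p2, rng))])
    return offspring


rng = random.Random(1)
# Base population: 200 individuals, 5 genes, + allele frequency near 0.5.
population = [[rng.randint(0, 1) + rng.randint(0, 1) for _ in range(5)] for _ in range(200)]
frequencies = [plus_allele_frequency(population)]
for _ in range(10):
    # Stand-in phenotype: number of + alleles (a crop model would replace this).
    population = next_cycle(population, phenotype=sum, selected_fraction=0.2, rng=rng)
    frequencies.append(plus_allele_frequency(population))
```

The list `frequencies` then holds one gene-frequency value per cycle, which is the kind of trajectory examined when the outcomes of selection are evaluated across cycles.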

The genetic changes in the population of genotypes in response to the selection imposed by the breeding strategy are examined in terms of: (1) the changes in frequencies of the alternative alleles for the 15 genes (referred to as changes in gene frequencies) on a trait basis, and (2) the changes in grain yield performance of the genotypes created and selected during the course of the simulation experiment. We examine these changes due to selection at both genetic and phenotypic levels by constructing response surfaces that relate genetic distances between genotypes to the phenotypic values for the four traits PH, SG, TE, OA and also grain yield. Genetic distances are calculated as Hamming Distances, which give a measure of the number of alleles that differ between any pair of genotypes.
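The distance measure described above can be written in a few lines. This sketch assumes that each genotype is stored as the number of + alleles (0, 1 or 2) at each gene, so that the number of alleles differing between two genotypes is the sum of the absolute differences in these counts. The representation and the function name are illustrative assumptions.

```python
def hamming_distance(genotype_a, genotype_b):
    """Number of alleles that differ between two diploid genotypes.

    Each genotype is a sequence holding the count of + alleles (0, 1 or 2)
    at each gene, in the same gene order.
    """
    if len(genotype_a) != len(genotype_b):
        raise ValueError("genotypes must cover the same genes")
    return sum(abs(a - b) for a, b in zip(genotype_a, genotype_b))


# Two genes A and B: AABB is (2, 2), aabb is (0, 0), AaBB is (1, 2).
distance_extremes = hamming_distance((2, 2), (0, 0))
distance_one_allele = hamming_distance((2, 2), (1, 2))
```

With 15 genes the distance from the target genotype ranges from 0 (identical) to 30 (every allele different).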

For 15 genes, each segregating for two alleles, there are 3^15 = 14,348,907 possible genotypes from all combinations of alleles. The frequency of occurrence of these genotypes in the reference population depends on the gene frequencies for the 15 genes. Running the APSIM-Sorg crop growth model 14,348,907 times for each environmental condition was not feasible. Therefore, in this example we reduced the number of simulations necessary by allocating genotypes to classes based on defining "expression states" for each trait. An expression state was defined for a trait by the total number of + or - alleles summed across the genes influencing the trait, where the + allele increased trait value and the - allele decreased trait value. Adopting this approach, for N genes determining genetic variation for a trait, with two alleles per gene, there are 2N + 1 expression states for the trait. For example, for the trait OA with N=2, individuals can have 0, 1, 2, 3 or 4 + alleles, representing the 5 states of expression for OA. An expression state class can contain more than one genotype. If we label the two genes A (A,a) and B (B,b) such that the alternative alleles are A(+), a(-) and B(+), b(-) then the genotype membership of the expression state classes is: 0 = aabb; 1 = Aabb, aaBb; 2 = AAbb, AaBb, aaBB; 3 = AABb, AaBB; 4 = AABB. We then divided the range of phenotypic values for the traits into equal increments on a linear scale, with genotype aabb defined as the lowest expression state and AABB the highest expression state for OA. The same process was applied to the other three traits. Following this procedure, we have 5 expression states for OA, 7 expression states for PH, and 11 expression states for each of SG and TE. With the four traits we have 5×7×11×11 = 4,235 combinations of expression states. Thus, the space of 14,348,907 genotypes is condensed and mapped onto a space of 4,235 expression-state combinations. Running 4,235 APSIM-Sorg simulations for the 600 environments used to represent the target population of environments was manageable with our computer cluster resources [Micallef et al., 2001; http://pig.ag.uq.edu.au/qu-gene]. The deterministic relationship between genotypes and trait expression states used in this example is only one of many ways in which a gene-to-phenotype relationship can be constructed within our modeling framework (Figs. 3 and 4).
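The counts in the preceding paragraph can be checked with a short enumeration. The gene numbers per trait are those given in the text. Coding each gene as its count of + alleles, and the function names, are assumptions made for this illustration.

```python
from itertools import product

# Number of genes controlling each trait, as stated in the text.
GENES_PER_TRAIT = {"PH": 3, "SG": 5, "TE": 5, "OA": 2}


def expression_state_count(n_genes):
    """With two alleles per gene, a trait can carry 0..2N + alleles: 2N + 1 states."""
    return 2 * n_genes + 1


def genotypes_by_expression_state(n_genes):
    """Group every genotype (+ allele count per gene) by its total of + alleles."""
    classes = {state: [] for state in range(expression_state_count(n_genes))}
    for genotype in product((0, 1, 2), repeat=n_genes):
        classes[sum(genotype)].append(genotype)
    return classes


# Three genotypes per gene (--, +-, ++) over all 15 genes.
total_genotypes = 3 ** sum(GENES_PER_TRAIT.values())

# Combinations of expression states across the four traits.
state_combinations = 1
for n_genes in GENES_PER_TRAIT.values():
    state_combinations *= expression_state_count(n_genes)

# Class sizes for OA (two genes): aabb | Aabb, aaBb | AAbb, AaBb, aaBB | ...
oa_class_sizes = [len(members) for members in genotypes_by_expression_state(2).values()]
```

The class sizes for OA reproduce the genotype memberships listed in the text: one, two, three, two and one genotypes for states 0 to 4.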



Sorghum Breeding Example: Results

For the three environment-types the APSIM-Sorg model was used to estimate a grain yield value for each of the 4,235 trait expression states, referred to hereafter as genotype classes. These estimates were averages from ca. 200 runs of the model, using as inputs daily weather data and soils data from location-year combinations chosen to represent the target population of environments. Some appreciation of the genetic variation for yield that exists among the genotype classes for each of the four traits in the mild terminal stress and severe terminal stress environment-types is given in Figure 5. For both environment-types a series of grain yield frequency distributions is shown for each trait. The genotypic classes are ordered on their genetic distance (measured as a Hamming distance) from the allele combination of the target genotype in the target population of environments. As expected lower grain yields are achieved under severe terminal stress (colored red) than in the mild terminal stress (colored blue) environment-type. For any genotype class for the four traits there is considerable genetic variation for grain yield, which results from genotypic variation for the other three traits.
Figure 5: Grain yield distribution of the genotype classes for the Mild Terminal Stress (colored blue) and Severe Terminal Stress (colored red) environment-types, for representations where the genotype classes are distributed according to their genetic distance from the target genotype (based on grain yield) for each of the four traits; (a) Transpiration Efficiency, (b) Osmotic Adjustment, (c) Phenology and (d) Stay-green. The vertical axis indicates the percentage of the 4,235 genotype classes present at each yield/Hamming distance combination. The horizontal left axis indicates the level of grain yield (t/ha). The horizontal right axis indicates the number of alleles different from the target genotype in the target population of environments (referred to as Hamming distance).


To evaluate the consequences of the effects of GE interactions between the mild terminal stress and severe terminal stress environment-types at the level of grain yield we need to examine the relationship between grain yield performance in both environment-types. To do this we construct a scatter plot of the yield values in both environment-types (Fig. 6). If there were no GE interactions there would be a perfect correlation of the grain yield values between the two environment-types. From the shape of the distribution of the yield values it can be seen that there are GE interactions and that the genotypes with highest grain yield differ between the two environment-types.
Figure 6: Grain yield values (t/ha) for the 4,235 genotype classes in the Mild Terminal Stress and Severe Terminal Stress environment-types for color coded representations of each of the four traits; (a) Transpiration Efficiency, (b) Osmotic Adjustment, (c) Phenology and (d) Stay-green. Genotype classes are color coded according to their genetic distance from the target genotype in the target population of environments (Hamming distance), extending from yellow (all alleles different from the target genotype) to blue (no alleles different from the target genotype).
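The scatter-plot comparison described above can be summarized numerically. The sketch below computes the Pearson correlation between yields in two environment-types and checks whether the top-ranked genotype class changes. The yield values are invented to contrast a case without GE interaction against a crossover case and are not taken from the simulations. The function names are hypothetical.

```python
from math import sqrt


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def best_index(values):
    """Index of the highest-yielding genotype class."""
    return max(range(len(values)), key=values.__getitem__)


# Invented yields (t/ha) for four genotype classes.
mild = [4.0, 5.0, 6.0, 7.0]
severe_no_ge = [1.0, 1.5, 2.0, 2.5]      # same ranking, perfectly linear
severe_crossover = [2.0, 2.5, 1.5, 1.0]  # ranking changes between environments

r_no_ge = pearson_correlation(mild, severe_no_ge)
r_crossover = pearson_correlation(mild, severe_crossover)
same_winner = best_index(mild) == best_index(severe_crossover)
```

A correlation below one combined with a change in the best genotype class is the pattern that matters most for selection, because it means that the choice of selection environment alters which genotypes are advanced.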


In Figure 6, each of the 4235 genotype classes is color coded by trait, extending from light (yellow) to dark (blue), to depict for each trait the genetic distance between the genotype class and the target genotype. As the colors get darker the genotypes in the classes have more alleles in common (giving a lower Hamming distance) with the target genotype. For both TE (Fig. 6a) and OA (Fig. 6b), genotypes with high yield in the severe terminal and mild terminal stress environment-types generally have a large proportion of genes in common with genotypes that yield well in the target population of environments. The situation is different for PH (Fig. 6c). For the PH trait, genotypes that have a high yield in the mild terminal stress environment-type have many genes in common with the target genotype, whereas genotypes that have high yield in the severe terminal stress environment-type are genetically distant from the target genotype. Thus, we have strong GE interactions for grain yield that can impact on selection outcomes for the PH trait and yield in the different environment-types and in the target population of environments. For SG (Fig. 6d) there is a strong association between high yield in the mild terminal stress environment-type and having genes in common with the target genotype. However, this relationship is much weaker in the severe terminal stress environment-type, in part because the other traits have a stronger influence on yield in this environment-type.

Since there are strong epistatic and GE interactions for the four traits in determining grain yield in the genotype-environment system represented in this example, it is important to consider the influence of selection environment on the expected changes in the genetic structure of the population. Here we examine genetic responses over recurrent cycles of selection on yield phenotypes in either the severe terminal stress or mild terminal stress environment-types. These responses to selection are examined first in terms of changes in the frequencies of the alleles that increase the level of expression of each trait (Fig. 7) and then in terms of trajectories through genetic space for yield (Fig. 8).
Figure 7: Change in frequency of the + alleles, which increase the level of each of the four traits (TE=Transpiration Efficiency, OA=Osmotic Adjustment, PH=Phenology, SG=Stay-green), over cycles of selection, when selection is conducted in the Severe Terminal Stress (a) and Mild Terminal Stress (b) environment-types.


Selection for increased grain yield within the severe terminal stress environment-type (Fig. 7a) rapidly increased the frequencies of alleles that enhanced expression of the two traits OA and TE, gradually increased the frequencies of alleles for enhanced SG, and decreased the frequencies of alleles for later flowering, thus selecting early flowering genotypes that could developmentally escape from the severe terminal stress conditions. After selection cycles 5 and 6, once the alleles for greater expression of OA and TE were fixed, the rate of increase in frequency of alleles for enhanced levels of SG was greater than in the previous selection cycles. Selection for higher grain yield under the mild terminal stress environment-type (Fig. 7b) resulted in a pattern of changes in allele frequencies different from that observed for the severe terminal stress environment-type (Fig. 7a). Under the mild terminal stress environment-type, selection for greater yield favored an increase in the frequencies of alleles for higher expression levels of all four traits (Fig. 7b). Thus, in contrast to the severe terminal stress environment-type, where early flowering genotypes were favored, selection in the mild terminal stress environment-type favored late flowering genotypes. Therefore, as we expect in the presence of these interactions, the trajectories through genetic space followed by the populations over cycles of selection for yield contrast depending on whether we select under a severe terminal stress environment-type (Fig. 8a) or a mild terminal stress environment-type (Fig. 8b).
Figure 8: Grain yield values (t/ha) for the 4235 genotype classes and the average trajectory of a population of genotypes (red line) over cycles of selection, when selection is conducted in the Severe Terminal Stress (a) and Mild Terminal Stress (b) environment-types. Genotype classes are color coded according to their genetic distance from the target genotype in either the Severe Terminal Stress (a) or the Mild Terminal Stress (b) environment-types, extending from yellow (all alleles different from the target genotype) to blue (no alleles different from the target genotype).
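
The way in which the selection environment can redirect changes in allele frequencies may be illustrated with a toy simulation of recurrent truncation selection. The following is only a sketch under strong assumptions, not the simulation used in this study. The additive `toy_yield` function and its `EFFECTS` weights are hypothetical stand-ins for the crop growth model and contain no epistasis. The number of loci per trait, the population size, the selected fraction and the haploid random-mating scheme are likewise invented for illustration.

```python
import random

TRAITS = ("TE", "OA", "PH", "SG")
LOCI_PER_TRAIT = 3   # hypothetical; not the gene numbers used in the paper
POP_SIZE = 200       # hypothetical population size
KEEP = 0.2           # hypothetical fraction of genotypes selected as parents

# Hypothetical yield effect of each + allele in each environment-type.
# These weights stand in for the crop growth model and are NOT from the paper.
# PH is negative under severe stress so that early flowering is favored.
EFFECTS = {
    "severe": {"TE": 1.0, "OA": 1.0, "PH": -0.5, "SG": 0.2},
    "mild": {"TE": 0.5, "OA": 0.5, "PH": 0.5, "SG": 0.5},
}


def random_genotype(rng):
    """One genotype: for each trait, a list of alleles (1 = '+', 0 = '-')."""
    return {t: [rng.randint(0, 1) for _ in range(LOCI_PER_TRAIT)] for t in TRAITS}


def toy_yield(genotype, env):
    """Additive toy yield: weighted count of + alleles (no epistasis)."""
    return sum(EFFECTS[env][t] * sum(genotype[t]) for t in TRAITS)


def plus_frequencies(population):
    """Frequency of + alleles for each trait, averaged over its loci."""
    n = len(population) * LOCI_PER_TRAIT
    return {t: sum(sum(g[t]) for g in population) / n for t in TRAITS}


def cross(p1, p2, rng):
    """Offspring takes each locus from one of the two parents at random."""
    return {t: [rng.choice(pair) for pair in zip(p1[t], p2[t])] for t in TRAITS}


def recurrent_selection(env, cycles=10, seed=1):
    """Return the + allele frequencies at cycle 0 and after each cycle."""
    rng = random.Random(seed)
    population = [random_genotype(rng) for _ in range(POP_SIZE)]
    history = [plus_frequencies(population)]
    for _ in range(cycles):
        # Truncation selection: keep the highest-yielding fraction as parents.
        ranked = sorted(population, key=lambda g: toy_yield(g, env), reverse=True)
        parents = ranked[: int(KEEP * POP_SIZE)]
        pairs = [rng.sample(parents, 2) for _ in range(POP_SIZE)]
        population = [cross(p1, p2, rng) for p1, p2 in pairs]
        history.append(plus_frequencies(population))
    return history


severe = recurrent_selection("severe")
mild = recurrent_selection("mild")
for name, history in (("severe", severe), ("mild", mild)):
    print(name, {t: round(f, 2) for t, f in history[-1].items()})
```

With these assumed weights, the frequency of + alleles for PH should fall under selection in the severe environment-type and rise in the mild environment-type, which mirrors the qualitative contrast between Fig. 7a and Fig. 7b. Because yield here is purely additive, the sketch does not reproduce the epistatic effects that the crop growth model generates.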



Sorghum Breeding Example: Discussion

We had three purposes in presenting the sorghum breeding example described in this paper: (1) to demonstrate some aspects of the approaches we are developing and using to investigate and deal with the GP problem for complex traits in plant breeding applications (Fig. 3), (2) to emphasize the importance that both epistatic and GE interactions can have in gene-to-phenotype relationships, and (3) to show how the E(NK) model can be used as a framework for many approaches to investigating the GP problem. Given a suitable experimental information base, an equally valid case study could be a human health issue such as heart disease, which is influenced by both the genetics of individuals and the lifestyle environment they choose.

To date our investigation of sorghum genetic improvement in Australia has synthesized a large body of information that previously existed as a series of less well connected studies. The modeling framework we now have has highlighted many previously unappreciated implications of interactions between breeding strategies, the genetic architecture of traits and the environments in which we select for higher grain yield. Also, and perhaps most importantly, the results of these studies have provided testable hypotheses and focal points for further experimentation to test our current understanding of the ways in which these traits interact with each other and environmental conditions to determine grain yield. Thus, we are entering another cycle of the iterative modeling approach described in Figure 3.

The GP problem has always been, and will continue to be, a major challenge in biology. With the increasing availability of the complete genome sequences of a number of prokaryotic and eukaryotic organisms, our improving ability to define the locations of genes in these sequences, and our growing knowledge of the functional relationships between these genes and the biochemical and metabolic pathways they influence [Karp, 2001], we are beginning to understand the dynamical nature of the GP problem. We see an iterative modeling approach, as described in this paper, as a logical quantitative framework for exploring the growing experimental databases and creating knowledge structures for genotype-environment systems (Fig. 1). This provides a foundation for defining priorities in the model development process and for deciding when development of practical applications is feasible. In our case the practical applications we seek are efficient plant breeding strategies that contribute to sustainable agricultural systems.



Acknowledgments

We thank Professor John Casti for his permission to create a modification of his original modeling concept map in Figure 1 and also Research Trends, Trivandrum, India, for permission to reproduce components of Figure 3.



References