In Silico Biology 5, 0011 (2004); ©2004, Bioinformation Systems e.V.  
Dagstuhl Seminar "Integrative Bioinformatics"

Integrating data from biological experiments into metabolic networks with the DBE information system


Ljudmilla Borisjuk, Mohammad-Reza Hajirezaei, Christian Klukas*, Hardy Rolletschek and Falk Schreiber




Institute of Plant Genetics and Crop Plant Research, Corrensstr. 3
D-06466 Gatersleben, Germany



* Corresponding author
   Email: klukas@ipk-gatersleben.de





Edited by E. Wingender; received September 30, 2004; revised and accepted November 02, 2004; published November 20, 2004



Abstract

Modern 'omics'-technologies result in huge amounts of data about life processes. For analysis and data mining purposes this data has to be considered in the context of the underlying biological networks. This work presents an approach for integrating data from biological experiments into metabolic networks by mapping the data onto network elements and visualising the data-enriched networks automatically.

This methodology is implemented in DBE, an information system that supports the analysis and visualisation of experimental data in the context of metabolic networks. It consists of five parts: (1) the DBE-Database for consistent data storage, (2) the Excel-Importer application for the data import, (3) the DBE-Website as the interface for the system, (4) the DBE-Pictures application for the up- and download of binary (e. g. image) files, and (5) DBE-Gravisto, a network analysis and graph visualisation system.

The usability of this approach is demonstrated in two examples.

Keywords: metabolic networks, visualisation, metabolic profiling, information system, data integration



Introduction

Metabolic pathways or more generally biological networks provide the basis for every living organism. In order to understand the interactions between different metabolic pathways, biochemical experiments are carried out. These experiments produce a large amount of data such as expression profiles and metabolic time series. Experiments are often repeated multiple times to compare the influence of different growth conditions or genetic modifications, and often many substances are detected at once. For data analysis purposes the biological context (e. g. the metabolic network) needs to be considered.

Methods and tools which assist in the interpretation of experimental data are an important field of development in bioinformatics and several approaches have been proposed. Examples are scatter plots of pairs of experiments [Colantuoni et al., 2002], clustering methods with visualisations of the results [Dysvik and Jonassen, 2001; Fellenberg and Mewes, 1999; Sturn et al., 2002] and mapping of gene expression data onto pathways and their visualisation using graphical attributes (e. g. colour codes) to show the level of gene expression [Chung et al., 2004; Karp et al., 1999; Nakao et al., 1999; Pan et al., 2003; Thimm et al., 2004; Wolf et al., 2000]. Most approaches focus on transcriptomics and proteomics data which is often mapped onto static pictures such as the KEGG pathway diagrams.

In this paper we discuss an approach for the mapping of metabolomics data (e. g. time series data, data of different plant lines) onto metabolic networks. However, our approach can also be used for other types of data (e. g. transcriptomics data) and other biological networks (e. g. protein-protein interaction networks). The mapping of metabolomics data onto metabolic networks and its visualisation in 2½ dimensions has been presented in [Dwyer et al., 2004]. Here we focus on our new information system DBE (Data analysis and visualisation system for Biological Experiments) and on different charting techniques for the visualisation in two dimensions. DBE helps biologists in managing, analysing and visualising their experimental data. In addition to the data mapping, the DBE data storage layer makes it possible to store experimental results in a single place and in a consistent form.

This paper is organised as follows. First, we discuss details of the mapping procedure in the section "Mapping of experimental data onto dynamic networks". Secondly, we describe the individual DBE-components in the section "The DBE information system". Thirdly, to demonstrate the utility of our method, it is applied to two examples: "Example 1 - comparison of different bean plant lines" and "Example 2 - seed development time series analysis". Finally, we present a summary of our approach and give an outlook in the section "Discussion".



Methods


Mapping of experimental data onto dynamic networks

Biological experiments are carried out in order to gain a deeper understanding of the metabolism of an organism, under changing conditions or while comparing different lines or mutants. Modern techniques like mass spectrometry allow biologists to analyse metabolite concentrations of up to 100-200 substances at once. This helps to gain a more complete view of the changes in the metabolism. But even more important than the number of analysed substances is the knowledge of the underlying processes. The measured metabolites are part of a huge network of metabolic reactions.

This is the foundation for the presented integration method. During the analysis of experimental metabolic data an integrated view of the underlying network and the corresponding measured values needs to be considered. Our approach consists of three parts: (1) dynamic networks, (2) data mapping and (3) charting for the display of the measured values.

(1) Contrary to mapping of data onto static pictures [Thimm et al., 2004; Wolf et al., 2000] we consider dynamic networks. Dynamic networks are networks that are either derived from databases (e. g. Proton/MARGBench [Freier et al., 1999; Freier et al., 2004], KEGG [Kanehisa and Goto, 2000]) or given by the user. They may, for example, change depending on the set of substances analysed in a particular experiment.
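
As a minimal sketch of what "dynamic" can mean here, the following Python fragment restricts a reaction network to the reactions that involve substances measured in a given experiment. The network representation (name, substrates, products), the reaction names R1 to R3 and the function name are illustrative assumptions, not the data model of DBE or Proton/MARGBench.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# Hypothetical representation: (reaction name, substrates, products).
Reaction = Tuple[str, FrozenSet[str], FrozenSet[str]]


def restrict_network(reactions: Iterable[Reaction], measured: Set[str]) -> List[Reaction]:
    """Keep only reactions that involve at least one measured substance."""
    return [r for r in reactions if (r[1] | r[2]) & measured]


# Toy network; the reaction names are placeholders.
network = [
    ("R1", frozenset({"Glucose-6-P"}), frozenset({"Fructose-6-P"})),
    ("R2", frozenset({"Citrate"}), frozenset({"Isocitrate"})),
    ("R3", frozenset({"Malate"}), frozenset({"Oxaloacetate"})),
]

subnetwork = restrict_network(network, measured={"Glucose-6-P", "Malate"})
print([name for name, _, _ in subnetwork])
```

With a different set of measured substances the same call yields a different network, which is the sense in which the displayed network depends on the experiment.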

(2) Data mapping is the assignment of data from experiments onto network elements. Here we consider the assignment of metabolite measurements to metabolic networks, but in general different experimental data can be mapped onto corresponding network elements of biological networks: protein levels onto protein-protein-interaction networks, transcriptomic data onto gene regulatory networks, and so on. Such a mapping can be carried out automatically if there exists a function which assigns every data value to a network element.
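
The assignment function mentioned above can be sketched as a simple name lookup. This is only an illustration under stated assumptions: substances are matched to nodes by case-insensitive name, an optional synonym table is consulted first, and all names and values in the example are made up. How DBE resolves names internally is not described here.

```python
from collections import defaultdict


def map_measurements(measurements, node_ids, synonyms=None):
    """Assign (substance, value) pairs to network nodes by name.

    Returns (mapped, unmapped): a dict node -> list of values, and the
    list of measurements for which no node was found.
    """
    synonyms = synonyms or {}
    lookup = {n.lower(): n for n in node_ids}
    mapped, unmapped = defaultdict(list), []
    for substance, value in measurements:
        # Resolve a synonym first, then compare case-insensitively.
        key = synonyms.get(substance, substance).lower()
        if key in lookup:
            mapped[lookup[key]].append(value)
        else:
            unmapped.append((substance, value))
    return dict(mapped), unmapped


# Illustrative values only, not data from the experiments in this paper.
nodes = ["Sucrose", "Malate", "Acetyl-CoA"]
data = [("sucrose", 1.8), ("Malic acid", 0.4), ("Unknown-17", 0.1), ("Sucrose", 2.1)]
mapped, unmapped = map_measurements(data, nodes, synonyms={"Malic acid": "Malate"})
print(mapped)
print(unmapped)
```

Returning the unmapped measurements separately makes it visible when the mapping is only partial, i.e. when the assignment function is not defined for every data value.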

(3) Charting techniques (e. g. bar or line charts) are used to display measured values in the context of the network elements. Metabolite levels are shown inside the corresponding network nodes (metabolites), which facilitates the interpretation of the data and of the interaction between various metabolic routes.


The DBE information system

The DBE information system consists of a number of components, see Fig. 1: (1) the DBE-Database, (2) the Excel-Importer application which extracts biological experimental data out of Excel-files and stores this data in the database, (3) the DBE-Website as the interface of the system, (4) the DBE-Pictures application, which supports the up- and download of binary (image) files and (5) DBE-Gravisto, a network analysis and visualisation system. The Proton/MARGBench system [Freier et al., 1999; Freier et al., 2004] allows the integration of biological network data from different relational databases (e. g. KEGG, BRENDA). It is not a component of the DBE system but closely coupled to it.



Figure 1: Components of the DBE information system and the external data source Proton/MARGBench.


Although different techniques are used to implement the individual components of the information system, they are all uniformly accessible from the DBE-Website. In the following the different components are described in more detail:

  1. DBE-Database: The focus of this database, implemented as an Oracle 9i database, is the storage of metabolite data connected to biological experiments. The Entity-Relationship-Model [Saake and Heuer, 1997] is shown in Fig. 2. The top entity is the "experiment"-table, which stores information about the import (e. g. time, username, name of experiment). For each experiment a number of "plants" can be stored. Genotype, variety and growth-conditions of these plants are saved in this entity. From these plants biologists take "samples". For each sample a number of "measurements" can be stored in the database. Two entities for user management, the "account" and "usergroup" tables, are used to specify the accessibility of an experiment for different users. Additional reference information can be assigned to the "substance" and "substancegroup" entities. These tables store reference information so that the users use a defined vocabulary for substances and measurement-units.
  2. Excel-Importer: While examining different laboratory PCs it became clear that these PCs run different Windows versions and special laboratory software suited to the particular analysis instruments used by the biologists. The common export format of these software packages is Microsoft Excel. A solution that can process Excel files enables the biologists to copy the experimental data tables from the analysis software directly into the import template. A number of additional fields in the template, such as start of the experiment, notes, plant names and growth conditions, can be filled in as well. This way it is possible to enter all relevant data in one place and in one file.
  3. DBE-Pictures: In order to be able to compare raw data and the corresponding pictures and chromatograms it is of great importance to include all this information in the database. This enables the users to identify known and unknown substances even after a very long period. To manage this type of data the DBE-Pictures application was designed. This application makes it possible to upload and assign image files to experiments, plants and to individual measurements. Additional commands make it possible to remove individual files or all files that are assigned to experiments, plants or measurements. It is also possible to download, save or view uploaded images and binary files. Therefore the user can also store experiment related files (e. g. documentation) in one place, which makes it easy to share experiment related data.
  4. DBE-Website: The web interface makes it possible to access the different components of the information system. It can be used to initiate the import of experimental data into the database, to do basic data retrieval tasks, and to manage the experimental data stored in the database. The DBE-Pictures and DBE-Gravisto applications can be started directly from the DBE website by using Java Web Start [Marinilli, 2001].
  5. DBE-Gravisto: This system is based on Gravisto [Bachmaier et al., 2004], an extensible graph-library and -editor. We developed several Gravisto-plugins (application extensions) to access the DBE-Database, to map experimental data onto given networks, to visualise the data-enriched networks, and to perform network analysis tasks. Visualisations created with DBE-Gravisto can be exported into standard graphics formats such as JPG, PNG, SVG or PDF. Examples for such visualisations are shown in the section "Example 1 - comparison of different bean plant lines" and the section "Example 2 - seed development time series analysis". The visualisation can be enhanced by using different levels of detail. A simple drawing of the chart without labels or captions is well suited for a larger view of the biological network. For a high zoom level where only few network elements are visible, more details (captions, legend and label) can be shown.
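
The level-of-detail idea described for DBE-Gravisto can be sketched as a small decision function. The threshold of ten visible nodes and the function name are assumptions made for illustration; the criteria actually used by DBE-Gravisto are not stated in this paper.

```python
def chart_detail(visible_nodes: int, few_nodes_threshold: int = 10) -> dict:
    """Decide which chart decorations to draw for the current view.

    The threshold is a made-up value for illustration.
    """
    detailed = visible_nodes <= few_nodes_threshold
    return {"captions": detailed, "legend": detailed, "labels": detailed}


print(chart_detail(visible_nodes=120))  # overview: plain charts only
print(chart_detail(visible_nodes=4))    # close-up: captions, legend and labels
```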



Figure 2: Entity-Relationship-Model of the DBE-Database.
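
The experiment, plant, sample and measurement hierarchy described for the DBE-Database can be pictured with a few Python dataclasses. This is a sketch only: the attribute names and types are assumptions derived from the prose description, not the column definitions of the Oracle schema, and the user management and reference tables are omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Measurement:
    substance: str  # expected to come from the reference "substance" vocabulary
    value: float
    unit: str


@dataclass
class Sample:
    measurements: List[Measurement] = field(default_factory=list)


@dataclass
class Plant:
    genotype: str
    variety: str
    growth_conditions: str
    samples: List[Sample] = field(default_factory=list)


@dataclass
class Experiment:
    name: str
    username: str
    import_time: str
    plants: List[Plant] = field(default_factory=list)


def count_measurements(experiment: Experiment) -> int:
    """Walk experiment -> plants -> samples -> measurements."""
    return sum(len(s.measurements) for p in experiment.plants for s in p.samples)


# Placeholder content, not real experimental data.
sample = Sample([Measurement("Sucrose", 1.8, "a.u."), Measurement("Malate", 0.4, "a.u.")])
plant = Plant("wild type", "example variety", "greenhouse", [sample])
experiment = Experiment("demo", "some_user", "2004-09-30", [plant])
print(count_measurements(experiment))
```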




Results


Example 1 - comparison of different bean plant lines

To demonstrate the utility of our integrative bioinformatics approach we used metabolite data from the seed development of beans (Vicia narbonensis). Beans and other legume species are an economically important plant-derived protein source in the worldwide feed and food industry. In this case transgenic technology was used to increase protein accumulation by introducing the bacterial enzyme PEPC into the seed. The enzyme re-fixes HCO₃⁻ liberated by respiration and, together with phosphoenolpyruvate, yields oxaloacetate that can be converted either to aspartate or to malate and other intermediates of the citric acid cycle. PEPC controls the anaplerotic carbon flow and may improve seed carbon economy [Golombek et al., 1999].

Analysis of mature seeds revealed that transgenic seeds have a significant increase in crude protein content of up to 20% per gram and a higher dry weight. Combining both effects reveals that protein content per seed increases by 40 to 50% [Rolletschek et al., 2004a]. Tracer experiments further showed a clear stimulation of both [14C]-CO2 uptake and incorporation into proteins. This corresponds to higher in vivo fluxes via the PEPC-catalysed pathway. Because the transgenic effect appeared with different intensities and varied across the many transgenic lines, a visualisation of multiple changes providing a quick overview was needed to recognise general tendencies.

To characterise the responsible metabolic shift within seeds from sugars/starch into organic acids/amino acids/proteins, the metabolite pattern for glycolysis, citrate cycle as well as related sugars and free amino acids was analysed. Metabolites were measured by liquid chromatography coupled to mass spectrometry (LC-MS). This technique allowed the separation according to retention times and molecular masses, and enabled parallel quantitative determinations with very low detection limits (subpicomolar range). A detailed description of this metabolite profiling technique can be found in [Rolletschek et al., 2004a].

Visualisation of metabolites within their pathways (Fig. 3) gives an immediate overview of specific changes in metabolism within transgenic seeds. There was a clear trend towards a decrease of sucrose and phosphorylated sugars of the glycolytic pathway (Glucose-6-P, Glucose-1-P, Fructose-1,6-diP), but increases in the pool size of certain free amino acids due to transgene expression. The concentration of Acetyl-CoA was significantly higher in all transgenic lines, and there was an overall trend towards higher levels of intermediates of the citric acid cycle. Thus, the PEPC expression in Vicia narbonensis seeds leads to changes in the metabolite pattern, indicating a shift of metabolic fluxes from sugars/starch into organic acids/amino acids/proteins.



Figure 3: Visualisation of experimental data in the context of a metabolic network: Relative substance levels of different Vicia narbonensis lines (wild type in dark-grey, transgenic lines in light-grey), mapped onto the glycolysis and the citric acid cycle.


The metabolite profiling approach combined with bioinformatics tools and visualisation techniques used here enables the identification of the effects of transgene expression on plant metabolism in a fast and efficient way. In certain types of experiments it may help scientists to find new targets for transgenic interventions.


Example 2 - seed development time series analysis

In this example we investigated the metabolite pattern of growing barley caryopses (Hordeum vulgare). The agronomical importance of cereal seeds is principally based on their accumulation of storage products, mainly starch and proteins. Despite extensive studies on the structure, biochemistry and genetics of developing grains [Bewley and Black, 1994; Duffus and Cochrane, 1982; Olsen et al., 1992] the regulatory mechanisms underlying their high storage capacity are largely unknown. During their growth, caryopses undergo distinct differentiation events. These in turn are reflected in changes of the metabolic state and biosynthetic fluxes. To investigate their specific temporal patterns, time series analyses of metabolites are required. Seed development includes the pre-storage, intermediate and storage phase. Within the pre-storage phase the caryopsis consists mainly of pericarp tissue, embedding the liquid endosperm. The increase in fresh weight and starch accumulation is low. The subsequent intermediate phase begins after endosperm cellularisation at 4-5 days post anthesis (DPA) and proceeds with the differentiation of endosperm tissues. Starch accumulation starts, although with low synthesis rates. The endosperm enlarges, becoming the main storage organ of cereal seeds. During the main storage phase (from 10-11 DPA onwards), the high starch synthesis rate is evident (Fig. 4). Caryopses were harvested every 2 days over a growth period of about 20 DPA. Dynamic changes of about 70 metabolites were characterised.



Figure 4: Structural changes of the growing barley caryopsis and localisation of starch accumulation during the pre-storage, intermediate and main storage stages of development. Starch deposition is visualised within the cross sections through the caryopsis (shown in dark colour, after iodine staining, upper panel) and tissue structures are shown in dark-field images (lower panel): 1-pericarp, 2-endosperm.


A typical example visualisation of time series data is given in Fig. 5. In this case we used two line charts for the display of the time series data: the chart at the top of each network element shows the development of the metabolite concentrations for samples taken during the day, and the chart below shows the experimental data from samples taken at night. Such a representation allows the observation not only of developmental changes, but also of metabolic responses to day/night conditions. The effect of light and darkness on the accumulation of storage products and on metabolic fluxes is currently under investigation [Rolletschek et al., 2004b]. It is shown that the amplitude of this response changes during development, and the visualisation of these changes on a developmental scale is of great importance.
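
To draw two stacked line charts per network element, the measurements first have to be grouped by metabolite and sampling condition and ordered by time. The following sketch shows one way to do this; the record layout (metabolite, DPA, condition, value), the function name and the numbers are assumptions for illustration, not the DBE-Gravisto implementation or measured data.

```python
from collections import defaultdict


def build_series(records):
    """Group (metabolite, dpa, condition, value) records into time series.

    Returns {metabolite: {condition: [(dpa, value), ...] sorted by dpa}}.
    """
    series = defaultdict(lambda: defaultdict(list))
    for metabolite, dpa, condition, value in records:
        series[metabolite][condition].append((dpa, value))
    return {
        m: {c: sorted(points) for c, points in by_condition.items()}
        for m, by_condition in series.items()
    }


# Made-up values; DPA = days post anthesis.
records = [
    ("Sucrose", 4, "day", 1.0),
    ("Sucrose", 2, "day", 0.5),
    ("Sucrose", 2, "night", 0.4),
    ("Malate", 2, "day", 0.2),
]
series = build_series(records)
print(series["Sucrose"]["day"])    # points for the top chart of the Sucrose node
print(series["Sucrose"]["night"])  # points for the bottom chart
```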



Figure 5: Visualisation of experimental data in the context of a metabolic network: Relative substance levels of Hordeum vulgare seeds sampled during day (top diagram inside the network elements) and night (bottom diagram), respectively. Such a visualisation technique could also be used for the comparison of time series data in different plant lines.



Discussion

We presented an approach for integrating data of biological experiments into metabolic networks by mapping the data onto network elements and visualising the data-enriched networks automatically. The developed information system allows the user to store the results of biochemical experiments and digital images of plants, chromatograms and experiment related binary files consistently in one place. Because of the built-in user-management and access-system, biologists can easily share their work results (measured values and visualisations) within their group, between different departments or even with the public.

Our approach has already proved its usefulness as biologists use the system to support their scientific work, as shown in the application examples. Different charting techniques are useful in various applications. In the first example a condensed bar chart is used, where each displayed data point is based on a number of repeated measurements. The standard error of these measurements, represented by a line of variable length, makes it easier to estimate the relevance of differences. A future task is the integration of statistical methods to allow a more comprehensive data analysis. In the second example line charts are well suited for the display of time series data. The stacking of different result sets (here day and night) gives an immediate overview of the data. Because of the high quality output of the visualisations and the export functionality (graph file export in GML format, and image export in JPG, PNG, SVG and PDF format) the system is also in use for presentation purposes such as the creation of images for posters or papers.
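
The bar height and error line of the condensed bar chart in the first example rest on the mean and the standard error of the repeated measurements. A minimal sketch of this computation follows; it assumes the usual definition, sample standard deviation (n - 1 denominator) divided by the square root of n, which is not spelled out in this paper, and the replicate values are invented.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_standard_error(values):
    """Return (mean, standard error of the mean) for repeated measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates")
    # stdev() uses the n - 1 denominator (sample standard deviation).
    return mean(values), stdev(values) / sqrt(n)


# Invented replicate values for one substance in one plant line.
bar_height, error_line = mean_and_standard_error([2.0, 4.0, 6.0])
print(bar_height, error_line)
```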

In the future we plan to develop network-search and filter algorithms, which allow the user to analyse and visualise parts of metabolic pathways that are of interest or for which experimental data is available. Along with that we plan to develop interactive network layout and navigation methods.

The DBE information system has significant potential as a powerful tool in experimental biology and biotechnology. Its visualisation and modelling of metabolic pathways allows better understanding of the systems and consequences of experimental manipulation. This leads to more efficient, targeted and successful experimental design, and promotes better achievement of biological and biotechnological goals.

Taken together, we present a tool combining bioinformatics and biochemistry that helps biochemists store, manage and visualise all processed results. Further work is still needed to find out whether predictions can be made about interactions between metabolite channelling through various compartments, and how a pathway can be modified efficiently to increase or decrease an end product.

The DBE information system is currently in use by scientists at the IPK. After implementation of the discussed extensions, the components of the system will be made available to public users.



Acknowledgements

We would like to thank Prof. Franz J. Brandenburg, Michael Forster, Andreas Pick, and Paul Holleis (all University of Passau) for excellent cooperation and for granting usage of Gravisto; Prof. Ralf Hofestädt and Andreas Freier (Bielefeld University) for fruitful cooperation and permission to use the PROTON/MARGBench system. For helpful discussion and support we thank Prof. Ulrich Wobus and Prof. Uwe Sonnewald (IPK Gatersleben). This work was supported by the German Ministry of Education and Research (BMBF) under grant 0312706A. We acknowledge funding by the Land Sachsen-Anhalt (MK-LSA 0031KL/1002L).




References