In Silico Biology 7 S1, 08 (2007); ©2007, Bioinformation Systems e.V.  

Workshop "Storage and Annotation of Reaction Kinetics Data"
May 2007, Heidelberg, Germany


Usage of reaction kinetics data stored in databases - a modeler's point of view


Ursula Kummer




Faculty of Biosciences, BIOQUANT, INF 267, 69120 Heidelberg and
Bioinformatics and Computational Biochemistry Group, EML Research, Schloss-Wolfsbrunnenweg 33, 69118 Heidelberg, Germany

Email: ursula.kummer@eml-r.villa-bosch.de





Edited by I. Rojas and U. Wittig (guest editors); received and accepted March 21, 2007; published March 29, 2007



Abstract

Computational approaches to biochemistry like modeling and simulation depend on the availability of kinetic information. This information is either derived directly from experimental data generated by collaborators or has to be dug up from the literature, often both. More recently, data stored in databases have become a valuable additional source of enzyme kinetic data. Various tools have been developed in recent years to facilitate modeling and simulation. However, automating steps in setting up, analyzing or simulating models requires the data to be in defined formats. The crucial points are addressed below.

Keywords: modeling, simulation, enzyme kinetics, database use



Introduction

One of the most common approaches to modeling a biochemical system is to derive the equations for changes in the concentrations of all species by adding up all reaction velocities that lead to the formation or the consumption of the individual species. If this is not solely done on a phenomenological basis, it is necessary to know about the reaction kinetics in detail and also to know the respective kinetic parameters to some extent.
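Written compactly, this bookkeeping takes the standard stoichiometric form below, where x_i is the concentration of species i, v_j the velocity of reaction j, and N_ij the stoichiometric coefficient of species i in reaction j (negative if the species is consumed, positive if it is formed):

$$\frac{dx_i}{dt} = \sum_{j} N_{ij}\, v_j$$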

An example of a small system consisting of only one reaction plus influx and outflux terms is the following:

→ glc + atp → glc6p + adp →

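represented by the following system of equations, where v_in and v_out denote the influx and outflux rates:

$$\frac{d[\mathrm{glc}]}{dt} = v_{\mathrm{in}} - \frac{V_{\max}\,[\mathrm{glc}]}{K_M + [\mathrm{glc}]} \qquad (1)$$

$$\frac{d[\mathrm{glc6p}]}{dt} = \frac{V_{\max}\,[\mathrm{glc}]}{K_M + [\mathrm{glc}]} - v_{\mathrm{out}} \qquad (2)$$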

Here, for the sake of simplicity, the phosphorylation of glucose (glc) to glucose-6-phosphate (glc6p) has been assumed to follow simple Michaelis-Menten kinetics, and ATP has been assumed to be constant.
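To make the example concrete, the sketch below integrates the glc/glc6p system with the forward Euler method. The text does not specify the influx and outflux terms, so a constant influx v_in and a first-order outflux k_out·[glc6p] are assumed here, and all parameter values are made up for illustration. The result is checked against the analytical steady state, which exists only if v_in < Vmax.

```python
def simulate(v_in, vmax, km, k_out, dt=0.01, t_end=100.0):
    """Forward-Euler integration of the glc -> glc6p example.

    Assumptions (not from the text): constant influx v_in and a first-order
    outflux k_out * [glc6p]. ATP is constant and therefore not modeled.
    """
    glc, glc6p = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        v_hk = vmax * glc / (km + glc)      # Michaelis-Menten phosphorylation rate
        d_glc = v_in - v_hk                 # d[glc]/dt
        d_glc6p = v_hk - k_out * glc6p      # d[glc6p]/dt
        glc += dt * d_glc
        glc6p += dt * d_glc6p
    return glc, glc6p

# Hypothetical parameter values
v_in, vmax, km, k_out = 1.0, 2.0, 0.5, 0.3
glc, glc6p = simulate(v_in, vmax, km, k_out)

# Steady state: v_in = vmax*glc/(km + glc) and v_in = k_out*glc6p
glc_ss = km * v_in / (vmax - v_in)
glc6p_ss = v_in / k_out
```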

Information on whether such a kinetic term applies, as well as the parameter values, is derived either from experiments of collaborators or from the literature. More recently, electronic databases have also been used as a source of information. However, kinetic data published in the past often lack information that is crucial for setting up models. This has recently been recognized and discussed at the ESCEC meetings, and recommendations have been published by the STRENDA commission (http://www.strenda.org/documents.html). We participated in these discussions and, in our contribution to the proceedings of these meetings [1], discussed some specific problems that published enzyme kinetic data pose for modelers.

These problems are equally important when dealing with electronic databases and are therefore briefly summarized below. In addition, the wish to automate some steps in the setup and analysis of models raises further problems, which are also addressed below. We frequently encounter all of these problems in our daily work, both when developing software that supports model setup and simulation and when building models for specific biochemical problems ourselves. We believe that all modelers face these problems and hope that the discussion below will make experimentalists and groups storing kinetic data aware of them, so that kinetic data will in future be reported appropriately in scientific publications and databases.



General problems with stored data


Kinetic equation

The vast majority of kinetic data published in the literature comprises Vmax and KM values or other individual rate constants. However, these are only part of the information necessary for modeling the respective system. In many cases the actual kinetic equation that was assumed, or even used to derive the published parameters (often by fitting them to the equation), is missing. Without this crucial information, the value of the published parameters is greatly diminished. It does not help much either if authors merely name the corresponding rate law in the text, e. g. BiBi-PingPong, since such terms are not used unambiguously and can therefore be very misleading. What is actually needed is the explicit notation of the respective equation, nothing else. This would ensure that modelers do not have to guess which equation to use, and it would prevent misuse, e. g. combining a parameter with the wrong rate law. A simple example in which a slight difference between two kinetic equations with the same constants leads to significantly different results in a model has been reported in [1].
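To illustrate why the explicit equation matters, consider two textbook rate laws for a two-substrate reaction A + B that share the same constant names (Vmax, K_A, K_B). The sketch below plugs identical, purely illustrative numbers into both and obtains different velocities. It is not the example from [1], only a minimal demonstration of the same effect.

```python
def independent_binding(a, b, vmax, ka, kb):
    """Two-substrate rate law assuming independent Michaelis-Menten saturation."""
    return vmax * (a / (ka + a)) * (b / (kb + b))

def ping_pong(a, b, vmax, ka, kb):
    """Irreversible ping-pong bi-bi rate law (no product terms)."""
    return vmax * a * b / (kb * a + ka * b + a * b)

# Same hypothetical constants and concentrations for both rate laws
params = dict(vmax=1.0, ka=1.0, kb=1.0)
v1 = independent_binding(1.0, 1.0, **params)
v2 = ping_pong(1.0, 1.0, **params)
print(v1, v2)  # 0.25 0.3333333333333333
```

A published "Vmax = 1, K_A = K_B = 1" is therefore not enough on its own: the two readings differ by a third at these concentrations.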


Vmax

Another obvious (and widely recognized) problem concerns the published Vmax values. Since most studies are done in vitro, the enzyme concentration contained in Vmax is the one in the test tube. Modelers, however, are usually interested in the enzyme concentration in the living cell. Even though the enzyme of interest has in most cases been isolated from cellular material, there is often not even an estimate of the amount present in that living material. Nor can the original amount usually be estimated by calculating backwards, since the results of the purification steps are not reported in sufficient detail.

In addition, instead of simply reporting the components of Vmax, namely the enzyme concentration and the rate constant, many authors hamper the calculation of the rate constant by not stating the enzyme concentration in the test tube explicitly, giving instead the enzyme activity without specifying amounts (see Coherent unit notation below).
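Assuming the usual relation Vmax = kcat·[E], with kcat the rate constant (turnover number), the rescaling a modeler would like to perform is sketched below. The numbers are hypothetical, and the sketch assumes that kcat measured in the assay also holds in vivo, which ignores differences in pH, temperature and other conditions. The calculation is only possible if the assay enzyme concentration is reported.

```python
def kcat_from_vmax(vmax_assay, enzyme_assay):
    """Rate constant kcat = Vmax / [E]; inputs in consistent units (here uM/s and uM)."""
    return vmax_assay / enzyme_assay

def vmax_in_vivo(kcat, enzyme_in_vivo):
    """Assumed in vivo Vmax = kcat * [E]_in_vivo (same kcat as in the assay)."""
    return kcat * enzyme_in_vivo

# Hypothetical values: Vmax of 10 uM/s measured with 0.25 uM enzyme in the cuvette,
# and an assumed in vivo enzyme concentration of 2 uM.
kcat = kcat_from_vmax(10.0, 0.25)      # 40 per second
vmax_cell = vmax_in_vivo(kcat, 2.0)    # 80 uM/s
```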

All in all, this effectively turns Vmax into an unknown variable in most cases, introducing a lot of fuzziness into the system. Of course, in many if not most cases an exact quantification of the enzyme of interest in a specific cell type is not possible. This implies that parameter estimation techniques have to be used at some point. However, this procedure is obviously more reliable and much faster if the initial values are good guesses, and such estimates could very well be provided in the primary literature.


Coherent unit notation

Most of the problems with unit notations are associated with the notation of enzymatic activities and concentrations. It is still common to use units such as "activity per mg fresh weight". However, as pointed out above, reusing such kinetic data requires computing the enzyme concentration in the assay. To do so, one has to gather all information available in the text or experiment report (if at all possible) about molecular weight, purity etc. This can be quite cumbersome and is probably done repeatedly by different people in the community. It would be much easier if authors did this right away and provided the respective information in the original publication.
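As an illustration of the bookkeeping involved, the sketch below converts a specific activity in µmol product per minute per mg protein into kcat in s⁻¹. It assumes one active site per molecule of the given molecular weight and a known purity (mass fraction of the enzyme in the preparation); the function name and example numbers are hypothetical. Units per mg fresh weight cannot be converted this way at all without knowing how much enzyme a mg of fresh weight contains.

```python
def kcat_from_specific_activity(sa_umol_per_min_per_mg, mw_g_per_mol, purity=1.0):
    """Convert a specific activity (umol product min^-1 mg^-1 protein) to kcat (s^-1).

    Assumptions: one active site per molecule of molecular weight mw_g_per_mol,
    and `purity` = mass fraction of the enzyme in the protein preparation.
    """
    # Activity per mg of the enzyme itself, converted to umol product per second
    umol_product_per_s = sa_umol_per_min_per_mg / purity / 60.0
    # 1 mg enzyme = 1e-3 g -> (1e-3 / MW) mol = (1000 / MW) umol of enzyme
    umol_enzyme = 1000.0 / mw_g_per_mol
    return umol_product_per_s / umol_enzyme

# Hypothetical enzyme: 60 umol/min/mg, 50 kDa
k_pure = kcat_from_specific_activity(60.0, 50000.0)                # about 50 s^-1
k_half = kcat_from_specific_activity(60.0, 50000.0, purity=0.5)    # about 100 s^-1
```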


Reversible rate laws

The notation of reversible rate laws is another, though rarely severe, problem. Reversible rate laws do not pose any problem when models are written down as ordinary differential equations (ODEs): forward and backward flows of a reversible reaction can cancel each other out, so that the overall rate can be given as a single expression. Depending on the concentrations of substrates and products the rate can be positive or negative, and it is zero if the reaction is in equilibrium. However, when modeling biochemical systems containing only relatively low numbers of particles of the participating compounds, e. g. because of volume limitations (as in vesicles) or because of functional necessity (as in signalling), we often have to resort to stochastic methods operating on discrete particle numbers [2].

In the stochastic modeling and simulation framework each reaction is characterized by a reaction probability instead of a reaction rate. A stochastic simulation works as follows. First, the probabilities of all reactions are calculated; these depend on the particle numbers of the species taking part in the reactions. Then, taking into account the probabilities of all reactions, it is determined which reaction will take place next and at which point in time this will happen; this is done by drawing random numbers from a random number generator. The chosen reaction is then "executed" by increasing the particle numbers of the product species and decreasing the particle numbers of the substrates. This completes the simulation of a single reaction step; subsequent steps require repeating the whole process.
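The procedure just described corresponds to Gillespie's direct method [2]. A minimal Python sketch follows; the function and parameter names are our own, and the example system (a reversible reaction A ⇌ B treated as two irreversible first-order reactions with made-up rate constants) is purely illustrative.

```python
import random

def gillespie_direct(x0, stoich, propensities, t_end, seed=None):
    """Direct-method stochastic simulation of a well-mixed system.

    x0: initial particle numbers, one entry per species
    stoich: one state-change vector per reaction
    propensities: function mapping the current state to a list of propensities
    """
    rng = random.Random(seed)
    t, x = 0.0, list(x0)
    times, states = [t], [tuple(x)]
    while True:
        a = propensities(x)                 # reaction probabilities per unit time
        a0 = sum(a)
        if a0 <= 0.0:
            break                           # no reaction can occur any more
        tau = rng.expovariate(a0)           # random waiting time to the next event
        if t + tau > t_end:
            break
        j = rng.choices(range(len(a)), weights=a)[0]   # random choice of reaction
        x = [xi + dxi for xi, dxi in zip(x, stoich[j])]  # execute reaction j
        t += tau
        times.append(t)
        states.append(tuple(x))
    return times, states

# Hypothetical example: A -> B (kf = 1.0 /s) and B -> A (kb = 0.5 /s)
stoich = [(-1, +1), (+1, -1)]

def props(x):
    return [1.0 * x[0], 0.5 * x[1]]

times, states = gillespie_direct([100, 0], stoich, props, t_end=50.0, seed=1)
```

Each loop iteration is one reaction step; the total particle number A + B stays at 100 throughout, while the split between A and B fluctuates.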

This stochastic simulation process ensures that the effects of discreteness (particle numbers are always integers) and of stochasticity (individual reaction events happen at random points in time) are taken into account. Concerning the relation between reaction rates and reaction probabilities, a reaction rate can also be expressed as the average number of reaction events per unit of time, which in turn can easily be translated into a reaction probability. Thus in many cases (and under certain conditions) the traditional rate laws and kinetic parameters can be used for stochastic simulations.
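Under the usual assumptions of a well-mixed volume V and elementary reactions with mass-action kinetics (assumptions not spelled out above), the translation from a deterministic rate constant k to a propensity a, the probability per unit time that the reaction fires, takes the following standard form, with n denoting particle numbers and N_A Avogadro's constant:

$$a = k\, n_S \qquad \text{(first order, } S \rightarrow \dots)$$

$$a = \frac{k}{N_A V}\, n_{S_1}\, n_{S_2} \qquad \text{(second order, } S_1 + S_2 \rightarrow \dots)$$

$$a = k\, N_A V \qquad \text{(zero order, } \rightarrow S)$$

For non-elementary rate laws such as Michaelis-Menten kinetics, using the corresponding expression as a propensity is an approximation that holds only under additional conditions.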

A problem occurs, however, if the rate law describes a reversible reaction. Consider e. g. a reversible reaction in equilibrium. The net rate is zero, which means that substrate and product concentrations do not change due to this reaction, even though in reality many reaction events take place in both directions. In a stochastic simulation, however, every single forward and backward reaction event needs to be simulated. Since the events are random, this leads to fluctuations around the equilibrium: for a short time more forward reaction events may happen, and after that more backward events. Only on average over some time is the net rate zero. Therefore, separate rate laws for the forward and backward parts of the reaction need to be available. If reversible rate laws are stored in an unambiguous way, they can be dismantled either manually or automatically. However, as shown in [2], this is not always the case. There are quite a number of examples where this is not possible automatically at all and extremely difficult manually.
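When a reversible rate law is stored with explicit forward and backward terms, the dismantling is mechanical. The sketch below does this for the common reversible Michaelis-Menten form, v = (Vf·S/KS − Vr·P/KP)/(1 + S/KS + P/KP); parameter names and values are illustrative. At equilibrium an ODE solver sees only the net rate of zero, whereas a stochastic simulation needs both non-zero parts (converted to particles per time by multiplying with N_A·V). Using these non-elementary expressions as propensities is itself an approximation.

```python
def reversible_mm_split(s, p, vf, vr, ks, kp):
    """Split v = (vf*s/ks - vr*p/kp) / (1 + s/ks + p/kp)
    into forward and backward parts, so that v = forward - backward."""
    denom = 1.0 + s / ks + p / kp
    forward = (vf * s / ks) / denom
    backward = (vr * p / kp) / denom
    return forward, backward

# Illustrative values chosen so that the reaction is at equilibrium
f, b = reversible_mm_split(s=1.0, p=2.0, vf=10.0, vr=5.0, ks=1.0, kp=1.0)
print(f, b, f - b)  # 2.5 2.5 0.0
```

If the same law had been stored in a rearranged form (e.g. with the denominator expanded or constants merged), recovering this split would require algebraic manipulation rather than simple term selection.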



Specific problems for software tools

Recently, a number of software tools have been developed to make modeling and simulation techniques in biochemistry available to a large community. For example, Pedro Mendes' group (VBI) and our group have developed COPASI [3], a user-friendly, platform-independent tool for setting up, simulating and analyzing models.

COPASI and other, similar tools try to support the user in setting up models by automating as many steps as possible. For instance, reaction equations entered by the user, as shown in Fig. 1, are automatically translated into the mathematical formalism required either for the integration and analysis of ODEs or for the stochastic simulation described above.
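The translation step can be sketched in a few lines: parse each reaction equation into stoichiometric coefficients and sum the rates into the species they produce or consume. This is a hypothetical illustration of the idea, not COPASI's implementation; the reaction syntax, the `fixed` argument for constant species, and all parameter values are assumptions.

```python
from collections import defaultdict

def parse_reaction(equation):
    """Turn e.g. 'glc + atp -> glc6p + adp' into {species: net stoichiometric coefficient}.

    An empty side ('-> glc' or 'glc6p ->') denotes an influx or outflux.
    An optional integer coefficient may precede a species name, e.g. '2 adp'.
    """
    lhs, rhs = equation.split("->")
    coeffs = defaultdict(int)
    for side, sign in ((lhs, -1), (rhs, +1)):
        for term in side.split("+"):
            parts = term.split()
            if not parts:
                continue                          # empty side: no species
            n = int(parts[0]) if len(parts) == 2 else 1
            coeffs[parts[-1]] += sign * n
    return dict(coeffs)

def ode_rhs(reactions, rates, conc, fixed=()):
    """dx_i/dt = sum_j N_ij * v_j, assembled reaction by reaction.

    Species listed in `fixed` are held constant (derivative set to zero).
    """
    dxdt = defaultdict(float)
    for equation, rate in zip(reactions, rates):
        v = rate(conc)
        for species, n in parse_reaction(equation).items():
            dxdt[species] += n * v
    return {s: 0.0 if s in fixed else d for s, d in dxdt.items()}

# Hypothetical parameters: influx 1.0, Vmax = 2.0, KM = 0.5, outflux constant 0.3
reactions = ["-> glc", "glc + atp -> glc6p + adp", "glc6p ->"]
rates = [
    lambda c: 1.0,
    lambda c: 2.0 * c["glc"] / (0.5 + c["glc"]),
    lambda c: 0.3 * c["glc6p"],
]
d = ode_rhs(reactions, rates, {"glc": 0.5, "glc6p": 1.0, "atp": 2.0, "adp": 0.0},
            fixed=("atp",))
```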


Figure 1: Graphical user interface of COPASI representing the glycolysis model of Teusink et al. [4].


SYCAMORE, a tool developed by the groups of Rebecca Wade (EML Research), Isabel Rojas (EML Research) and ours (http://sycamore.eml.org/sycamore), further supports users in setting up models by giving them direct access to the required information in databases, especially SABIO-RK [5]. Automating steps in the modeling process imposes a number of requirements on stored data in addition to those discussed above.


Compound and reaction identification

Identifying reactions and compounds in information stored in databases is more difficult than might be anticipated, since compound names are often given at different levels of detail. Glucose, D-Glucose and alpha-D-Glucose, for example, are all commonly used when representing the first step of glycolysis (see Fig. 2). This reflects the fact that these different levels of detail are used in the original literature. On the one hand, this information should not be discarded: a reaction described with D-Glucose may carry different information than one described with alpha-D-Glucose, since the former might indicate that a mixture of alpha- and beta-D-Glucose was used. On the other hand, it is very cumbersome to repeat every search or selection from tables for all the different levels of detail.


Figure 2: Screenshot of SABIO-RK showing a table of glycolytic reactions with different entries depending on the level of detail.


Thus, when setting up a relatively large model, this problem multiplies the invested work. It would be useful if the different levels of detail were stored in such a way that selecting the most general description (e. g. Glucose) automatically includes the more specific ones (e. g. D-Glucose and alpha-D-Glucose).
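One way to realize such inclusive selection is to store an "is-a" hierarchy of compound names and expand a query to all more specific descendants. The sketch below uses a hypothetical hierarchy and hypothetical database entries; it is not how SABIO-RK stores or queries compounds.

```python
# Hypothetical "is-a" hierarchy: each name maps to its more specific forms
HIERARCHY = {
    "Glucose": ["D-Glucose"],
    "D-Glucose": ["alpha-D-Glucose", "beta-D-Glucose"],
}

# Hypothetical database entries: (entry id, set of substrate names)
ENTRIES = [
    ("e1", {"Glucose", "ATP"}),
    ("e2", {"D-Glucose", "ATP"}),
    ("e3", {"alpha-D-Glucose", "ATP"}),
    ("e4", {"D-Fructose", "ATP"}),
]

def expand(name, hierarchy=HIERARCHY):
    """Return the name together with all of its more specific descendants."""
    result, stack = set(), [name]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(hierarchy.get(current, []))
    return result

def find_entries(substrate, entries=ENTRIES):
    """Ids of entries whose substrates match the query or any more specific form."""
    wanted = expand(substrate)
    return [eid for eid, substrates in entries if substrates & wanted]
```

A query for "Glucose" then returns e1, e2 and e3, while a query for "alpha-D-Glucose" returns only e3, so the more specific information is kept but need not be searched for separately.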

In addition, unambiguous compound and reaction identifiers (e. g. based on ontologies, as discussed in other contributions) will aid the modeling process immensely, since they will also enable sophisticated techniques such as model merging in the future.


Additional problems with unit notations

Automating the extraction of kinetic information from databases imposes additional constraints on unit notation. All units of the final model have to be unified, so unit transformations are often necessary to make the units within one model consistent. To the best of my knowledge, this is currently not possible automatically in any simulation software. However, it is questionable whether this unit conversion should be the task of the respective simulation tool or should rather be done in the database in which the kinetic information is stored. The fact that in the future diverse simulation tools will interact with a few databases points to the latter solution, since it avoids duplicating work. Adjusting stored units to consistent standard units is therefore high on the wish list from a modeler's point of view.
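A database-side normalization could, in its simplest form, look like the sketch below: every stored concentration and rate is converted to one set of standard units (here, by assumption, mol/L and seconds). The unit tables and function names are hypothetical and cover only simple concentration and time units; activities per mg fresh weight, for instance, cannot be converted without additional information.

```python
# Hypothetical standard units for a whole model: mol/L for concentrations, s for time
CONC_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}
TIME_TO_SECONDS = {"s": 1.0, "min": 60.0, "h": 3600.0}

def concentration_to_molar(value, unit):
    """Convert a concentration (e.g. a KM value) to mol/L."""
    return value * CONC_TO_MOLAR[unit]

def rate_to_molar_per_second(value, conc_unit, time_unit):
    """Convert a rate given in conc_unit per time_unit (e.g. a Vmax) to mol/(L*s)."""
    return value * CONC_TO_MOLAR[conc_unit] / TIME_TO_SECONDS[time_unit]

# Two hypothetical values as they might come from different sources
km = concentration_to_molar(0.1, "mM")              # about 1e-4 mol/L
vmax = rate_to_molar_per_second(6.0, "mM", "min")   # about 1e-4 mol/(L*s)
```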



Conclusions

Making use of kinetic data from the literature or from databases poses several problems for modelers. The most prevalent are the completeness of the represented information, including the kinetic equation; the possibility of dismantling Vmax into its components; the unambiguous and coherent use of units and of compound and reaction identifiers; and the representation of reversible rate laws in a form that allows the forward and backward reactions to be separated automatically. Finally, additional information about typical in vivo enzyme and compound concentrations would further increase the usefulness of databases for modeling purposes.



Acknowledgements

I would like to thank the BMBF and the Klaus Tschira Foundation for funding. I am also very grateful to Ralph Gauges, Martin Golebiewski, Stefan Hoops, Renate Kania, Pedro Mendes, Jürgen Pahle, Sven Sahle, Matthias Stein, Stefan Richter, Isabel Rojas, Rebecca Wade, and Andreas Weidemann for fruitful discussions.




References


  1. Kummer, U. and Sahle, S. (2007). Problems of Currently Published Enzyme Kinetic Data for Usage in Modeling and Simulation. In: Experimental Standard Conditions of Enzyme Characterizations, Hicks, M.G. and Kettner, C. (eds.), Beilstein Institute, Frankfurt, in press.

  2. Gillespie, D. T. (1976). A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J. Comp. Phys. 22, 403-434.

  3. Hoops, S., Sahle, S., Gauges, R., Lee, C., Pahle, J., Simus, N., Singhal, M., Xu, L., Mendes, P. and Kummer, U. (2006). COPASI – a COmplex PAthway SImulator. Bioinformatics 22, 3067-3074.

  4. Pritchard, L. and Kell, D. B. (2002). Schemes of flux control in a model of Saccharomyces cerevisiae glycolysis. Eur. J. Biochem. 269, 3894-3904.

  5. Wittig, U., Golebiewski, M., Kania, R., Krebs, O., Mir, S., Weidemann, A., Anstein, S., Saric, J. and Rojas, I. (2006). SABIO-RK: Integration and curation of reaction kinetics data. Lecture Notes in Bioinformatics 4075, 94-103.