Only recently, and largely unnoticed by most of the scientific community, has the full impact of the genome projects as a whole become obvious. Most researchers are, rightly, concerned mainly with the new opportunities these projects offer: learning about genetically caused diseases, studying the transcription process and understanding how genetic information is organised. Meanwhile, the real holy grail of research in the 21st century is slowly coming into focus, so far mainly for scientists in bioinformatics. That goal is the simulation of whole cells, cell clusters, organs and organisms.
The direction is obvious, but the paths and necessary first steps have to be decided upon now. In order to simulate cells one has to (1) know the molecules involved, (2) understand the interactions between them at the atomic level (e.g. enzyme/substrate, DNA/protein, interactions among structural proteins, etc.), and (3) have information on their function at a phenomenological level (metabolic pathways, regulatory signal transmission, etc.).
An urgently needed prerequisite for this entirely new scientific approach, characterized by a holistic view of the biochemistry of the cell, is the existence of suitable information systems. These systems must deliver the parameters for the simulations and contain the information needed to reconstruct pathways and networks. This will create a completely new way of doing biochemical research. Traditional research identifies and characterizes the respective biochemist's pet object (be it a small natural compound, a protein, the ribosome, etc.) in vitro, one item of interest at a time. Future research, in contrast, requires (1) standards, (2) parallelization and (3) in vivo information. The required tools (DNA chips, fluorescence techniques, etc.) have been, and are still being, developed by researchers in proteomics, nanotechnology and related fields. The task is to collect the data needed for a holistic description of cell function at different levels of detail.
Besides developing the necessary algorithms and tools, bioinformatics will have the task of driving experimental research. Beyond the currently discussed areas of structural and functional genomics, protein and RNA structure prediction and molecular docking, this includes identifying targets for biochemical research. It must determine which proteins have to be knocked out or modified in cells or cell cultures. It must also determine which effects of these interventions have to be measured (changes in the expression of any number of proteins, changes in metabolism, etc.).
It has now become obvious that the processes of information collection and verification applied, reasonably successfully, in the genome sequencing projects must also be established for the emerging functional and cell-characterization data. This task is much more difficult than it is for sequences. Whereas a given sequence is either correct or incorrect, functional information can be correct and still be meaningless. Worse, it can be correct and meaningful and still be unsuitable for simulations, for instance a kinetic constant that was measured correctly but under conditions far removed from those in the living cell.
Although this holds for all classes of proteins, it is especially complicated for enzyme information. For the metabolic part of the future cell information system, BRENDA is the most comprehensive data system available worldwide. Here, data collection and data verification, including the integration and comparison of data from different organisms, are particularly difficult. A century of enzyme research has produced a wealth of data, but that data is widely scattered and unsystematic.
In the following, the field of metabolic information is used as an example to outline future requirements.