CYTOMER®: A database on gene expression sources

X. Chen¹ and E. Wingender




Gesellschaft für Biotechnologische Forschung mbH,
Mascheroder Weg 1
D-38124 Braunschweig
Phone: +49-531-6181 460
Fax: +49-531-6181 266
E-mail: xch@gbf.de
¹The National Laboratory of Protein Engineering and Plant Genetic Engineering,
College of Life Sciences, Peking University,
Beijing 100871, P. R. China






INTRODUCTION

Although only a few relatively small genomes have been completely sequenced thus far, the community has already anticipated the completion of the human genome sequencing project by proclaiming the start of the "post-genomic era", or the period of "functional genomics". Systematic elucidation of gene functions requires linking sequence data with information about molecular mechanisms, and also with histological, anatomical and even taxonomic data. As a consequence, even "classical" branches of biological and medical research have gained new interest when linked to genome-based information.


CYTOMER STRUCTURE

An increasing number of databases store information about gene expression patterns. These data come from a variety of methodological approaches, classically from Northern blot analysis or in situ hybridization, more recently from microarray or SAGE approaches. Generally, gene expression patterns describe where and when a certain gene is expressed, i.e. in which organs/tissues or cell types ("expression sources") and at which developmental stages expression occurs. In most cases, however, this kind of information is represented as mere textual hints, perhaps using a controlled vocabulary. In a more advanced approach, gene expression patterns of the mouse are described using a comprehensive dictionary of hierarchically organized anatomical terms [Ringwald et al., 1999].

However, to model the complexity of expression sources in terms of the spatio-temporal dependencies of macroscopic and microscopic objects (from organs to cells), we tried to establish a more elaborate system. For this purpose, we have developed a relational database system that aims to provide a comprehensive overview of all expression sources and their developmental stages. The central table of CYTOMER ("Hub") is a list which links entries of four other tables: T, developmental stages; P, physiological systems; C, cell types; and O, organs/tissues. The Organ table O is itself hierarchically organized, representing "primary" organs such as kidney or liver as well as their substructures.

The Hub table represents anatomical / histological expert knowledge about which cells occur with what kind of function in which organs and at what stages. The database has thus far been populated exclusively with data on human expression sources, presently comprising 39 stages, 28 physiological systems, 303 cell types and more than 1100 organs and organ substructures.
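The table layout described above can be sketched as follows. This is only a minimal illustration using Python's built-in sqlite3 module, not the actual CYTOMER schema: the table names Hub, T, P, C and O come from the text, whereas all column names, the parent_id self-reference used to model the organ hierarchy, the helper function names and the example rows are assumptions made for the sake of the example.

```python
import sqlite3

# Hypothetical columns: the text names only the tables (Hub, T, P, C, O).
SCHEMA = """
CREATE TABLE T (id INTEGER PRIMARY KEY, name TEXT NOT NULL);  -- developmental stages
CREATE TABLE P (id INTEGER PRIMARY KEY, name TEXT NOT NULL);  -- physiological systems
CREATE TABLE C (id INTEGER PRIMARY KEY, name TEXT NOT NULL);  -- cell types
CREATE TABLE O (                                              -- organs/tissues
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    parent_id INTEGER REFERENCES O(id)  -- assumed: NULL marks a "primary" organ
);
CREATE TABLE Hub (                                            -- links T, P, C and O
    id INTEGER PRIMARY KEY,
    stage_id INTEGER REFERENCES T(id),
    system_id INTEGER REFERENCES P(id),
    cell_id INTEGER REFERENCES C(id),
    organ_id INTEGER REFERENCES O(id)
);
"""


def build_anatomy_demo():
    """Create an in-memory database with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Illustrative rows only; they are not taken from CYTOMER.
    conn.execute("INSERT INTO T VALUES (1, 'adult')")
    conn.execute("INSERT INTO P VALUES (1, 'urinary system')")
    conn.execute("INSERT INTO C VALUES (1, 'podocyte')")
    conn.executemany(
        "INSERT INTO O VALUES (?, ?, ?)",
        [(1, "kidney", None), (2, "nephron", 1), (3, "glomerulus", 2)],
    )
    conn.execute("INSERT INTO Hub VALUES (1, 1, 1, 1, 3)")
    return conn


def substructures(conn, organ_name):
    """Return the named organ followed by all of its nested substructures."""
    rows = conn.execute(
        """
        WITH RECURSIVE sub(id, name) AS (
            SELECT id, name FROM O WHERE name = ?
            UNION ALL
            SELECT O.id, O.name FROM O JOIN sub ON O.parent_id = sub.id
        )
        SELECT name FROM sub ORDER BY id
        """,
        (organ_name,),
    ).fetchall()
    return [row[0] for row in rows]


def describe_hub(conn):
    """Resolve each Hub entry into (stage, system, cell type, organ) names."""
    return conn.execute(
        """
        SELECT T.name, P.name, C.name, O.name
        FROM Hub
        JOIN T ON T.id = Hub.stage_id
        JOIN P ON P.id = Hub.system_id
        JOIN C ON C.id = Hub.cell_id
        JOIN O ON O.id = Hub.organ_id
        ORDER BY Hub.id
        """
    ).fetchall()
```

In this sketch, one Hub row states that a given cell type occurs in a given organ substructure, within a given physiological system, at a given stage. The recursive query walks the assumed parent_id links to collect an organ together with all of its substructures.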

We applied the CYTOMER database to map expression patterns as they are given by the TRANSFAC database [Heinemeyer et al., 1999]. For this purpose, entries of the Hub table have been linked with (human) transcription factor entries in the TRANSFAC FACTOR table via two linking tables: CP for those expression sources in which a certain factor has been demonstrated to be expressed, and CN for those for which published experimental evidence shows the absence of a certain factor. This database system will also provide a basis for a standardized representation of gene expression patterns and profiles in general.
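The role of the two linking tables can be illustrated with a small sketch, again using Python's sqlite3 module. The table names Hub, FACTOR, CP and CN follow the text; the columns, the placeholder factor and source names, and the function names are hypothetical and do not reflect the actual CYTOMER or TRANSFAC layout.

```python
import sqlite3


def build_expression_demo():
    """Create an in-memory database with placeholder factors and sources."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical, simplified columns for illustration only.
        CREATE TABLE Hub (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
        CREATE TABLE FACTOR (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- CP: sources where expression of the factor has been demonstrated.
        CREATE TABLE CP (
            factor_id INTEGER REFERENCES FACTOR(id),
            hub_id INTEGER REFERENCES Hub(id),
            PRIMARY KEY (factor_id, hub_id)
        );
        -- CN: sources where absence of the factor has been demonstrated.
        CREATE TABLE CN (
            factor_id INTEGER REFERENCES FACTOR(id),
            hub_id INTEGER REFERENCES Hub(id),
            PRIMARY KEY (factor_id, hub_id)
        );
        """
    )
    # Placeholder rows, not real TRANSFAC or CYTOMER data.
    conn.executemany(
        "INSERT INTO Hub VALUES (?, ?)",
        [(1, "source 1"), (2, "source 2"), (3, "source 3")],
    )
    conn.executemany(
        "INSERT INTO FACTOR VALUES (?, ?)", [(1, "factor A"), (2, "factor B")]
    )
    conn.execute("INSERT INTO CP VALUES (1, 1)")  # factor A expressed in source 1
    conn.execute("INSERT INTO CN VALUES (1, 2)")  # factor A absent from source 2
    return conn


def expression_status(conn, factor_name):
    """Map every expression source to 'expressed', 'absent' or 'no data'."""
    rows = conn.execute(
        """
        SELECT h.label,
               CASE WHEN cp.hub_id IS NOT NULL THEN 'expressed'
                    WHEN cn.hub_id IS NOT NULL THEN 'absent'
                    ELSE 'no data'
               END
        FROM Hub h
        JOIN FACTOR f ON f.name = ?
        LEFT JOIN CP cp ON cp.hub_id = h.id AND cp.factor_id = f.id
        LEFT JOIN CN cn ON cn.hub_id = h.id AND cn.factor_id = f.id
        ORDER BY h.id
        """,
        (factor_name,),
    ).fetchall()
    return dict(rows)
```

Keeping a separate CN table makes it possible to tell demonstrated absence of a factor apart from a simple lack of data, which a single "expressed in" table could not express.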

The system has been implemented under MS Access and miniSQL; the latter is also the DBMS accessed through the WWW interface at http://transfac.gbf.de/CYTOMER/index.html.


ACKNOWLEDGMENTS

The Sino-German scientific-technical collaborative project underlying this work is supported by the German Federal Ministry of Education and Research (CHN-305-97).


REFERENCES