PathoDB - a database that combines pathological data with molecular information

T. Meinhardt, M. Prüß and E. Wingender 1




GBF - Gesellschaft für Biotechnologische Forschung mbH
Mascheroder Weg 1
D-38124
Phone: +49-531-6181 460
Fax: +49-531-6181 266
E-mail: tme@gbf.de, mpr@gbf.de
1 E-mail: ewi@gbf.de






INTRODUCTION

The existing databases on transcription factors and their binding sites (TRANSFAC, TRRD, Compel, TFD; see Heinemeyer et al., 1999, for an overview) focus mainly on the molecular and/or genetic aspects of the transcriptional machinery and the interactions of its elements. Besides lacking phenotypic information, the data compiled in these databases generally describe the so-called "normal", healthy condition of the respective organism. However, aberrations of transcriptional control due to mutations in key elements generally cause severe impairments. To gain a more detailed insight into genotype-phenotype correlations, and thereby a deeper understanding of regulatory mechanisms, we established the relational database PathoDB, in which we collect pathological data and model the appropriate relations.


Content and structure of the database

The database encompasses detailed molecular information on mutated transcription factors and regulatory DNA elements (sites) that lead to specific pathological effects. The two tables MuFactor and MuSite hold the respective data. In MuFactor, for instance, the molecular weight, the protein sequence, and both proteochemical and functional features are stored. For mutated sites we keep, among other details, the DNA sequence, the genetic location, the method by which the site was identified, and the factors that bind or have been proven not to bind. More general genotypic information is stored in the Genotype table. Here we describe the underlying molecular defect in free text, together with diagnostic and, if available, therapeutic approaches. Genotype can be considered the central node of the database: it is connected on one side to the molecular tables and on the other side to the Phenotype table, which gives a description of each disease. All tables also record the species of origin and the cited literature. The connections between the individual types of information are realised via multiple "links". In some cases, fusion proteins of two transcription factors have been reported, so that more than one mutated factor must be linked to a given genotype. Proper linking is even more clearly needed on the Genotype-Phenotype axis. The same phenotype can obviously be produced by more than one genotype. Harder to model is the fact that the phenotype varies significantly depending on the exact haplotype. We propose a system of linking tables that can handle most, if not all, of these genotypic constellations.
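As a rough illustration of the linking-table idea described above, the following Python sketch builds a small in-memory SQLite database. Only the table names MuFactor, MuSite, Genotype and Phenotype come from the text. The linking-table names (GenotypeFactor, GenotypeSite, GenotypePhenotype), all column names, the haplotype column and the placeholder rows are assumptions made for illustration and do not reflect the actual PathoDB schema.

```python
import sqlite3

# Hypothetical schema: the four entity tables are named in the abstract,
# everything else (columns, linking tables) is an assumption.
SCHEMA = """
CREATE TABLE MuFactor  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE MuSite    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Genotype  (id INTEGER PRIMARY KEY, defect TEXT NOT NULL);
CREATE TABLE Phenotype (id INTEGER PRIMARY KEY, disease TEXT NOT NULL);

-- Many-to-many links: a genotype may involve several mutated factors
-- (e.g. a fusion protein) or mutated sites.
CREATE TABLE GenotypeFactor (
    genotype_id INTEGER NOT NULL REFERENCES Genotype(id),
    factor_id   INTEGER NOT NULL REFERENCES MuFactor(id),
    PRIMARY KEY (genotype_id, factor_id)
);
CREATE TABLE GenotypeSite (
    genotype_id INTEGER NOT NULL REFERENCES Genotype(id),
    site_id     INTEGER NOT NULL REFERENCES MuSite(id),
    PRIMARY KEY (genotype_id, site_id)
);
-- The haplotype column lets one genotype map to different phenotypes.
CREATE TABLE GenotypePhenotype (
    genotype_id  INTEGER NOT NULL REFERENCES Genotype(id),
    phenotype_id INTEGER NOT NULL REFERENCES Phenotype(id),
    haplotype    TEXT NOT NULL DEFAULT '',
    PRIMARY KEY (genotype_id, phenotype_id, haplotype)
);
"""


def build_demo_db():
    """Create the sketch schema and fill it with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO MuFactor VALUES (?, ?)",
                     [(1, "factor A (placeholder)"), (2, "factor B (placeholder)")])
    conn.executemany("INSERT INTO MuSite VALUES (?, ?)",
                     [(1, "site X (placeholder)")])
    conn.executemany("INSERT INTO Genotype VALUES (?, ?)",
                     [(1, "fusion of factor A and factor B"),
                      (2, "point mutation in site X")])
    conn.executemany("INSERT INTO Phenotype VALUES (?, ?)",
                     [(1, "disease P (placeholder)"), (2, "disease Q (placeholder)")])
    # Genotype 1 is a fusion protein, so it links to two mutated factors.
    conn.executemany("INSERT INTO GenotypeFactor VALUES (?, ?)", [(1, 1), (1, 2)])
    conn.executemany("INSERT INTO GenotypeSite VALUES (?, ?)", [(2, 1)])
    # Phenotype 1 arises from two genotypes; genotype 2 varies by haplotype.
    conn.executemany("INSERT INTO GenotypePhenotype VALUES (?, ?, ?)",
                     [(1, 1, ""), (2, 1, "heterozygous"), (2, 2, "homozygous")])
    conn.commit()
    return conn


def factors_for_genotype(conn, genotype_id):
    """Return the ids of all mutated factors linked to one genotype."""
    rows = conn.execute(
        "SELECT factor_id FROM GenotypeFactor WHERE genotype_id = ? "
        "ORDER BY factor_id", (genotype_id,)).fetchall()
    return [r[0] for r in rows]


def genotypes_for_phenotype(conn, phenotype_id):
    """Return the ids of all genotypes that can produce one phenotype."""
    rows = conn.execute(
        "SELECT DISTINCT genotype_id FROM GenotypePhenotype "
        "WHERE phenotype_id = ? ORDER BY genotype_id", (phenotype_id,)).fetchall()
    return [r[0] for r in rows]


def phenotypes_for_genotype(conn, genotype_id):
    """Return (phenotype id, haplotype) pairs linked to one genotype."""
    return conn.execute(
        "SELECT phenotype_id, haplotype FROM GenotypePhenotype "
        "WHERE genotype_id = ? ORDER BY phenotype_id, haplotype",
        (genotype_id,)).fetchall()
```

In this sketch, the composite primary keys keep each link unique, while the separate linking tables allow any number of factors, sites and phenotypes per genotype. Storing the haplotype on the Genotype-Phenotype link is only one possible way to represent haplotype-dependent phenotypes.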

To access data beyond PathoDB's primary field of interest, Genotype and Phenotype entries are linked to external databases (OMIM, MGI and HGMD). Internally, the database is closely linked to the TRANSFAC system to allow a wide range of analyses (links to sites and to (interacting) wild-type factors, regulatory pathways, cross-comparison of mutated versus wild-type entries, etc.).


Status and perspectives

The database has been established under a relational DBMS, and the functional development of its basic features is almost complete. The first data sets have been entered and will be expanded with high priority. So far, PathoDB comprises a large set of thalassemias caused by mutations in the regulatory regions of the beta- and delta-globin genes, and forms of dwarfism caused by mutations or chromosomal rearrangements in the gene for the transcription factor Pit-1. For initial WWW access we will compile an ASCII flat file version, which should be available for presentation at GCB '99. Ultimately, we aim to make the relational PathoDB accessible online.


ACKNOWLEDGEMENTS

This work has been supported by a grant from the German Ministry of Education, Science, Research and Technology (BMBF; Project No. 0311640).


REFERENCES