The existing databases on transcription factors and binding sites (TRANSFAC, TRRD,
Compel, TFD; see Heinemeyer et al., 1999, for an overview) focus mainly on the
molecular and/or genetic aspects of the transcriptional machinery and the interaction of its
elements. Besides lacking phenotypic information, the data compiled in the databases
mentioned above generally describe the so-called "normal", healthy condition of the respective
organism. However, aberrations of transcriptional control due to mutations in its key
elements generally cause severe impairments. To gain a more detailed insight into
genotype-phenotype correlations, and thereby a deeper understanding of regulatory mechanisms,
we established the relational database PathoDB, in which we collect pathological data
and model the appropriate relations.
The database holds detailed molecular information on mutated transcription factors and regulatory DNA elements (sites) that lead to specific pathological effects. The two tables MuFactor and MuSite hold the respective data. MuFactor stores, for instance, the molecular weight, the protein sequence, and both proteochemical and functional features. For mutated sites we keep, among other details, the DNA sequence, the genetic location, the method by which the site was identified, and the factors that bind or have been proven not to bind. More general genotypic information is stored in the Genotype table, where we describe the underlying molecular defect as free text together with diagnostic and, if available, therapeutic approaches. Genotype can be regarded as the central node of the database: it is connected on one side to the molecular tables and on the other to the Phenotype table, which gives a description of each disease. All tables also record the species of origin and the cited literature. The connections between the individual types of information are realised via multiple "links". In some cases, fusion proteins of two transcription factors have been reported, so that more than one mutated factor must be linked to a given genotype. The need for proper linking is even more evident on the Genotype-Phenotype axis. The same phenotype can obviously be produced by more than one genotype. Harder to model is the fact that the phenotype varies significantly depending on the exact haplotype. We propose a system of linking tables that is capable of handling most, if not all, genotypic constellations.
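The linking scheme described above can be illustrated with a minimal sketch. The table names MuFactor, MuSite, Genotype and Phenotype are taken from the text; every column name, the three linking tables (GenotypeFactor, GenotypeSite, GenotypePhenotype) and all rows are hypothetical placeholders, and SQLite is used only for convenience, since the abstract does not name the DBMS. The sketch is not the actual PathoDB schema.

```python
import sqlite3

# Table names MuFactor, MuSite, Genotype, Phenotype follow the text.
# All columns and the three linking tables are assumptions for illustration.
SCHEMA = """
CREATE TABLE MuFactor  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE MuSite    (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Genotype  (id INTEGER PRIMARY KEY, defect TEXT);
CREATE TABLE Phenotype (id INTEGER PRIMARY KEY, disease TEXT);
CREATE TABLE GenotypeFactor (
    genotype_id INTEGER REFERENCES Genotype(id),
    factor_id   INTEGER REFERENCES MuFactor(id),
    PRIMARY KEY (genotype_id, factor_id));
CREATE TABLE GenotypeSite (
    genotype_id INTEGER REFERENCES Genotype(id),
    site_id     INTEGER REFERENCES MuSite(id),
    PRIMARY KEY (genotype_id, site_id));
CREATE TABLE GenotypePhenotype (
    genotype_id  INTEGER REFERENCES Genotype(id),
    phenotype_id INTEGER REFERENCES Phenotype(id),
    haplotype    TEXT NOT NULL DEFAULT '',
    PRIMARY KEY (genotype_id, phenotype_id, haplotype));
"""


def build_demo():
    """Create an in-memory database filled with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clauses
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO MuFactor VALUES (?, ?)",
                     [(1, "mutated factor A"), (2, "mutated factor B")])
    conn.executemany("INSERT INTO Genotype VALUES (?, ?)",
                     [(1, "fusion of factor A and factor B"),
                      (2, "point mutation in factor A")])
    conn.executemany("INSERT INTO Phenotype VALUES (?, ?)",
                     [(1, "disease X, mild form"),
                      (2, "disease X, severe form")])
    # A fusion protein links two mutated factors to one genotype.
    conn.executemany("INSERT INTO GenotypeFactor VALUES (?, ?)",
                     [(1, 1), (1, 2), (2, 1)])
    # Genotype 1 yields a different phenotype depending on the haplotype;
    # phenotype 2 is reachable from two different genotypes.
    conn.executemany("INSERT INTO GenotypePhenotype VALUES (?, ?, ?)",
                     [(1, 1, "heterozygous"),
                      (1, 2, "homozygous"),
                      (2, 2, "homozygous")])
    conn.commit()
    return conn


def factors_for_genotype(conn, genotype_id):
    """Return the ids of all mutated factors linked to one genotype."""
    rows = conn.execute(
        "SELECT factor_id FROM GenotypeFactor "
        "WHERE genotype_id = ? ORDER BY factor_id", (genotype_id,))
    return [row[0] for row in rows]


def genotypes_for_phenotype(conn, phenotype_id):
    """Return the ids of all genotypes that can produce one phenotype."""
    rows = conn.execute(
        "SELECT DISTINCT genotype_id FROM GenotypePhenotype "
        "WHERE phenotype_id = ? ORDER BY genotype_id", (phenotype_id,))
    return [row[0] for row in rows]
```

Placing the haplotype on the Genotype-Phenotype link rather than on either entry is one possible way to let the same genotype point to different phenotypes; whether PathoDB models it this way is not stated in the text.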
To give access to data beyond PathoDB's primary field of interest, Genotype and Phenotype
entries are connected to external databases (OMIM, MGI and HGMD). Internally, the database is
closely linked to the TRANSFAC system to open up optimal analytical possibilities (links to
sites and (interacting) wild-type factors, regulatory pathways, cross-comparison of mutated
versus wild-type entries, etc.).
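As a rough illustration of such cross-references, the following sketch attaches external accessions to Genotype and Phenotype entries. The database names OMIM, MGI and HGMD and the two table names come from the text; the `ExternalLink` record, its fields, the helper functions and the accession strings are hypothetical and do not reflect how PathoDB stores its links.

```python
from dataclasses import dataclass

# Names taken from the text; everything else in this block is an assumption.
ALLOWED_DATABASES = {"OMIM", "MGI", "HGMD"}
ALLOWED_TABLES = {"Genotype", "Phenotype"}


@dataclass(frozen=True)
class ExternalLink:
    table: str      # PathoDB table holding the entry
    entry_id: int   # hypothetical numeric id of the entry
    database: str   # external database name
    accession: str  # placeholder accession, not a real identifier


def make_link(table, entry_id, database, accession):
    """Build a link, rejecting tables or databases not named in the text."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unsupported table: {table}")
    if database not in ALLOWED_DATABASES:
        raise ValueError(f"unsupported external database: {database}")
    return ExternalLink(table, entry_id, database, accession)


def links_for(links, table, entry_id):
    """Return (database, accession) pairs for one entry, sorted by database."""
    hits = [(link.database, link.accession) for link in links
            if link.table == table and link.entry_id == entry_id]
    return sorted(hits)
```

For example, `links_for(links, "Phenotype", 1)` would list every external record attached to Phenotype entry 1, whichever external database it comes from.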
The database has been implemented in a relational DBMS, and the functional development
of the basic features is almost complete. The first sets of data have been entered and will be
expanded with high priority. So far, PathoDB comprises a large set of thalassemias caused by
mutations in the regulatory regions of the beta- and delta-globin genes, and forms of dwarfism caused
by mutations or chromosomal rearrangements in the gene for transcription factor Pit1.
For primary WWW access we will compile an ASCII flat file version, which should be available
for presentation at GCB '99. Ultimately, we aim to make the relational PathoDB accessible
online.
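The layout of the planned flat file is not specified here. Purely as an illustration, the sketch below serialises one entry using short line tags and a `//` record terminator, a convention borrowed from common sequence-database flat files. The tags `AC` and `DE`, the function name and the example values are hypothetical and are not the PathoDB format.

```python
def to_flat_record(entry):
    """Serialise one entry as tagged ASCII lines ending with '//'.

    `entry` maps a short tag (hypothetical, e.g. 'AC') to a text value.
    Multi-line values are emitted as repeated lines with the same tag.
    """
    lines = []
    for tag, value in entry.items():
        # An empty value still produces one line carrying the tag.
        for part in str(value).splitlines() or [""]:
            lines.append(f"{tag}  {part}")
    lines.append("//")  # record terminator
    return "\n".join(lines)
```

A call such as `to_flat_record({"AC": "G00001", "DE": "placeholder defect"})` yields three lines: one per tag, followed by the terminator.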
This work has been supported by a grant from the German Ministry of Education, Science, Research and Technology (BMBF; Project No. 0311640).