TF-Workbench: a database-driven environment for the analysis of transcription factors.

Martin Werber and Bernd Weisshaar




Max-Planck-Institut für Züchtungsforschung,
Carl v. Linné Weg 10,
D-50829 Köln, Germany






TF-Workbench is both a repository for all kinds of data related to transcription factors (TFs) and an environment for analysing these data. It was designed for a project dealing specifically with a large gene family encoding MYB-type transcription factors. In Arabidopsis thaliana, this gene family has about 132 members (1). Although TF-Workbench was designed to address questions related to MYB transcription factors, it could also be used to analyse other gene families.

The stored data cover sequence data and features of the gene sequences themselves (deduced protein sequence, gene designations, gene name synonyms, genomic data, intron-exon structure, etc.). Expression data, literature links, functional data, and other textual information are stored as well. All data are held in a relational database, with MySQL as the database management system. A web interface written in PHP provides access to the data. Underlying software written in Perl manages a job queue, starts jobs, and stores the job results back into the database. A standard wrapper architecture, also written in Perl, eases the integration of command-line bioinformatics tools. Total computation time can be reduced by running scheduler scripts on multiple machines, each taking batch jobs from the shared job queue.
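The job-queue arrangement described above can be illustrated with a minimal sketch. It is not the TF-Workbench code: it is written in Python rather than Perl, uses SQLite in place of MySQL so that it runs without a server, and the table layout and all function names (`init_db`, `enqueue`, `claim_next`, `run_job`, `work`) are assumptions made for illustration. The point shown is that a worker claims a job with a conditional update, so that several schedulers sharing one queue do not run the same job twice.

```python
import json
import sqlite3
import subprocess
import sys


def init_db(conn):
    # Hypothetical schema: one row per batch job.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               id      INTEGER PRIMARY KEY,
               command TEXT NOT NULL,
               status  TEXT NOT NULL DEFAULT 'queued',
               worker  TEXT,
               result  TEXT)"""
    )
    conn.commit()


def enqueue(conn, argv):
    # Store the command line as a JSON list so arguments survive intact.
    cur = conn.execute("INSERT INTO jobs (command) VALUES (?)", (json.dumps(argv),))
    conn.commit()
    return cur.lastrowid


def claim_next(conn, worker):
    # Take the oldest queued job. The UPDATE only succeeds if the job is
    # still 'queued', so a job claimed by another worker is skipped.
    while True:
        row = conn.execute(
            "SELECT id, command FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        cur = conn.execute(
            "UPDATE jobs SET status = 'running', worker = ? "
            "WHERE id = ? AND status = 'queued'",
            (worker, row[0]),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row[0], json.loads(row[1])


def run_job(conn, job_id, argv):
    # Generic wrapper: run a command-line tool and store its output.
    try:
        proc = subprocess.run(argv, capture_output=True, text=True)
        ok, output = proc.returncode == 0, proc.stdout if proc.returncode == 0 else proc.stderr
    except OSError as exc:  # e.g. the tool is not installed
        ok, output = False, str(exc)
    status = "done" if ok else "failed"
    conn.execute(
        "UPDATE jobs SET status = ?, result = ? WHERE id = ?", (status, output, job_id)
    )
    conn.commit()
    return status


def work(conn, worker):
    # Scheduler loop: process jobs until the queue is empty.
    processed = 0
    while (job := claim_next(conn, worker)) is not None:
        run_job(conn, *job)
        processed += 1
    return processed


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    enqueue(conn, [sys.executable, "-c", "print('stand-in for a sequence analysis tool')"])
    work(conn, "worker-1")
    print(conn.execute("SELECT id, status, worker, result FROM jobs").fetchall())
```

In a multi-machine setup, each machine would run its own `work` loop against the same database server; the in-memory database here only serves to keep the sketch self-contained.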

One goal of the system was to provide a central in-house data storage and communication platform for a group of scientists working on a large gene family. The other goal was to integrate, under a single web interface, those command-line tools that are relevant to the analysis of transcription factor families. The integrated tools include sequence analysis tools from EMBOSS, gene prediction tools, pattern-finding tools such as MEME/MAST, the phylogeny package PHYLIP, and others. The data sets analysed can cover all TFs or only a subset, which can be defined by the results of other tools (for example, pattern searches). The results are presented in formatted form and, where possible, with graphics and links to external databases.

Curators have write access and can update the information. All other users have read-only access, but for each TF they have the option to add comments, questions, and suggestions. This information is provided to all users as a kind of "bulletin board". Since the information in the database changes quickly, all changes and additions to the database are logged with user and date information. Users can search all the data by keyword or browse through sorted lists. An alert system notifies users when new information on their specific interests has become available.
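The logging of changes with user and date, and the alerts built on it, can be sketched as follows. This is an illustrative assumption, not the TF-Workbench schema: it uses Python with SQLite in place of MySQL, and the tables `tf_annotation` and `change_log`, the functions `update_field` and `changes_since`, and the identifier `TF_EXAMPLE` are all hypothetical. Each update writes the new value and a log entry in one transaction, and an alert amounts to asking for log entries newer than a user's last visit.

```python
import sqlite3
from datetime import datetime, timezone


def init_db(conn):
    # Hypothetical schema: current annotation values plus an append-only log.
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS tf_annotation (
            tf_id TEXT NOT NULL,
            field TEXT NOT NULL,
            value TEXT,
            PRIMARY KEY (tf_id, field));
        CREATE TABLE IF NOT EXISTS change_log (
            id         INTEGER PRIMARY KEY,
            tf_id      TEXT NOT NULL,
            field      TEXT NOT NULL,
            old_value  TEXT,
            new_value  TEXT,
            user       TEXT NOT NULL,
            changed_at TEXT NOT NULL);
        """
    )


def update_field(conn, user, tf_id, field, value):
    # Write the new value and its log entry together; 'with conn' commits
    # both on success and rolls both back if either statement fails.
    with conn:
        row = conn.execute(
            "SELECT value FROM tf_annotation WHERE tf_id = ? AND field = ?",
            (tf_id, field),
        ).fetchone()
        old_value = row[0] if row else None
        conn.execute(
            "INSERT OR REPLACE INTO tf_annotation (tf_id, field, value) VALUES (?, ?, ?)",
            (tf_id, field, value),
        )
        conn.execute(
            "INSERT INTO change_log (tf_id, field, old_value, new_value, user, changed_at) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (tf_id, field, old_value, value, user,
             datetime.now(timezone.utc).isoformat()),
        )


def changes_since(conn, tf_id, since_iso):
    # Basis for an alert: log entries for one TF newer than a timestamp.
    # ISO 8601 strings in the same time zone sort chronologically.
    return conn.execute(
        "SELECT field, new_value, user, changed_at FROM change_log "
        "WHERE tf_id = ? AND changed_at > ? ORDER BY id",
        (tf_id, since_iso),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    update_field(conn, "curator1", "TF_EXAMPLE", "gene_name", "first name")
    update_field(conn, "curator2", "TF_EXAMPLE", "gene_name", "revised name")
    for entry in changes_since(conn, "TF_EXAMPLE", "1970-01-01"):
        print(entry)
```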


REFERENCE

  1. R. Stracke, M. Werber, and B. Weisshaar (2001) The R2R3-MYB gene family in Arabidopsis thaliana. Current Opinion in Plant Biology, in press.