Towards a New Curation Environment for the TrEMBL Database

Kai Runte1, Henning Hermjakob and Rolf Apweiler




EMBL Outstation EBI (European Bioinformatics Institute)
1E-mail: krunte@ebi.ac.uk






On this poster we present an integrated annotation environment for the TrEMBL database. It is based directly on a newly introduced XML file format, which will supplement the flat file format of TrEMBL and will be publicly available with the December release of TrEMBL. In contrast to the flat file format, the new format has an exact formal definition of its grammar via XML Schema. This eases the development of tools that aid curation by checking entries for correct syntax and coherence, e.g. that the sequence matches its CRC checksum or that feature positions lie within the sequence.
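The two coherence checks named above can be sketched in Java as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, `java.util.zip.CRC32` stands in for whatever checksum the format actually prescribes, and feature positions are assumed to be 1-based and inclusive.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper; the names are illustrative and not part of the TrEMBL tools.
final class EntryChecks {
    private EntryChecks() {}

    // CRC32 is a stand-in only; the checksum used by the real format may differ.
    static long checksum(String sequence) {
        CRC32 crc = new CRC32();
        crc.update(sequence.getBytes(StandardCharsets.US_ASCII));
        return crc.getValue();
    }

    // True if the stored checksum agrees with the one computed from the sequence.
    static boolean checksumMatches(String sequence, long storedChecksum) {
        return checksum(sequence) == storedChecksum;
    }

    // Assumes 1-based, inclusive positions: 1 <= start <= end <= sequence length.
    static boolean featureInRange(String sequence, int start, int end) {
        return start >= 1 && start <= end && end <= sequence.length();
    }
}
```

A tool built on such checks could flag an entry as soon as a curator edits the sequence without updating the dependent fields.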

The environment is built around the XML editor "SPedit". To be as platform-independent as possible, we chose Java as the implementation language. The editor replaces the programmable text processor "Brief" and displays the XML file in the form of a tree, as shown in the screenshot.

All features needed for the annotation process are integrated as so-called plug-ins, small programs that add functionality to the editor. There are two types of plug-ins. The first type processes and changes the document. An example would be a program that automatically recalculates the checksum, sequence length and molecular weight whenever the sequence itself is changed.
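A document-processing plug-in of this kind might look roughly like the sketch below. The `DocumentPlugin` interface and the `SequenceEntry` class are hypothetical and do not describe the actual SPedit plug-in API; CRC32 again stands in for the real checksum, and the molecular weight is left out because it would require a residue mass table.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical in-memory view of an entry; the real editor works on an XML tree.
final class SequenceEntry {
    String sequence = "";
    int length;
    long checksum;
}

// Hypothetical plug-in contract, not the actual SPedit interface.
interface DocumentPlugin {
    void process(SequenceEntry entry);
}

// Brings the derived fields back in line with the current sequence.
final class RecalculatePlugin implements DocumentPlugin {
    @Override
    public void process(SequenceEntry entry) {
        entry.length = entry.sequence.length();
        CRC32 crc = new CRC32(); // stand-in for the format's real checksum
        crc.update(entry.sequence.getBytes(StandardCharsets.US_ASCII));
        entry.checksum = crc.getValue();
    }
}
```

In such a design the editor would call `process` after every change to the sequence, so the curator never has to update the derived fields by hand.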

The second type replaces the standard display (as shown in the screenshot) of certain parts of an XML document. For example, keywords are stored in the following kind of XML structure:

Displaying them as, e.g., a comma-separated list is a much better solution and would be provided by a display plug-in.
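The core of such a display plug-in could be sketched as below, using the standard Java DOM API. The class name is hypothetical, and the element name is passed in as a parameter because the sketch makes no assumption about the element names of the actual TrEMBL schema.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical display helper; not part of SPedit.
final class KeywordListView {
    private KeywordListView() {}

    // Joins the text of all elements named itemTag into one comma-separated line.
    static String render(Document doc, String itemTag) {
        NodeList items = doc.getElementsByTagName(itemTag);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            values.add(items.item(i).getTextContent().trim());
        }
        return String.join(", ", values);
    }

    // Convenience parser for the sketch; wraps checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("not well-formed XML", ex);
        }
    }
}
```

With a made-up fragment in which each keyword sits in its own `keyword` element, `KeywordListView.render(doc, "keyword")` would return all keywords on a single line instead of one tree node per keyword.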