A generic system for the management of sequence annotations

Ole Bents and Hans-Werner Mewes

MIPS, Max-Planck-Institut für Biochemie
D-82152 Martinsried
Phone: +49-89-8578-2453
Fax: +49-89-8578-2655
E-mail: {bents,mewes}@mips.biochem.mpg.de

INTRODUCTION

Current systems for managing manually edited sequence annotations in genome sequencing projects still satisfy neither the needs of biologists nor the requirements for storing and retrieving data safely and reliably.

We are developing a generic annotation management system (GAMS) that provides a suitable model of genomic entities, a generic application programming interface (API), remote access via CORBA, and document exchange via XML. The system is not restricted to a particular organism or annotation project. It may also serve as a back end for automatic annotation systems such as Pedant [Frishman and Mewes, 1997] or Magpie [Gaasterland and Sensen, 1996].


An object-oriented model for genomic entities

Object-oriented analysis (OOA) provides a good foundation for designing a suitable and adaptable model of the genomic entities found in a cell. We have designed classes for the main molecule types: DNA, RNA, proteins and other metabolites (Fig. 1).

Figure 1: Extract of the UML representation of the molecule model.

As an example, we model Dna objects as containers for DNA sequences carrying annotated elements such as ORFs or tRNAs. Orf objects correspond to MessengerRna objects, and, given a gene model, these MessengerRnas correspond to particular Polypeptid objects. A Protein object contains several Polypeptids and carries properties as well as cross-references to literature and protein database sources.
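To make these relationships concrete, the following is a minimal Python sketch of the classes named above (Dna, Orf, Trna, MessengerRna, Polypeptid, Protein). It is not the GAMS implementation: the attributes (start, end, strand), the 1-based inclusive coordinate convention and the CrossReference class are assumptions, and the gene model is reduced to hand-filled links with no splicing.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical sketch of the molecule model; attribute names are assumptions.

@dataclass
class CrossReference:
    database: str   # literature or protein database (assumed field)
    accession: str

@dataclass
class Polypeptid:
    sequence: str   # amino acid sequence

@dataclass
class MessengerRna:
    sequence: str
    polypeptide: Optional[Polypeptid] = None  # filled in once a gene model is applied

@dataclass
class Orf:
    start: int      # 1-based, inclusive (assumed convention)
    end: int
    strand: str = "+"
    mrna: Optional[MessengerRna] = None

@dataclass
class Trna:
    start: int
    end: int
    strand: str = "+"

@dataclass
class Dna:
    name: str
    sequence: str
    elements: List[object] = field(default_factory=list)  # annotated Orf/Trna objects

    def subsequence(self, start: int, end: int) -> str:
        # Convert 1-based inclusive coordinates to a Python slice.
        return self.sequence[start - 1:end]

@dataclass
class Protein:
    name: str
    polypeptides: List[Polypeptid] = field(default_factory=list)
    properties: dict = field(default_factory=dict)
    xrefs: List[CrossReference] = field(default_factory=list)

# Usage: a toy contig with one ORF, linked through mRNA to a polypeptide.
chrom = Dna("contig1", "ATGGCTTAA")
orf = Orf(1, 9)
chrom.elements.append(orf)
# Trivial gene model (assumption): no introns, so the mRNA is the ORF with T -> U.
orf.mrna = MessengerRna(chrom.subsequence(orf.start, orf.end).replace("T", "U"))
orf.mrna.polypeptide = Polypeptid("MA")  # AUG GCU UAA -> Met Ala (stop)
protein = Protein(
    "example_protein",
    [orf.mrna.polypeptide],
    properties={"length": 2},
    xrefs=[CrossReference("EXAMPLE_DB", "X00001")],  # placeholder, not a real entry
)
```

Keeping the links as object references (Orf to MessengerRna to Polypeptid) mirrors the navigation described in the text. A real system would also need to represent reverse-strand and spliced gene models.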

Another application of this data model is the dynamic modelling of metabolic pathways [Kastenmüller and Mewes, 1999].


A flexible API for client programs

The API provides all the basic operations for working with genomic objects (Fig. 2). In doing so, it hides the technical aspects of database programming, so client programmers can concentrate on designing clients rather than on learning how the database is organized.

Figure 2: Architecture of the API.

A CORBA layer provides remote access to the methods of the objects stored in the database. Alternatively, objects can be dumped into an XML document and restored from it, enabling data exchange between different applications.
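The XML round trip described above can be sketched with Python's standard `xml.etree.ElementTree` module. This is only an illustration: the element and attribute names (`dna`, `sequence`, `orf`, `start`, `end`, `strand`) are assumptions and do not reflect the actual GAMS document format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

# Hypothetical sketch: the XML vocabulary below is assumed, not the GAMS schema.

@dataclass
class Orf:
    start: int
    end: int
    strand: str

@dataclass
class Dna:
    name: str
    sequence: str
    orfs: List[Orf]

def dna_to_xml(dna: Dna) -> str:
    """Dump a Dna object and its ORF annotations into an XML string."""
    root = ET.Element("dna", name=dna.name)
    ET.SubElement(root, "sequence").text = dna.sequence
    for orf in dna.orfs:
        ET.SubElement(root, "orf", start=str(orf.start), end=str(orf.end), strand=orf.strand)
    return ET.tostring(root, encoding="unicode")

def dna_from_xml(text: str) -> Dna:
    """Rebuild a Dna object from an XML string produced by dna_to_xml."""
    root = ET.fromstring(text)
    orfs = [
        Orf(int(e.get("start")), int(e.get("end")), e.get("strand"))
        for e in root.findall("orf")
    ]
    return Dna(root.get("name"), root.findtext("sequence", default=""), orfs)

original = Dna("contig1", "ATGGCTTAA", [Orf(1, 9, "+")])
document = dna_to_xml(original)
restored = dna_from_xml(document)
print(document)
```

Because both directions use the same vocabulary, another application only needs to agree on the document structure, not on the database layout behind it.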


Abstract interfaces to the platform

To remain independent of particular software systems such as database management systems, the API directs its calls to driver software. Each driver is responsible for carrying out the operations on its corresponding system.

The concept of abstract drivers allows us to change the implementation of existing drivers, or to add new drivers, without changing the API. The abstract driver declares all operations, while specific drivers implement these methods for particular systems.
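A minimal Python sketch of this driver pattern follows, using `abc.ABC` for the abstract driver and two interchangeable concrete drivers: one in memory and one backed by `sqlite3`. The names `AbstractDriver`, `InMemoryDriver`, `SqliteDriver`, `AnnotationApi`, `store_orf` and `orfs_of`, as well as the table layout, are hypothetical and only illustrate the idea. The point is that `AnnotationApi` stays unchanged whichever driver it is given.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

# Hypothetical sketch of abstract drivers; all names and the schema are assumptions.

class AbstractDriver(ABC):
    """Declares every storage operation the API relies on."""

    @abstractmethod
    def store_orf(self, dna_name: str, start: int, end: int) -> None: ...

    @abstractmethod
    def orfs_of(self, dna_name: str) -> List[Tuple[int, int]]: ...

class InMemoryDriver(AbstractDriver):
    """Keeps ORF coordinates in a dictionary keyed by DNA name."""

    def __init__(self) -> None:
        self._orfs: Dict[str, List[Tuple[int, int]]] = {}

    def store_orf(self, dna_name, start, end):
        self._orfs.setdefault(dna_name, []).append((start, end))

    def orfs_of(self, dna_name):
        return sorted(self._orfs.get(dna_name, []))

class SqliteDriver(AbstractDriver):
    """Stores ORF coordinates in an SQLite table (in memory by default)."""

    def __init__(self, path: str = ":memory:") -> None:
        self._con = sqlite3.connect(path)
        self._con.execute(
            "CREATE TABLE IF NOT EXISTS orf (dna TEXT, begin_pos INTEGER, end_pos INTEGER)"
        )

    def store_orf(self, dna_name, start, end):
        with self._con:  # commits the insert, or rolls back on error
            self._con.execute("INSERT INTO orf VALUES (?, ?, ?)", (dna_name, start, end))

    def orfs_of(self, dna_name):
        rows = self._con.execute(
            "SELECT begin_pos, end_pos FROM orf WHERE dna = ? ORDER BY begin_pos, end_pos",
            (dna_name,),
        )
        return [tuple(row) for row in rows]

class AnnotationApi:
    """Client-facing API; it only uses the operations declared by AbstractDriver."""

    def __init__(self, driver: AbstractDriver) -> None:
        self._driver = driver

    def annotate_orf(self, dna_name: str, start: int, end: int) -> None:
        if not 0 < start <= end:
            raise ValueError("invalid ORF coordinates")
        self._driver.store_orf(dna_name, start, end)

    def orfs(self, dna_name: str) -> List[Tuple[int, int]]:
        return self._driver.orfs_of(dna_name)

# The same client code runs unchanged on either driver.
for driver in (InMemoryDriver(), SqliteDriver()):
    api = AnnotationApi(driver)
    api.annotate_orf("contig1", 120, 980)
    api.annotate_orf("contig1", 5, 90)
    print(type(driver).__name__, api.orfs("contig1"))
```

Adding support for another database system would mean writing one more `AbstractDriver` subclass. Neither `AnnotationApi` nor its clients would need to change.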


REFERENCES