University of Montreal, Department of Biochemistry
2900 Blvd Edouard-Montpetit,
Montreal, QC, Canada
GOBASE is a specialized relational database that integrates diverse data on organelles (mitochondria and chloroplasts), including DNA and protein sequences, RNA secondary structure diagrams, taxonomic information, and genetic maps of completely sequenced mitochondrial DNAs. GOBASE has been publicly accessible via the WWW since 1996. It originally housed only mitochondrial data; chloroplast data have been added recently.
Today, GOBASE includes over 80,000 sequences. Most of the data in GOBASE, i.e., sequence and taxonomic data, are retrieved from the public sequence data repository at NCBI and validated in house by experts. Maintaining a curated database carries a very high labor cost, largely because genomic sequences are being generated at an unprecedented rate and because records retrieved from public repositories contain annotation errors and nonstandard or misleading information that requires correction.
Here, we present our efforts to substantially reduce manual data correction through increased automation, and to maximize code reusability by adopting object-oriented technology.
Our initial approach has been to use the Unified Modeling Language (UML)
to catalogue the cases of data inconsistency that we have found in
GOBASE. Each case is treated separately: an expert solution is devised
and represented as a diagram. The UML diagrams are then used as
templates for writing object-oriented automation programs.
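
The workflow described above, in which each inconsistency case is handled separately and its solution becomes a template for an object-oriented program, might translate into code along the following lines. This is only a minimal sketch in Python, not the GOBASE implementation: the class names, the fields of `Record`, the example case, and the synonym table are all hypothetical and chosen purely for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical, heavily simplified stand-in for a sequence record."""
    accession: str
    gene_name: str


class InconsistencyCase(ABC):
    """One case of data inconsistency, i.e. one diagram in the catalogue."""

    @abstractmethod
    def detect(self, record: Record) -> bool:
        """Return True if the record shows this inconsistency."""

    @abstractmethod
    def correct(self, record: Record) -> Record:
        """Return a corrected copy of the record."""


class NonstandardGeneName(InconsistencyCase):
    """Illustrative case: a gene name given as a synonym of the standard one."""

    # Hypothetical synonym table (assumption); a real one would be
    # curated by experts.
    SYNONYMS = {"coi": "cox1", "cytb": "cob"}

    def detect(self, record: Record) -> bool:
        return record.gene_name.lower() in self.SYNONYMS

    def correct(self, record: Record) -> Record:
        standard = self.SYNONYMS[record.gene_name.lower()]
        return Record(record.accession, standard)


def apply_cases(record: Record, cases: list) -> Record:
    """Check the record against every case in order, correcting on a match."""
    for case in cases:
        if case.detect(record):
            record = case.correct(record)
    return record


# Usage with a made-up record.
fixed = apply_cases(Record("EXAMPLE1", "COI"), [NonstandardGeneName()])
print(fixed.gene_name)
```

Giving each case its own class behind a shared interface is one way to get the code reusability mentioned earlier: a new case can be added as a new class without changing the loop that applies them.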