Reaction Kinetics

In Silico Biology 7 S1, 06 (2007); ©2007, Bioinformation Systems e.V.  

"Good annotation practice" for chemical data in biology

Kirill Degtyarenko*, Marcus Ennis and John Garavelli

European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom

* Corresponding author

Edited by I. Rojas and U. Wittig (guest editors); received and accepted March 21, 2007; published March 00, 2007


A structural diagram, in the form of a two-dimensional (2-D) sketch, remains the most effective portrait of a "small molecule" or chemical reaction. However, such structural diagrams, as with any other core data, cannot be used in speech (and should not be used in free text). "Good annotation practice" for biological databases is to refer to the molecule of interest using either consistent and widely recognised terminology or unique identifiers from a dedicated database. Ideally, scientists should use terminology that is both pronounceable and meaningful. Thus, a viable solution for a bioinformatician is to use a definitive controlled vocabulary of biochemical compounds and reactions, which contains both systematic and common names. In addition, chemical ontologies provide a means for placing entities of interest into wider chemical, biological or medical contexts. We present some challenges and achievements in the standardisation of chemical language in biological databases, with emphasis on three aspects of annotation:

  1. good drawing practice: how to draw unambiguous 2-D diagrams;
  2. good naming practice: how to give the most appropriate names; and
  3. good ontology practice: how to link the entity of interest by defined logical relationships to other entities.

Keywords: aleatoric cross-links, annotation, chemical language, databases, graphical representation, InChI, IUPAC, ontology
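
The naming and ontology aspects listed above can be illustrated with a minimal sketch. It assumes a toy controlled vocabulary in which each entity has one unique identifier, one preferred name, any number of synonyms, and "is a" links to parent entities. The class names (`Entity`, `Vocabulary`) and the `EX:` identifiers are hypothetical and invented for illustration only; they do not come from any real database.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One vocabulary entry (hypothetical structure, for illustration)."""
    identifier: str                           # unique identifier, e.g. "EX:3"
    name: str                                 # preferred name
    synonyms: set = field(default_factory=set)
    is_a: set = field(default_factory=set)    # identifiers of parent entities


class Vocabulary:
    """Toy controlled vocabulary with 'is a' relationships."""

    def __init__(self):
        self._by_id = {}
        self._by_label = {}

    def add(self, entity):
        self._by_id[entity.identifier] = entity
        for label in {entity.name, *entity.synonyms}:
            key = label.casefold()
            owner = self._by_label.get(key)
            # A label must point to exactly one entity, otherwise it is ambiguous.
            if owner is not None and owner != entity.identifier:
                raise ValueError(f"ambiguous label: {label!r}")
            self._by_label[key] = entity.identifier

    def resolve(self, label):
        """Map a name or synonym (case-insensitive) to its identifier."""
        return self._by_label[label.casefold()]

    def ancestors(self, identifier):
        """Return all entities reachable through 'is a' links."""
        seen = set()
        stack = list(self._by_id[identifier].is_a)
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.add(parent)
                stack.extend(self._by_id[parent].is_a)
        return seen


vocab = Vocabulary()
vocab.add(Entity("EX:1", "alcohol"))
vocab.add(Entity("EX:2", "primary alcohol", is_a={"EX:1"}))
vocab.add(Entity("EX:3", "ethanol", synonyms={"ethyl alcohol"}, is_a={"EX:2"}))

ethanol_id = vocab.resolve("Ethyl alcohol")
ethanol_parents = vocab.ancestors(ethanol_id)
```

In this sketch, free text refers to a compound by any registered name, the vocabulary maps that name to a single identifier, and the "is a" links place the compound in its wider chemical context (here, ethanol among the primary alcohols and the alcohols).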