In Silico Biology 4, 0002 (2003); ©2003, Bioinformation Systems e.V.  
Ontology Workshop Tokyo 2003


The Gene Ontology Annotation (GOA) Database - An integrated resource of GO annotations to the UniProt Knowledgebase

Evelyn Camon*, Daniel Barrell, Vivian Lee, Emily Dimmer and Rolf Apweiler




European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton,
Cambridge, CB10 1SD, UK

* corresponding author
  Email: goa@ebi.ac.uk
  Tel: +44 1223 494465; Fax: +44 1223 494468





Edited by E. Wingender; received September 17, 2003; accepted September 30, 2003; published December 01, 2003




This short communication is provided to accompany a lecture given at the 'Workshop on Ontologies and Data Integration for Biology', May 2003, Japan.

Gene Ontology (GO) [1] is a well-established structured vocabulary that has been used successfully in protein annotation for 6 years. The vocabulary was designed by biologists to replace the multiple nomenclatures used by specialised and large knowledge bases, which can hinder data integration. GO currently consists of over 16,000 terms distributed over three ontologies, which describe what a protein does (molecular function), how it does it (biological process) and where it performs this task in a generic cell (cellular component). The GO Consortium is successful because it limited its scope to these three ontologies from the outset and has involved the biological community throughout its evolution. Because GO was made immediately available for biological annotation, some errors were inevitable; however, the Consortium is committed to its maintenance and upkeep as new genomic databases get involved. As additional ontologies are needed alongside GO to model biology and experimentation, the GO Consortium has created the Open Biological Ontologies (OBO) web site to encourage communication and the creation of other standard vocabularies that can be freely used in tandem with GO.

The European Bioinformatics Institute hosts two main GO activities: the GO editorial office (which edits the GO ontologies), headed by Michael Ashburner and Midori Harris, and the GO annotation team (which annotates gene products with GO terms), headed by Rolf Apweiler and Evelyn Camon. The rest of this lecture describes the Gene Ontology Annotation (GOA) database [2].

The GOA database (http://www.ebi.ac.uk/GOA) uses the GO vocabulary to provide high-quality electronic and manual annotations to gene products in the UniProt Knowledgebase (Swiss-Prot, TrEMBL, PIR-PSD) [3]. As a supplementary archive of GO annotation, GOA promotes a high level of integration of the knowledge represented in Swiss-Prot with other databases. This is achieved by converting Swiss-Prot annotation into a recognised computational format. GOA provides annotated entries for nearly 60,000 species (GOA-SPTR) and is the largest and most comprehensive open-source contributor of annotations to the GO Consortium annotation effort. By integrating GO annotations from other model organism groups (FlyBase, SGD, MGD), GOA consolidates specialised knowledge and expertise to ensure that the data remain a key reference for current biological knowledge. Furthermore, the GOA database fully supports the Human Proteomics Initiative (HPI) [4] by fast-tracking the annotation of proteins likely to be relevant to human health and disease. In addition to a non-redundant set of annotations to the human proteome (GOA-Human) and monthly releases of its GO annotation for all species (GOA-SPTR), a series of GO mapping files (Swiss-Prot keyword to GO, InterPro to GO) and specific cross-references in other databases are regularly distributed. GOA can be queried through the simple, user-friendly web interface of our QuickGO browser or downloaded in a parsable format from the EBI (ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/) and GO FTP sites (ftp://ftp.geneontology.org/pub/go/gene-associations/). The GOA dataset can be used to enhance the annotation of particular model organism or gene expression datasets (via a slimmed-down version of GO, GO-slim, ftp://ftp.geneontology.org/pub/go/GO_slims/), although it is increasingly used to evaluate GO predictions generated from text mining or protein interaction experiments.
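
As an illustration of working with the downloadable files, the following is a minimal Python sketch that reads annotations and counts them per ontology. It assumes the 15-column, tab-delimited gene association layout commonly used for GO annotation files, with comment lines starting with '!'; that layout is not described in this communication, so the column names below are an assumption. The function names, the accessions and the sample rows are hypothetical and serve only to make the example self-contained.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, Iterator, TextIO

# Assumed column order of a tab-delimited gene association file.
COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "reference", "evidence", "with_from", "aspect", "db_object_name",
    "synonym", "db_object_type", "taxon", "date", "assigned_by",
]

# One-letter aspect codes for the three GO ontologies.
ASPECT_NAMES = {
    "F": "molecular function",
    "P": "biological process",
    "C": "cellular component",
}


def read_associations(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per annotation line, skipping blanks and '!' comments."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith("!"):
            continue
        # Pad short rows so that every expected column is present.
        row = row + [""] * (len(COLUMNS) - len(row))
        yield dict(zip(COLUMNS, row))


def count_by_aspect(records: Iterable[Dict[str, str]]) -> Counter:
    """Count annotations per ontology (aspect)."""
    return Counter(ASPECT_NAMES.get(r["aspect"], "unknown") for r in records)


def _row(acc: str, symbol: str, go_id: str, evidence: str, aspect: str) -> str:
    """Build one made-up annotation line with all 15 columns."""
    return "\t".join([
        "UniProt", acc, symbol, "", go_id, "PUBMED:0000000", evidence, "",
        aspect, "hypothetical protein", "", "protein", "taxon:9606",
        "20031201", "UniProt",
    ])


# Invented sample data; accessions and references are not real entries.
SAMPLE = "\n".join([
    "!made-up header comment",
    _row("X00001", "EXA1", "GO:0003677", "IEA", "F"),
    _row("X00001", "EXA1", "GO:0005634", "IEA", "C"),
    _row("X00002", "EXA2", "GO:0006355", "TAS", "P"),
    _row("X00002", "EXA2", "GO:0003677", "TAS", "F"),
]) + "\n"

if __name__ == "__main__":
    records = list(read_associations(io.StringIO(SAMPLE)))
    for aspect, n in sorted(count_by_aspect(records).items()):
        print(f"{aspect}: {n}")
```

To use a real download instead of the sample, an opened file handle can be passed to read_associations in place of the io.StringIO object.
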
In 2004, the GOA team will build on its success and will continue to supplement the functional annotation of UniProt, improve recall and searching with GO, and try to facilitate access to all available biological information. Researchers wishing to provide feedback on the GOA project are encouraged to e-mail: goa@ebi.ac.uk.

The GOA project is grateful for the support of Grants QRLT-2001-00015 and QLRI-2000-00981 of the European Commission and a supplementary NIH grant, 1R01HGO2273-01.


Key words: Gene Ontology, annotation



References

  1. Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., Dolinski, K., Dwight, S. S., Eppig, J. T., Harris, M. A., Hill, D. P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J. C., Richardson, J. E., Ringwald, M., Rubin, G. M. and Sherlock, G. (2000). Gene Ontology: tool for the unification of biology. Nature Genet. 25, 25-29.

  2. Camon, E., Magrane, M., Barrell, D., Binns, D., Fleischmann, W., Kersey, P., Mulder, N., Oinn, T., Maslen, J., Cox, A. and Apweiler, R. (2003). The Gene Ontology Annotation (GOA) project: implementation of GO in Swiss-Prot, TrEMBL and InterPro. Genome Res. 13, 662-672.

  3. Boeckmann, B., Bairoch, A., Apweiler, R., Blatter, M. C., Estreicher, A., Gasteiger, E., Martin, M. J., Michoud, K., O'Donovan, C., Phan, I., Pilbout, S. and Schneider, M. (2003). The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 31, 365-370.

  4. O'Donovan, C., Apweiler, R. and Bairoch, A. (2001). The human proteomics initiative (HPI). Trends Biotechnol. 19, 178-181.