Ensembl: Annotating metazoan genomes

E. Birney




EMBL Outstation-European Bioinformatics Institute,
Wellcome Trust Genome Campus,
Hinxton, Cambridge, UK
Tel: +44 1223 494420
Email: birney@ebi.ac.uk







The Ensembl project (based at www.ensembl.org) aims to provide an entirely open suite of data and software for large eukaryotic genomes. Ensembl provides an actively maintained dataset for the human and mouse genomes. All the data produced by Ensembl is placed in the public domain; the software is licensed under an extremely open Apache-style license. Over 20 external sites run their own copy of the Ensembl web server, and two sites have a full installation of the web site and the underlying analysis system.

Building Ensembl has meant facing many bioinformatics and software engineering challenges, from algorithmic issues in gene prediction, through large-scale compute management, to user interface design. In my talk I will introduce the Ensembl database and its uses, while touching on some of the challenges we have met whilst building the system.