In Silico Biology 6, 0030 (2006); ©2006, Bioinformation Systems e.V.  



Analysis and comparison of benchmarks for multiple sequence alignment

Gordon Blackshields*, Iain M. Wallace, Mark Larkin and Desmond G. Higgins

Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland

* Corresponding author
   Email: Gordon.blackshields@ucd.ie


Edited by E. Wingender; received April 03, 2006; revised May 30, 2006; accepted June 03, 2006; published June 08, 2006


Abstract

The most popular way of comparing the performance of multiple sequence alignment programs is empirical testing on sets of test sequences. Several such test sets now exist, each with potential strengths and weaknesses. We apply several different alignment packages to six benchmark datasets and compare their relative performance. HOMSTRAD, a collection of alignments of homologous proteins, is regularly used as a benchmark for sequence alignment although it was not designed as such and lacks annotation of the reliable regions within each alignment. We introduce this annotation into HOMSTRAD using protein structural superposition. Results on each database show that method performance depends on the input sequences. Alignment benchmarks are regularly used in combination to measure performance across a spectrum of alignment problems. By combining benchmarks, it is possible to detect whether a program has been over-optimised for a single dataset or alignment problem type.


Keywords: multiple sequence alignment, benchmark, BAliBASE, HOMSTRAD, IRMbase, OxBench, PREFAB, SABmark, structural superposition, RMSD, over-training, over-alignment