In Silico Biology 9, 0009 (2009); ©2009, Bioinformation Systems e.V.  



Remote homology detection using a kernel method that combines sequence and secondary-structure similarity scores

Daniela Wieser1* and Mahesan Niranjan2

1 The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
2 The University of Southampton, School of Electronics and Computer Science, University Road, Southampton, SO17 1BJ, UK

* Corresponding author
   Email: dwieser@ebi.ac.uk


Edited by E. Wingender; received December 08, 2008; revised and accepted February 16, 2009; published March 31, 2009


Abstract

Distant evolutionary relationships between proteins with low sequence similarity are difficult to recognise by computational methods. Consequently, many sequences obtained from large-scale sequencing projects cannot be assigned to any known protein or family even though they are evolutionarily related to one. To boost sensitivity, various sequence-based methods have been modified to make use of the better-conserved secondary structure. Most of these methods are instance-based or generative. Here, we introduce a kernel-based remote homology detection method that combines sequence and secondary-structure similarity scores in a discriminative approach.

We studied the ability of the method to predict superfamily membership as defined by the SCOP database. A kernel method that combined sequence similarity scores with predicted secondary-structure similarity scores performed similarly to a classifier that used scores calculated from sequences and true secondary structures. It also outperformed a sequence-only classifier and achieved a better mean performance than recently published results on the same dataset.

We conclude that SVM classifiers trained to predict homology between distantly related proteins become more accurate when a joint sequence/secondary-structure similarity score approach is used.
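As a purely illustrative sketch of the general idea of combining two kinds of similarity scores in one kernel, the Python below builds a Gram matrix as a weighted sum of two RBF kernels, one over sequence score vectors and one over secondary-structure score vectors. The abstract does not specify how the scores are combined, so the weighted-sum form, the RBF kernel, the names `rbf_kernel` and `combined_kernel_matrix`, the parameters `weight` and `gamma`, and the toy score values are all assumptions made for this example, not the authors' method.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel between two similarity-score vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def combined_kernel_matrix(seq_scores, ss_scores, weight=0.5, gamma=0.5):
    """Gram matrix of a weighted sum of two RBF kernels.

    seq_scores[i] and ss_scores[i] are hypothetical score vectors for
    protein i: its sequence and secondary-structure similarity scores
    against a fixed set of reference proteins.
    """
    n = len(seq_scores)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            k_seq = rbf_kernel(seq_scores[i], seq_scores[j], gamma)
            k_ss = rbf_kernel(ss_scores[i], ss_scores[j], gamma)
            # A weighted sum of two valid kernels (weights >= 0) is a valid kernel.
            gram[i][j] = weight * k_seq + (1.0 - weight) * k_ss
    return gram


# Toy, made-up score vectors for three proteins (illustration only).
seq_scores = [[1.0, 0.2, 0.1], [0.2, 1.0, 0.3], [0.1, 0.3, 1.0]]
ss_scores = [[1.0, 0.8, 0.2], [0.8, 1.0, 0.1], [0.2, 0.1, 1.0]]

gram = combined_kernel_matrix(seq_scores, ss_scores)
```

A matrix of this kind could then be passed to any SVM implementation that accepts a precomputed kernel; the SVM training step itself is omitted here to keep the sketch free of external dependencies.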


Keywords: remote homology detection, support vector machines, secondary structures