In Silico Biology 9, 0004 (2008); ©2008, Bioinformation Systems e.V.  



SubCellProt: Predicting protein subcellular localization using machine learning approaches

Prabha Garg1*, Virag Sharma1, Pradeep Chaudhari1 and Nilanjan Roy1,2

1 Center for Pharmacoinformatics, National Institute of Pharmaceutical Education and Research S.A.S. Nagar, Sector 67, S.A.S. Nagar, Punjab 160 062, India
2 Department of Biotechnology, National Institute of Pharmaceutical Education and Research S.A.S. Nagar, India

* Corresponding author
   Email: prabhagarg@niper.ac.in


Edited by H. Michael; received August 23, 2008; revised December 01, 2008; accepted December 03, 2008; published December 23, 2008


Abstract

High-throughput genome sequencing projects continue to produce enormous amounts of raw sequence data. Most of this data is unannotated, however, and therefore of limited use. One of the several ways to infer the function of a protein is to determine its subcellular localization. Experimental approaches to proteome annotation, including the determination of a protein's subcellular localization, are costly and labor-intensive. In silico methods offer an alternative to experimental methods for this task. Here, we present two machine learning approaches for predicting the subcellular localization of a protein from its primary sequence. Two algorithms, k-Nearest Neighbor (k-NN) and Probabilistic Neural Network (PNN), were used to classify an unknown protein into one of 11 subcellular localizations. The final prediction is a consensus of the predictions made by the two algorithms, and a probability is assigned to it. The results indicate that features derived from the primary sequence, such as amino acid composition, sequence order and physicochemical properties, can be used to assign subcellular localization with a fair degree of accuracy. Moreover, given the enhanced accuracy of our approach and the definition of a prediction domain, the method can be used for high-throughput proteome annotation.
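To make the consensus idea concrete, the following is a minimal Python sketch, not the SubCellProt implementation. It assumes only amino acid composition as the feature vector (the abstract also names sequence order and physicochemical properties), a Parzen-window form of the PNN with a Gaussian kernel, and a simple agreement rule for the consensus. The function names, the parameters `k` and `sigma`, the toy sequences and the two class labels are all hypothetical and chosen for illustration.

```python
import math
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the 20-dimensional amino acid composition of a sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def knn_predict(x, train, k=3):
    """k-NN: majority vote among the k nearest training vectors."""
    nearest = sorted(train, key=lambda item: sq_dist(x, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def pnn_predict(x, train, sigma=0.1):
    """PNN (Parzen window): mean Gaussian kernel per class, normalized."""
    sums, sizes = defaultdict(float), defaultdict(int)
    for vec, label in train:
        sums[label] += math.exp(-sq_dist(x, vec) / (2 * sigma ** 2))
        sizes[label] += 1
    scores = {c: sums[c] / sizes[c] for c in sizes}
    total = sum(scores.values()) or 1.0
    probs = {c: s / total for c, s in scores.items()}
    best = max(probs, key=probs.get)
    return best, probs[best]


def consensus_predict(seq, train, k=3, sigma=0.1):
    """Assumed consensus rule: report a label only if both methods agree.

    Returns (label or None, PNN probability of its top class, agreement flag).
    """
    x = composition(seq)
    knn_label = knn_predict(x, train, k)
    pnn_label, pnn_prob = pnn_predict(x, train, sigma)
    agree = knn_label == pnn_label
    return (pnn_label if agree else None), pnn_prob, agree


# Toy training set with made-up sequences and labels (illustration only).
train = [
    (composition("KKRKRRKK"), "nucleus"),
    (composition("RKKRPKKR"), "nucleus"),
    (composition("LLIVLLAV"), "membrane"),
    (composition("IVLLAVLL"), "membrane"),
]
print(consensus_predict("KRKKRKRA", train))
```

Returning no label when the two classifiers disagree is one possible reading of "consensus"; how the published method resolves disagreements and derives its probability is described in the full article, not in this sketch.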

Availability: SubCellProt is available at www.databases.niper.ac.in/SubCellProt.


Keywords: protein function, subcellular localization, machine learning, PNN, k-NN