In Silico Biology 2, 0048 (2002); ©2002, Bioinformation Systems e.V.  

Prediction of food protein allergenicity: a bioinformatic learning systems approach

Anna Zorzet, Mats Gustafsson1 and Ulf Hammerling2, *

Division of Toxicology, National Food Administration, P.O. Box 622, SE-751 26 Uppsala, Sweden
1 Signal and Systems Group, Uppsala University, P.O. Box 528, SE-751 20 Uppsala, Sweden
2 Phone: +46 18 17 57 52; Fax: +46 18 17 14 33

* corresponding author

Edited by E. Wingender; received September 5, 2002; revised and accepted December 19, 2002; published January 07, 2003


Food hypersensitivity is steadily increasing in Western societies, with a prevalence of about 1-2% in Europe and the USA. Among children, it is even higher. With the introduction of foods derived from genetically modified crops to the marketplace, the scientific community, regulatory bodies and international associations have intensified discussions on risk assessment procedures for identifying the potential food allergenicity of newly introduced proteins.

In this work, we present a novel biocomputational methodology for classifying amino acid sequences as food allergens or non-allergens. The method relies on a computerised learning system trained on selected excerpts of amino acid sequences. We present one example of such a successful learning system, which combines feature extraction from sequence alignments performed with the FASTA3 algorithm (employing the BLOSUM50 substitution matrix) with the k-Nearest-Neighbour (kNN) classification algorithm. Briefly, two features are extracted: the alignment score and the alignment length. The kNN algorithm then assigns the pair of features extracted from an unknown sequence to the prevalent class among its k nearest neighbours in the available training (prototype) set.
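The kNN step described above can be illustrated with a minimal sketch. It assumes Euclidean distance on the raw (alignment score, alignment length) pairs and a simple majority vote; the distance measure, any feature scaling and the value of k are not stated in this abstract, and the function name `knn_classify` and all numbers below are hypothetical, not data from the study.

```python
import math
from collections import Counter


def knn_classify(query, prototypes, k=3):
    """Assign `query` to the prevalent class among its k nearest prototypes.

    query      -- (alignment_score, alignment_length) of the unknown sequence
    prototypes -- list of ((alignment_score, alignment_length), label) pairs
    k          -- number of neighbours that vote (assumed value, not from the source)
    """
    # Rank prototypes by Euclidean distance to the query (an assumption).
    ranked = sorted(prototypes, key=lambda p: math.dist(query, p[0]))
    # Majority vote among the k closest prototypes.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical prototype set; the feature values are invented for illustration.
prototypes = [
    ((420.0, 150.0), "allergen"),
    ((390.0, 140.0), "allergen"),
    ((350.0, 120.0), "allergen"),
    ((60.0, 40.0), "non-allergen"),
    ((75.0, 55.0), "non-allergen"),
    ((50.0, 30.0), "non-allergen"),
]

print(knn_classify((400.0, 145.0), prototypes, k=3))
print(knn_classify((65.0, 45.0), prototypes, k=3))
```

In practice the two features would come from aligning the query sequence against the allergen repository; that alignment step is outside the scope of this sketch.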

Ninety-one food allergens were identified from several specialised public repositories of food allergy and the SWALL database, pre-processed, and stored, yielding one of the most extensively characterised repositories of allergenic sequences known today. All allergenic sequences were classified using a standard leave-one-out cross-validation procedure, which yielded about 81% correctly classified allergens; classification of 367 non-allergens in an independent test set resulted in about 98% correct classifications.
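The leave-one-out procedure mentioned above can be sketched as follows: each sample is held out in turn, classified against the remaining samples, and the fraction of correct predictions is reported. This is a generic illustration under stated assumptions, not the study's pipeline. The abstract does not describe how the held-out allergens were combined with non-allergen prototypes, and the helper names, the choice k=1 and the sample values are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(query, prototypes, k):
    """Majority vote among the k prototypes closest to `query` (Euclidean distance assumed)."""
    ranked = sorted(prototypes, key=lambda p: math.dist(query, p[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(samples, k=1):
    """Hold out each sample once, classify it against the rest, return the hit rate."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]  # every sample except the held-out one
        if knn_predict(features, rest, k) == label:
            correct += 1
    return correct / len(samples)


# Hypothetical (alignment score, alignment length) pairs, invented for illustration.
samples = [
    ((420.0, 150.0), "allergen"),
    ((390.0, 140.0), "allergen"),
    ((350.0, 120.0), "allergen"),
    ((60.0, 40.0), "non-allergen"),
    ((75.0, 55.0), "non-allergen"),
    ((50.0, 30.0), "non-allergen"),
]

print(leave_one_out_accuracy(samples, k=1))
```

Because every sample is scored by a model that never saw it, the resulting rate is a less optimistic estimate than accuracy measured on the training set itself.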

The biocomputational approach presented should be regarded as a significant extension and refinement of earlier attempts at in silico food safety assessment. Our results show that the framework described here is powerful enough to be useful as part of a multiple-procedure test scheme that also includes other evaluation approaches, such as solid-phase immunoassays and tests for stability to digestion.

Key words: food allergy, risk assessment, computational toxicology