In Silico Biology 3, 0034 (2003); ©2003, Bioinformation Systems e.V.  


Bioinformatic strategies for better understanding of immune function

Nikolai Petrovsky1,2, Christian Schönbach3 and Vladimir Brusic1,4




1 Centre for Medical Informatics, Division of Science and Design, University of Canberra, Bruce ACT 2617, Australia
2 Autoimmunity Research Unit, The Canberra Hospital, Woden ACT 2606, Australia
3 Biomedical Knowledge Discovery Team, Bioinformatics Group, RIKEN Genomic Sciences Center (GSC), Yokohama 230-0045, Japan
4 Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613

Email: nikolai.petrovsky@anu.edu.au, schoen@gsc.riken.go.jp, vladimir@i2r.a-star.edu.sg





Edited by E. Wingender; received May 20, 2003; revised and accepted June 04, 2003; published June 12, 2003



Abstract

The Novartis Foundation sponsored a symposium that brought together a group of experimental immunologists, theoretical immunologists, and bioinformaticians to discuss the new field of immunoinformatics. The discussion focused on immunological databases, antigen processing and presentation, immunogenomics, host-pathogen interactions, and mathematical modelling of the immune system. A main conclusion of the meeting was that immunoinformatics plays a critical role in current immunology research. In particular, immunoinformatics provides a foundation for the emerging fields of systems immunology and immunogenomics.

Key words: antigen processing, antigen presentation, host-pathogen interactions, immunogenomics, immunoinformatics, immunological databases, mathematical models




Introduction

The inaugural Immunoinformatics Symposium, sponsored by the Novartis Foundation, was held in London from 8 to 10 October 2002. This meeting brought together for the first time a multidisciplinary group of experts in experimental immunology, theoretical immunology, and bioinformatics to discuss the rapidly developing field of immunoinformatics. A key objective of the meeting was to define immunoinformatics and set guidelines for further development of the field. Presentations focused on information tools, databases, modeling techniques, and strategies to combine experimental and theoretical approaches in immunology. Topics included the application of informatics and mathematical methods (search and prediction techniques, natural language processing, mathematical modeling, and databases) to the study and understanding of immune function. These were complemented by talks covering immune system interactions, antigen presentation, T-cell recognition, responses to infection, molecular interactions, gene regulation, genomics, and proteomics. More researchers are active in this area than could be accommodated at the Symposium. Given this constraint on meeting size, the main topics were selected to represent the areas of most vigorous research activity in immunoinformatics.

A major focus of the meeting was the quality of immunological data. Three sources of data were identified: data derived from purely experimental approaches, data derived from models built around experimental data, and data derived from purely theoretical approaches. It was concluded that data from all three sources require equally rigorous validation and demonstration of reproducibility to be accepted as evidence at the highest level of confidence (hard evidence). Results that cannot be stringently validated should be accorded a lower level of confidence (soft evidence) and interpreted with caution. Efficient and successful bioinformatic analysis of immunological systems requires an understanding of the complexity and hierarchical nature of the processes that generate biological data, an appreciation of the intrinsic fuzziness of biological data, an understanding of the biases and potential misconceptions that the data may contain, and an ability to allow for the effects of noise and errors in the data.

The meeting was skillfully chaired by Hans-Georg Rammensee (Eberhard-Karls University, Tübingen, Germany), one of the fathers of major histocompatibility complex (MHC)-peptide binding motifs. The presentations started with Vladimir Brusic (Institute for Infocomm Research, Singapore), who gave a historical overview and described the role of immunoinformatics as the management and analysis of immunological data. He described many of the pitfalls and problems with immunological data and explained that the quality of experimental data is the major impediment to developing accurate immune models [Brusic et al., 1998; Petrovsky and Brusic, 2002]. The other talks each belonged to one or more of the main areas: immunological databases, antigen processing and presentation, immunogenomics, host-pathogen interactions, and mathematical models of the immune system.



Immunological databases

The importance of specialist immunological databases was underscored by Vladimir Brusic [Brusic et al., 2000]. Marie-Paule Lefranc (University Montpellier II, CNRS, Montpellier, France) discussed the problems and difficulties that had to be overcome in creating the IMGT information system® [http://imgt.cines.fr; Lefranc, 2003], which contains data on immunoglobulins, T cell receptors and MHC of human and other vertebrates. The IMGT nomenclature of the immunoglobulin and T cell receptor genes and alleles was approved by the Human Genome Organization (HUGO) Nomenclature Committee (HGNC) in 1999. The four databases covered by IMGT include annotated DNA and protein sequences, and genetic and structural data on immune system molecules. The IMGT system, based on the IMGT-ONTOLOGY concepts [Giudicelli and Lefranc, 1999] and on the rules of the IMGT Scientific chart, supports research in autoimmunity, HIV, leukemia, lymphoma, veterinary immunology, genome diversity, evolution of the immune system, antibody engineering, and therapeutic approaches. Steven Marsh (Anthony Nolan Research Institute, London, UK) presented the IMGT/HLA database [http://www.ebi.ac.uk/imgt; Robinson et al., 2003] and explained the need for a systematic nomenclature of immune system products. He noted that the nomenclature for the human leukocyte antigen (HLA) system is sanctioned by the WHO Nomenclature Committee for Factors of the HLA System, and described the evolution of the nomenclature systems. Current efforts include standardisation of nomenclature for MHC molecules of other species. Darren Flower (Edward Jenner Institute for Vaccine Research, Compton, Berkshire, UK) described the JenPep database of MHC-related peptide data [http://www.jenner.ac.uk/jenpep; Blythe et al., 2002].
Edgar Wingender (GBF, Braunschweig, Germany) spoke on the importance of specialized databases annotated by human experts, such as the gene regulation and signal transduction databases TRANSFAC®, TRANSPATH®, and PRODORIC [http://www.gene-regulation.com; Matys et al., 2003; Krull et al., 2003; Münch et al., 2003]. Paul Kellam (University College London, London, UK) described the VIDA database [http://www.biochem.ucl.ac.uk/bsm/virus_database; Alba et al., 2001], which contains sequences of homologous proteins derived from open reading frames of viral genomes.



Antigen processing and presentation

Nikolai Petrovsky (Centre for Medical Informatics, University of Canberra, Canberra, Australia) gave examples of how modular modeling approaches could advance our understanding of complex immune phenomena. In particular, he demonstrated how a TAP-peptide binding prediction system based on artificial neural networks could be combined with models of HLA binding to create a unique system for probing the behaviour of antigen presentation pathways [Brusic et al., 1999; Petrovsky and Brusic, 2003]. He proposed building a virtual immune system by progressively adding together modules representing each known facet of immune function. He also discussed the potential for immunoinformatics to be applied to clinical immunology, presenting an example of how artificial neural networks could be trained on historical patient data to prospectively predict the outcome of renal transplantation [Petrovsky et al., 2002]. Kamalakar Gulukota (Meta Genomix India Ltd., Secunderabad, India) discussed the potential of immunoinformatics to personalize medicine, giving the example of personalized peptide vaccines based on a person's particular MHC alleles. Anne De Groot (Brown University, Providence, USA) also spoke on the theme of vaccine research, focusing on informatics approaches to complete whole-genome T cell epitope mapping for infectious organisms such as hepatitis C virus or HIV. Combining sophisticated computational tools (http://www.epivax.com/epimatrix.html) with experimental epitope mapping offers a significant advantage over brute-force experimental screening in identifying promiscuous T cell epitopes. The recent 'genome to vaccine' approaches involve deriving novel vaccine candidates directly from whole genomes [De Groot et al., 2001; De Groot et al., 2002].
Darren Flower reviewed quantitative approaches to computational vaccinology, including three-dimensional quantitative structure-activity relationship (3D-QSAR) methods and comparative molecular similarity indices analysis (CoMSIA) [Doytchinova and Flower, 2002a; Doytchinova and Flower, 2002b]. Hanah Margalit (The Hebrew University, Jerusalem, Israel) discussed the effect of the proteasome on shaping T cell epitopes and our understanding of the evolutionary forces that underpin the ability of an immune system to respond to these peptides [Altuvia and Margalit, 2000]. She pointed out that the analysis of MHC peptides and structure provides valuable insight for the study of broader biological questions, such as structure-function relationships and genome annotation. Can Kesmir (Technical University of Denmark, Lyngby, Denmark) discussed computational predictions of proteasome cleavage sites [http://www.cbs.dtu.dk/services/NetChop; Kesmir et al., 2002]. Stefan Stevanovic (Eberhard-Karls University, Tübingen, Germany) presented critical facts about HLA-presented peptides. The number of different peptides presented by one given HLA molecule on a single cell lies between one and ten thousand sequences. Different cell types present different peptide repertoires, offering great hope for the development of specific immunotherapies. However, the number of known naturally processed peptides for specific HLA alleles is much smaller, and is currently less than 100 even for the best-studied HLA molecule [Stevanovic, 2002]. The peptide motifs that describe peptide-binding rules have been established over the years, and the catalogue of known HLA ligands is steadily growing [http://www.syfpeithi.de; Rammensee et al., 1999]. He discussed the state of the art in HLA ligand characterisation and registration.
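
The modular idea described in this section, in which independent predictors for individual steps of the antigen presentation pathway are chained together, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, and the two scoring rules are toy stand-ins (a preference for hydrophobic or basic C-terminal residues for TAP, and leucine/methionine at position 2 plus valine/leucine at the C-terminus as a simplified HLA-A2-like anchor motif). They are not the neural network models or the motif tables of the cited work.

```python
from typing import Callable, Dict, List, Tuple

# A module maps a peptide to a score in [0, 1]; higher means more favourable.
Module = Callable[[str], float]


def tap_module(peptide: str) -> float:
    """Toy stand-in for a TAP-binding predictor (assumed rule, not a trained model)."""
    return 1.0 if peptide[-1] in "FILMVWYKR" else 0.2


def hla_module(peptide: str) -> float:
    """Toy stand-in for an HLA-binding predictor using two assumed anchor positions."""
    score = 0.1
    if peptide[1] in "LM":      # position 2 anchor (assumed)
        score += 0.45
    if peptide[-1] in "VL":     # C-terminal anchor (assumed)
        score += 0.45
    return score


def windows(protein: str, length: int = 9) -> List[str]:
    """Enumerate all overlapping peptides of the given length."""
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]


def rank_peptides(protein: str, modules: Dict[str, Module],
                  length: int = 9) -> List[Tuple[str, float]]:
    """Score every peptide with every module and rank by the product of scores."""
    ranked = []
    for peptide in windows(protein, length):
        combined = 1.0
        for module in modules.values():
            combined *= module(peptide)
        ranked.append((peptide, combined))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Further modules (for example a proteasome cleavage predictor) could be added here.
MODULES: Dict[str, Module] = {"tap": tap_module, "hla": hla_module}

if __name__ == "__main__":
    # Made-up sequence used only to exercise the pipeline.
    for peptide, score in rank_peptides("QQALQQQQQQVQQ", MODULES)[:3]:
        print(peptide, round(score, 3))
```

Multiplying the module scores is only one possible way to combine them; it treats each step as an independent filter, so a peptide that fails any single step ranks low overall.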



Immunogenomics and host-pathogen interactions

Stephan Beck (Wellcome Trust Sanger Institute, Hinxton, UK) presented an overview of the Human Genome Project, highlighting in particular comparative genomics, functional genomics, and epigenomics [Novik et al., 2002], discussed in the context of the MHC region (http://www.sanger.ac.uk/Teams/Team50). The MHC is a major polymorphic region of exceptionally high importance for many common human immune diseases [Allcock et al., 2002], and the complete sequence of a 'virtual' HLA haplotype has been available since 1999. Analyses of the expression and regulation of immune genes by microarray and by methylation assays of variable positions can link epigenetic markers to gene activity and disease pathways. Christian Schönbach (RIKEN Genomic Sciences Center, Yokohama, Japan) presented data from the FANTOM mouse transcriptome project [Okazaki et al., 2002] and data on immune-related transcripts [facts.gsc.riken.go.jp; Nagashima et al., 2003]. Identification of immune-related transcripts requires the study of diverse cell types, such as T cells, B cells, macrophages, and other cells, in various states of development and activation. An efficient approach to identifying immune- and disease-related genes combines experimental transcription data with sequence and text mining techniques applied to molecular databases and the scientific literature. Edgar Wingender (GBF, Braunschweig, Germany) presented bioinformatics approaches to the study of regulatory networks in cells as well as in the whole organism. He presented results from bioinformatics studies of promoter structures and features that explain microarray data and predict genes and gene products involved in host-pathogen interactions. The analysis of these genes and products involves comparison with the established knowledge bases of gene regulation and signal transduction.
An example was given of a study of the innate immune response signaling pathways of lung epithelial cells to Pseudomonas aeruginosa, in which bioinformatics analysis had generated testable hypotheses. Paul Kellam stressed the importance of immunoinformatics and systems theory for understanding host-pathogen interactions at the organism level [Kellam, 2001; Kel-Margoulis et al., 2002]. By combining gene expression arrays and computational approaches, his group produced global dynamic views of genes that block or mimic host immune responses or interfere with apoptosis regulation and cell-cycle control. However, reducing the analysis to a single gene fails to provide an organism-level understanding of these mechanisms. System-level virology, based on the analysis of microarray data, provides the cellular context in which viruses replicate and highlights antiviral mechanisms, the effects of differential gene expression, and cellular responses.



Mathematical models of the immune system

Dominik Wodarz (Fred Hutchinson Cancer Research Center, Seattle, USA) spoke on mathematical modeling of cytotoxic T cell (CTL) responses against viral infections [Wodarz and Nowak, 2002]. The presentation focused on the influence of viral parameters on the outcome of infection, particularly on virus clearance, CTL memory, virus persistence, CTL-induced pathology, and tolerance. The theoretical results were matched with experimental data from lymphocytic choriomeningitis virus infection in mice and from pulmonary infections.
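
To give a flavour of this kind of modeling, the sketch below integrates a small textbook-style system of ordinary differential equations for uninfected cells x, infected cells y, and CTLs z. It is not the model presented at the meeting: the equations (dx/dt = production - death·x - infection·x·y, dy/dt = infection·x·y - infected_death·y - killing·y·z, dz/dt = ctl_response·y - ctl_decay·z), the linear CTL expansion term, all parameter values, and the function name `simulate` are assumptions chosen for illustration.

```python
from typing import Dict


def simulate(ctl_response: float, t_end: float = 200.0,
             dt: float = 0.01) -> Dict[str, float]:
    """Integrate a toy virus/CTL model with forward Euler; return the final state."""
    # All rates are arbitrary illustrative values (per unit time), not fitted to data.
    production, death = 10.0, 0.1   # supply and natural death of uninfected cells
    infection = 0.01                # infection rate constant
    infected_death = 0.5            # virus-induced death of infected cells
    killing = 1.0                   # CTL-mediated killing rate constant
    ctl_decay = 0.1                 # decay rate of CTLs

    # Start from the uninfected equilibrium with a small inoculum and no CTLs.
    x, y, z = production / death, 1.0, 0.0
    for _ in range(int(round(t_end / dt))):
        dx = production - death * x - infection * x * y
        dy = infection * x * y - infected_death * y - killing * y * z
        dz = ctl_response * y - ctl_decay * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
    return {"uninfected": x, "infected": y, "ctl": z}


if __name__ == "__main__":
    # Compare the long-term infected-cell load without and with a CTL response.
    print("no CTL response:  ", simulate(ctl_response=0.0))
    print("with CTL response:", simulate(ctl_response=0.1))
```

Varying `ctl_response` or the viral parameters (`infection`, `infected_death`) in such a sketch shows how the balance between viral replication and CTL killing shifts the long-term infected-cell load, which is the type of question addressed by the models discussed in the talk.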



Meeting outcomes

Distinguished members of the symposium also included Massimo Bernaschi (IAC Mauro Picone C. N. R., Rome, Italy), Francisco Borras-Cuesta (University of Navarra, Pamplona, Spain), Charles DeLisi (Boston University, Boston, USA), Tim Littlejohn (Biolateral, Sydney, Australia), Terry Lybrand (Vanderbilt University, Nashville, USA), Alan Perelson (LANL, Los Alamos, USA), and Diego Silva (Canberra Hospital, Canberra, Australia). Gregory Bock and Allyson Brown, the main organisers from the Novartis Foundation, also contributed greatly to the discussions and helped shape the conclusions of the meeting.

The participants agreed that immunological data continue to grow at a rapid rate, and that this growth is reflected both in the increasing size and complexity of individual databases and in the proliferation of new databases. Immunoinformatics, as a field at the interface between the immunological and computer sciences, provides an effective means to store, analyse, and model large volumes of complex data, thereby providing insights into the complexity of living organisms. All participants agreed that further meetings of this kind are essential to advancing the role of immunoinformatics in current immunology research, and in particular its role in providing a foundation for the emerging fields of systems immunology and immunogenomics. There was a general desire to form an immunoinformatics network to allow for future interaction and collaboration. In particular, it was agreed to:

  1. create a consortium to lodge an expression of interest in the Immune Epitope Database project recently proposed by the United States National Institutes of Health (NIH) for the creation and maintenance of a combined T and B cell epitope database and associated tools,
  2. set up an Immunoinformatics page with links to databases, tools and resources on the IMGT site (http://imgt.cines.fr),
  3. work towards defining common sets of standards, for example for peptide binding data, exchange formats, and common front-end interfaces to immunological databases,
  4. work on developing a modular approach towards design and implementation of a working simulation of the human immune system.

Overall, the participants concluded that the meeting was extremely timely, coinciding as it did with the completion of many genome projects and the growing need for bioinformatics capability in mainstream immunology research. Nikolai Petrovsky and Vladimir Brusic, as meeting convenors, were charged with organizing follow-up meetings and with moving towards the establishment of an Immunoinformatics Network.



References

  1. Alba, M. M., Lee, D., Pearl, F. M., Shepherd, A. J., Martin, N., Orengo, C. A. and Kellam, P. (2001). VIDA: a virus database system for the organization of animal virus genome open reading frames. Nucleic Acids Res. 29, 133-136.

  2. Allcock, R. J., Atrazhev, A. M., Beck, S., de Jong, P. J., Elliott, J. F., Forbes, S., Halls, K., Horton, R., Osoegawa, K., Rogers, J., Sawcer, S., Todd, J. A., Trowsdale, J., Wang, Y. and Williams, S. (2002). The MHC haplotype project: a resource for HLA-linked association studies. Tissue Antigens 59, 520-521.

  3. Altuvia, Y. and Margalit, H. (2000). Sequence signals for generation of antigenic peptides by the proteasome: implications for proteasomal cleavage mechanism. J. Mol. Biol. 295, 879-890.

  4. Blythe, M. J., Doytchinova, I. A. and Flower, D. (2002). JenPep: a database of quantitative functional peptide data for immunology. Bioinformatics 18, 434-439.

  5. Brusic, V., Wilkins, J. S., Stanyon, C. A. and Zeleznikow, J. (1998). Data learning: understanding biological data. In: Merrill, G. and Pathak, D. K. (eds.) Knowledge Sharing Across Biological and Medical Knowledge Based Systems: Papers from the 1998 AAAI Workshop, pp. 12-19. AAAI Technical Report WS-98-04. AAAI Press.

  6. Brusic, V., van Endert, P., Zeleznikow, J., Daniel, S., Hammer, J. and Petrovsky, N. (1999). A neural network model approach to the study of human TAP transporter. In Silico Biol. 1, 109-121.

  7. Brusic, V., Zeleznikow, J. and Petrovsky, N. (2000). Molecular immunology databases and data repositories. J. Immunol. Methods 238, 17-28.

  8. De Groot, A. S., Bosma, A., Chinai, N., Frost, J., Jesdale, B. M., Gonzalez, M. A., Martin, W. and Saint-Aubin, C. (2001). From genome to vaccine: in silico predictions, ex vivo verification. Vaccine 19, 4385-4395.

  9. De Groot, A. S., Sbai, H., Aubin, C. S., McMurry, J. and Martin, W. (2002). Immuno-informatics: Mining genomes for vaccine components. Immunol. Cell Biol. 80, 255-269.

  10. Doytchinova, I. A. and Flower, D. R. (2002a). A comparative molecular similarity index analysis (CoMSIA) study identifies an HLA-A2 binding supermotif. J. Comput. Aided Mol. Des. 16, 535-544.

  11. Doytchinova, I. A. and Flower, D. R. (2002b). Quantitative approaches to computational vaccinology. Immunol. Cell Biol. 80, 270-279.

  12. Giudicelli, V. and Lefranc, M.-P. (1999). Ontology for Immunogenetics: IMGT-ONTOLOGY. Bioinformatics 15, 1047-1054.

  13. Kawai, J. et al. (2001). Functional annotation of a full-length mouse cDNA collection. Nature 409, 685-690.

  14. Kel-Margoulis, O. V., Ivanova, T. G., Wingender, E. and Kel, A. E. (2002). Automatic annotation of genomic regulatory sequences by searching for composite clusters. Pac. Symp. Biocomput. 7, 187-198.

  15. Kellam, P. (2001). Post-genomic virology: the impact of bioinformatics, microarrays and proteomics on investigating host and pathogen interactions. Rev. Med. Virol. 11, 313-329.

  16. Kesmir, C., Nussbaum, A. K., Schild, H., Detours, V. and Brunak, S. (2002). Prediction of proteasome cleavage motifs by neural networks. Protein Eng. 15, 287-296.

  17. Krull, M., Voss, N., Choi, C., Pistor, S., Potapov, A. and Wingender, E. (2003). TRANSPATH: an integrated database on signal transduction and a tool for array analysis. Nucleic Acids Res. 31, 97-100.

  18. Lefranc, M.-P. (2003). IMGT, the international ImMunoGeneTics database. Nucleic Acids Res. 31, 307-310.

  19. Matys, V., Fricke, E., Geffers, R., Gössling, E., Haubrock, M., Hehl, R., Hornischer, K., Karas, D., Kel, A. E., Kel-Margoulis, O. V., Kloos, D. U., Land, S., Lewicki-Potapov, B., Michael, H., Münch, R., Reuter, I., Rotert, S., Saxel, H., Scheer, M., Thiele, S. and Wingender, E. (2003). TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res. 31, 374-378.

  20. Münch, R., Hiller, K., Barg, H., Heldt, D., Linz, S., Wingender, E. and Jahn, D. (2003). PRODORIC: prokaryotic database of gene regulation. Nucleic Acids Res. 31, 266-269.

  21. Nagashima, T., Silva, S., Socha, L., Petrovsky, N., Suzuki, H., Saito, R., Kasukawa, T., Kurochkin, I., Konagaya, A. and Schönbach, C. (2003). Inferring higher functional information for RIKEN mouse full-length cDNA clones with FACTS. Genome Res. 13, in press.

  22. Novik, K. L., Nimmrich, B., Genc, S., Maier, C., Piepenbrock, A., Olek, A. and Beck, S. (2002). Epigenomics: genome-wide study of methylation phenomena. Curr. Issues Mol. Biol. 4, 111-128.

  23. Okazaki, Y. et al. (2002). Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature 420, 563-573.

  24. Petrovsky, N. and Brusic, V. (2002). Computational immunology: The coming of age. Immunol. Cell Biol. 80, 248-254.

  25. Petrovsky, N., Tam, S., Brusic, V., Russ, G., Socha, L. and Bajic, V. (2002). Use of artificial neural networks in improving renal transplantation outcomes. Graft 4, 6-13.

  26. Petrovsky, N. and Brusic, V. (2003). Virtual models of the HLA class I antigen processing pathway. Methods (in press).

  27. Rammensee, H., Bachmann, J., Emmerich, N. P., Bachor, O. A. and Stevanovic, S. (1999). SYFPEITHI: database for MHC ligands and peptide motifs. Immunogenetics 50, 213-219.

  28. Robinson, J., Waller, M. J., Parham, P., de Groot, N., Bontrop, R., Kennedy, L. J., Stoehr, P. and Marsh, S. G. (2003). IMGT/HLA and IMGT/MHC: sequence databases for the study of the major histocompatibility complex. Nucleic Acids Res. 31, 311-314.

  29. Stevanovic, S. (2002). Structural basis of immunogenicity. Transpl. Immunol. 10, 133-136.

  30. Wodarz, D. and Nowak, M. A. (2002). Mathematical models of HIV pathogenesis and treatment. Bioessays 24, 1178-1187.