| In Silico Biology 5, 0006 (2004); ©2004, Bioinformation Systems e.V. |
| Ontology Workshop Göttingen 2004 |
IMGT, the international ImMunoGeneTics information system®
Université Montpellier II, Laboratoire d'ImmunoGénétique Moléculaire LIGM
UPR CNRS 1142, Institut de Génétique Humaine IGH,
141 rue de la Cardonille
34396 Montpellier Cedex 5, France
Phone: +33 4 99 61 99 65, Fax: +33 4 99 61 99 01
* Corresponding author
Email: lefranc@ligm.igh.cnrs.fr
Institut Universitaire de France
Edited by H. Michael; received September 11, 2004; revised and accepted December 14, 2004; published December 24, 2004
IMGT, the international ImMunoGeneTics information system® (http://imgt.cines.fr), was created in 1989 at Montpellier, France. IMGT is a high quality integrated knowledge resource specialized in immunoglobulins (IG), T cell receptors (TR), major histocompatibility complex (MHC) of human and other vertebrates, and related proteins of the immune system (RPI) which belong to the immunoglobulin superfamily (IgSF) and MHC superfamily (MhcSF). IMGT provides a common access to standardized data from genome, proteome, genetics and three-dimensional structures. The accuracy and the consistency of IMGT data are based on IMGT-ONTOLOGY, a semantic specification of terms to be used in immunogenetics and immunoinformatics. IMGT-ONTOLOGY has been formalized using XML Schema (IMGT-ML) for interoperability with other information systems. We are developing Web services to automatically query IMGT databases and tools. This is the first step towards IMGT-Choreography which will trigger and coordinate dynamic interactions between IMGT Web services to process complex significant biological and clinical requests. IMGT-Choreography will further increase the IMGT leadership in immunogenetics and immunoinformatics for medical research (repertoire analysis of the IG antibody sites and of the TR recognition sites in autoimmune and infectious diseases, AIDS, leukemias, lymphomas, myelomas), veterinary research (IG and TR repertoires in farm and wild life species), genome diversity and genome evolution studies of the adaptive immune responses, biotechnology related to antibody engineering (single chain Fragment variable (scFv), phage displays, combinatorial libraries, chimeric, humanized and human antibodies), diagnostics (detection and follow up of residual diseases) and therapeutical approaches (grafts, immunotherapy, vaccinology).
IMGT is freely available at http://imgt.cines.fr.
Keywords: IMGT, ontology, database, information system, knowledge resource, immunoinformatics, immunogenetics, antibody, immunoglobulin, T cell receptor, superfamily, MHC, HLA, Collier de Perles, three-dimensional, 3D structure, polymorphism, choreography, Web service, annotation
Genome and proteome analysis interpretation represents the current great challenge, as a huge quantity of data is produced by many scientific fields, including fundamental, clinical, veterinary and pharmaceutical research. In particular, the number of sequences and related data published in the immunogenetics fields is growing exponentially. The number of potential protein forms of the antigen receptors, immunoglobulins (IG) and T cell receptors (TR) is almost unlimited. The potential repertoire of each individual is estimated to comprise about 10^12 different IG (or antibodies) and TR, and the limiting factor is only the number of B and T cells that an organism is genetically programmed to produce. This huge diversity is inherent to the particularly complex and unique molecular synthesis and genetics of the antigen receptor chains. This includes biological mechanisms such as DNA molecular rearrangements in multiple loci (three for IG and four for TR in humans) located on different chromosomes (four in humans), nucleotide deletions and insertions at the rearrangement junctions (or N-diversity), and somatic hypermutations in the IG loci (see FactsBooks [1, 2] for review).
IMGT, the international ImMunoGeneTics information system® (http://imgt.cines.fr) [3, 4], was created in 1989 by Marie-Paule Lefranc at the Laboratoire d'ImmunoGénétique Moléculaire (LIGM) (Université Montpellier II and CNRS) at Montpellier, France, in order to standardize and manage the complexity of the immunogenetics data. Fifteen years later, IMGT is the international reference in immunogenetics and immunoinformatics, and provides a high quality integrated knowledge resource, specialized in the IG, TR, major histocompatibility complex (MHC) of human and other vertebrates, and related proteins of the immune system (RPI) of any species which belong to the immunoglobulin superfamily (IgSF) and to the MHC superfamily (MhcSF) [4 - 15].
The IMGT information system consists of four sequence databases, one genome database and one three-dimensional (3D) structure database, interactive tools for sequence, genome and 3D structure analysis, and Web resources ("IMGT Marie-Paule page") comprising 8,000 HTML pages of synthesis (IMGT Repertoire), knowledge (IMGT Scientific chart, IMGT Education, IMGT Index) and external links (IMGT Bloc-notes and IMGT other accesses) [4]. Despite the heterogeneity of these different components, all data in the IMGT information system are expertly annotated. The accuracy, the consistency and the integration of the IMGT data, as well as the coherence between the different IMGT components (databases, tools and Web resources) are based on IMGT-ONTOLOGY, which provides a semantic specification of the terms to be used in immunogenetics and immunoinformatics [16]. IMGT-ONTOLOGY, the first ontology in the domain, has allowed the management of knowledge in immunogenetics [4] and provided standardization for immunogenetics data from genome, proteome, genetics and 3D structures [4 - 15] (IMGT milestones).
IMGT-ONTOLOGY concepts are available, for the biologists and IMGT users, in the IMGT Scientific chart [8], and have been formalized, for the computing scientists, in IMGT-ML which uses XML (Extensible Markup Language) Schema [17, 18]. The IMGT Scientific chart (for biologist agents) and IMGT-ML (for computing agents), which are the foundations of the IMGT knowledge and data management system, constitute the "IMGT-ONTOLOGY layer" (Table 1).
In order to allow any IMGT component to be automatically queried and to achieve a higher level of interoperability inside the IMGT information system and with other information systems, we are implementing, on top of the IMGT-ONTOLOGY layer, the "IMGT-Choreography layer". It comprises the modelling of the three major IMGT biological approaches (genomics, genetics and structural approaches), the analysis of the IMGT components in relation with the concepts (depicted for example, for the tools, as IMGT tool diamonds), and the development of Web services (built using the IMGT-SOA infrastructure). It is the first step towards the implementation of IMGT-Choreography [19], which corresponds to the processing of complex immunogenetics knowledge [20, 21] and to the chaining of the treatments performed by the IMGT component Web services.
| Table 1: | The IMGT-ONTOLOGY and IMGT-Choreography layers. |
| | Biologist agents | Computer agents |
| IMGT-Choreography layer | IMGT biological approaches (IMGT genomics approach, IMGT genetics approach, IMGT structural approach); IMGT tool diamonds | Web services and IMGT-SOA |
| IMGT-ONTOLOGY layer | IMGT Scientific chart | IMGT-ML (XML Schema) |
IMGT Scientific chart
The IMGT Scientific chart [8] comprises the controlled vocabulary and the annotation rules necessary for the immunogenetics data identification, description, classification and numbering and for knowledge management in the IMGT information system. Standardized keywords, labels and annotation rules, standardized IG and TR gene nomenclature, the IMGT unique numbering, and standardized origin/methodology were defined, respectively, based on the six main concepts of IMGT-ONTOLOGY: IDENTIFICATION, CLASSIFICATION, DESCRIPTION, NUMEROTATION, ORIENTATION and OBTENTION [4, 16] (Table 2). The IMGT Scientific chart is available as a section of the IMGT Web resources (IMGT Marie-Paule page). These HTML pages are devoted to biologists, IMGT users and IMGT annotators. Examples of IMGT expertised data concepts [9, 10] derived from the IMGT Scientific chart rules are shown in Table 2.
| Table 2: | IMGT-ONTOLOGY concepts, IMGT Scientific chart rules and examples of IMGT expertised data concepts. |
| IMGT-ONTOLOGY main concepts [16] | IMGT Scientific chart rules [8] | Examples of IMGT expertised data concepts [9, 10] |
| IDENTIFICATION | Standardized keywords [6] | Species, molecule type, receptor type, chain type, gene type, structure, functionality, specificity [6, 7] |
| DESCRIPTION | Standardized labels and annotations [6] | Core (V-, D-, J-, C-REGION); Prototypes [6]; Labels for sequences; Labels for 2D and 3D structures |
| CLASSIFICATION | Reference sequences [8]; Standardized IG and TR gene nomenclature (group, subgroup, gene, allele) [8, 16] | Nomenclature of the human IG and TR genes (entry in 1999 in GDB, HGNC [22] and LocusLink at NCBI) [1, 2]; Alignment of alleles [1, 2, 7]; Nomenclature of the IG and TR genes of all vertebrate species [8] |
| NUMEROTATION | IMGT unique numbering [7, 23, 24] for: V- and V-LIKE-DOMAINs [25]; C- and C-LIKE-DOMAINs [26]; G- and G-LIKE-DOMAINs [27] | Protein displays [8]; IMGT Colliers de Perles [7, 8, 28]; FR-IMGT and CDR-IMGT delimitations [8]; Structural loops and beta strands delimitations [26] |
| ORIENTATION | Orientation of genomic instances relative to each other | Chromosome orientation; Locus orientation; Gene orientation; DNA strand orientation |
| OBTENTION | Standardized origin; Standardized methodology [4] | |
IMGT-ML
IMGT-ML [17, 18] (http://imgt.cines.fr/textes/IMGTindex/IMGT-ML.html) represents the specification of the main IMGT-ONTOLOGY concepts [16], formalized through an in-house defined mark-up language, based on the Extensible Markup Language (XML) (http://www.w3.org/XML/) and constrained through XML Schema (http://www.w3.org/XML/Schema). XML is useful both internally for the integration of data and externally for sharing data with other information systems.
IMGT biological approaches
Three major IMGT biological approaches, genomics, genetics and structural approaches, have been selected for the modelling of interactions between the IMGT components (databases, tools and Web resources). Databases and tools are shown in Figure 1.
Although the IMGT genome, sequence and 3D structure databases, IMGT analysis tools and IMGT Repertoire Web resources, were initially implemented for the IG, TR and MHC of human and other vertebrates, data and knowledge management standardization has now been extended to the related proteins of the immune system (RPI) which comprise the proteins of the immunoglobulin superfamily (IgSF) [29] and the proteins of the MHC superfamily (MhcSF) [30] of any species (IMGT Repertoire (RPI)). Thus, standardization in IMGT contributed to data enhancement of the system and new expertised data concepts were readily incorporated. The IMGT components in the three IMGT biological approaches are described in the next sections.
Figure 1: IMGT-Choreography. Examples of interactions between the IMGT databases and tools following the three main IMGT biological approaches: genomics, genetics and structural approaches. The corresponding IMGT Repertoire Web resources (not shown in the figure) are described in Tables 3, 4 and 5.
IMGT genomics approach
The IMGT genomics approach refers to the study of the genes within their loci and on the chromosomes (Table 3). Genomic data are managed in IMGT/GENE-DB, which is the comprehensive IMGT genome database [31]. In September 2004, IMGT/GENE-DB contained 1,375 genes and 2,204 alleles (673 IG and TR genes and 1,028 alleles from Homo sapiens, and 702 IG and TR genes and 996 alleles from Mus musculus, Mus cookii, Mus pahari, Mus spretus, Mus saxicola, Mus minutoïdes). All the human and mouse IG and TR genes are available in IMGT/GENE-DB. Based on the IMGT CLASSIFICATION concept, all the human IMGT gene names [1, 2] were approved by the HUGO Nomenclature Committee HGNC in 1999 [22], and entered in IMGT/GENE-DB [31], Genome DataBase GDB (Canada) [32], LocusLink at NCBI (USA) [33], and GeneCards [34]. Reciprocal links exist between IMGT/GENE-DB, HGNC, GDB, LocusLink and Entrez at NCBI and GeneCards. All the mouse IG and TR gene names with IMGT reference sequences were provided by IMGT to HGNC and to the Mouse Genome Database MGD [35] in July 2002. Queries in IMGT/GENE-DB can be performed according to IG and TR gene classification criteria. IMGT reference sequences have been defined for each allele of each gene based on one or, whenever possible, several of the following criteria: germline sequence, first sequence published, longest sequence, mapped sequence [8]. IMGT/GENE-DB interacts dynamically with IMGT/LIGM-DB [36, 37, 38] to download and display gene-related sequence data. This is the first example of an interaction between IMGT databases using the CLASSIFICATION concept.
The IMGT genome analysis tools (IMGT/LocusView, IMGT/GeneView, IMGT/GeneSearch and IMGT/CloneSearch) manage the locus organization and gene location and provide the display of physical maps for the human IG, TR and MHC loci and for the mouse TRA/TRD locus. IMGT/LocusView displays the genes in a locus and allows the user to zoom in on a given area. IMGT/GeneView displays a given gene in a locus. IMGT/GeneSearch searches for genes in a locus based on IMGT gene names, functionality or localization on the chromosome. IMGT/CloneSearch provides information on the clones that were used to build the locus contigs displayed in IMGT/LocusView (accession numbers are from IMGT/LIGM-DB, gene names from IMGT/GENE-DB, and clone position and orientation, and overlapping clones from IMGT/LocusView). IMGT/GeneInfo provides and displays information on the human and mouse TR potential rearrangements [39].
| Table 3: | IMGT components of the genomics approach. |
| IMGT genome database [4] | IMGT genome analysis tools [14] | IMGT Repertoire "Locus and genes" section [10, 15, 40 - 60] (1) |
| IMGT/GENE-DB [31] | IMGT/LocusView; IMGT/GeneView; IMGT/GeneSearch; IMGT/CloneSearch; IMGT/GeneInfo [39] | Chromosomal localizations [1, 2]; Locus representations [1, 2]; Locus description; Gene tables, etc.; Potential germline repertoires; Lists of genes; Correspondence between nomenclatures [1, 2] |
| (1) IMGT Web resources (IMGT Marie-Paule page) also include IMGT Index, IMGT Education (Aide-mémoire, Tutorials, Questions and answers, IMGT Lexique, The IMGT Medical page, The IMGT Veterinary page, The IMGT Biotechnology page), IMGT Bloc-notes (The IMGT Immunoinformatics page, Interesting links, etc.) [4, 10] which are not detailed in this paper. |
IMGT genetics approach
The IMGT genetics approach refers to the study of genes in relation with their polymorphisms, mutations, expression, specificity and evolution (Table 4). The IMGT genetics approach heavily relies on the DESCRIPTION concept (and particularly on the V-REGION, D-REGION, J-REGION and C-REGION core concepts for the IG and TR), on the CLASSIFICATION concept (gene and allele concepts) and on the NUMEROTATION concept (IMGT unique numbering [25, 26]).
IMGT/LIGM-DB is the comprehensive IMGT database of IG and TR nucleotide sequences from human and other vertebrate species, with translation for fully annotated sequences, created in 1989 by LIGM, Montpellier, France, on the Web since July 1995 [5, 36, 37, 38]. The IMGT/LIGM-DB annotations (gene and allele name assignment, labels) allow data retrieval not only from IMGT/LIGM-DB, but also from other IMGT databases. As an example, the IMGT/GENE-DB entries provide the IMGT/LIGM-DB accession numbers of the IG and TR cDNA sequences which contain a given V, D, J or C gene. IMGT/PROTEIN-DB is a new IMGT database which provides IG and TR amino acid sequences for productive rearranged cDNA sequences from IMGT/LIGM-DB. Standardized information on oligonucleotides (or Primers) and combinations of primers (Sets, Couples) for IG and TR are managed in IMGT/PRIMER-DB [61], the IMGT oligonucleotide database on the Web since February 2002. IMGT/MHC-DB [62] hosted at EBI comprises IMGT/HLA for human MHC (or HLA) and IMGT/MHC-NHP for MHC of non-human primates.
The IMGT tools for the genetics approach comprise IMGT/V-QUEST [9, 63], for the identification of the V, D and J genes and of their mutations, IMGT/JunctionAnalysis [64] for the analysis of the V-J and V-D-J junctions which confer the antigen receptor specificity, IMGT/Allele-Align for the detection of polymorphisms, and IMGT/PhyloGene [65] for gene evolution analyses. IMGT/V-QUEST (V-QUEry and STandardization) is an integrated software for IG and TR [9, 63]. This easy-to-use tool analyses an input IG or TR germline or rearranged variable nucleotide sequence. IMGT/V-QUEST results comprise the identification of the V, D and J genes and alleles and the nucleotide alignments by comparison with sequences from the IMGT reference directory, the delimitations of the FR-IMGT and CDR-IMGT based on the IMGT unique numbering, the protein translation of the input sequence, the identification of the JUNCTION, and the two-dimensional (2D) IMGT Collier de Perles representation of the V-REGION ("IMGT/V-QUEST output" in IMGT/V-QUEST Documentation). IMGT/JunctionAnalysis [64] is a tool, complementary to IMGT/V-QUEST, which provides a thorough analysis of the V-J and V-D-J junction of IG and TR rearranged genes. IMGT/JunctionAnalysis identifies the D-GENEs and alleles involved in the IGH, TRB and TRD V-D-J rearrangements by comparison with the IMGT reference directory, and delimits precisely the P, N and D regions ("IMGT/JunctionAnalysis output results" in IMGT/JunctionAnalysis Documentation). Several hundred junction sequences can be analysed simultaneously. IMGT/Allele-Align allows the comparison of two alleles, highlighting the nucleotide and amino acid differences. IMGT/PhyloGene [65] is an easy-to-use tool for phylogenetic analysis of variable region (V-REGION) and constant domain (C-DOMAIN) sequences. This tool is particularly useful in developmental and comparative immunology. The users can analyse their own sequences by comparison with the IMGT standardized reference sequences for human and mouse IG and TR [65] (IMGT/PhyloGene Documentation).
| Table 4: | IMGT components of the genetics approach. |
| IMGT sequence databases [4] | IMGT sequence analysis tools [14] | IMGT Repertoire "Proteins and alleles" section [10, 15, 40 - 60] |
| IMGT/LIGM-DB [5, 36]; IMGT/PROTEIN-DB; IMGT/PRIMER-DB [61]; IMGT/MHC-DB [62] | IMGT/V-QUEST [63]; IMGT/JunctionAnalysis [64]; IMGT/Allele-Align; IMGT/PhyloGene [65]; IMGT/Automat [66]; IMGT/GeneFrequency | Alignments of alleles IG and TR [1, 2, 7]; Alignments of alleles RPI; Protein displays IG and TR [1, 2, 8, 46, 53]; Protein displays MHC; Protein displays RPI; Tables of alleles IG and TR; Tables of alleles RPI; Allotypes [67, 68]; Isotypes [69], etc. |
IMGT/Automat [66] is an internal Java tool which implements IMGT/V-QUEST and IMGT/JunctionAnalysis to automatically perform the annotation of rearranged cDNA sequences in IMGT/LIGM-DB. IMGT/GeneFrequency is a new IMGT interactive tool which dynamically computes cartographic bar-chart representations of the different IG or TR loci. Histograms represent the contribution of individual V, D and J genes in sets of expressed rearranged V-D-J sequences in IMGT/LIGM-DB. IMGT/GeneFrequency results are obtained by querying IMGT/LIGM-DB for sequences which are selected, for example, on the specificity criteria. It will be available from the IMGT Home page at http://imgt.cines.fr within the year 2005. As a next step, dynamic interactions will be developed with IMGT/GENE-DB and the genome analysis tools, and with the IMGT/PROTEIN-DB and IMGT/PRIMER-DB [61] sequence databases.
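The gene contribution histograms described for IMGT/GeneFrequency amount to counting gene usage over a set of rearranged sequences. The following is a minimal Python sketch of that idea, not the tool's actual implementation: the input records, the function names `gene_usage` and `text_histogram`, and the text rendering are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical input: one (V, D, J) gene assignment per rearranged sequence.
# The gene names are used purely as sample data for this sketch.
rearrangements = [
    ("IGHV3-23", "IGHD3-10", "IGHJ4"),
    ("IGHV3-23", "IGHD6-19", "IGHJ4"),
    ("IGHV1-69", "IGHD3-10", "IGHJ6"),
    ("IGHV4-34", None, "IGHJ4"),  # D gene not identified in this record
]


def gene_usage(records):
    """Count how often each V, D and J gene occurs in the records."""
    usage = {"V": Counter(), "D": Counter(), "J": Counter()}
    for v, d, j in records:
        for kind, gene in (("V", v), ("D", d), ("J", j)):
            if gene is not None:
                usage[kind][gene] += 1
    return usage


def text_histogram(counter):
    """Render one text bar per gene, most frequent first."""
    return [f"{gene:<10} {'#' * n} ({n})" for gene, n in counter.most_common()]


usage = gene_usage(rearrangements)
for line in text_histogram(usage["V"]):
    print(line)
```

In a real setting the records would come from a database query restricted by a criterion such as specificity, as the text describes; here they are hard-coded.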
IMGT structural approach
The IMGT structural approach refers to the study of the 2D and 3D structures of the IG, TR, MHC and RPI, and to the antigen or ligand binding characteristics in relation with the protein functions, polymorphisms and evolution (Table 5). The structural approach relies on the CLASSIFICATION concept (IMGT gene and allele names), DESCRIPTION concept (receptor and chain description, domain delimitations), and NUMEROTATION concept (amino acid positions according to the IMGT unique numbering [25, 26]). Structural and functional domains of the IG and TR chains comprise the variable domain or V-DOMAIN (9-strand beta-sandwich) which corresponds to the V-J-REGION or V-D-J-REGION and is encoded by two or three genes [1, 2], the constant domain or C-DOMAIN (7-strand beta-sandwich), and, for the MHC chains, the groove domain or G-DOMAIN (four beta-strands and one alpha-helix). The IMGT unique numbering, initially defined for the V-DOMAINs of the IG and TR and for the V-LIKE-DOMAINs of IgSF proteins other than IG and TR [25], has been extended to the C-DOMAINs of the IG and TR and to the C-LIKE-DOMAINs of IgSF proteins other than IG and TR [26]. An IMGT unique numbering has also been implemented for the groove domain (G-DOMAIN) of the MHC class I and II chains, and for the G-LIKE-DOMAINs of MhcSF proteins other than MHC [27].
| Table 5: | IMGT components of the structural approach. |
| IMGT structural database [4] | IMGT structural analysis tool [14] | IMGT Repertoire "2D and 3D structures" section [10, 15, 40 - 60] |
| IMGT/3Dstructure-DB [70] | IMGT/StructuralQuery [70] | 2D Colliers de Perles [28] IG and TR (1); 2D Colliers de Perles MHC; 2D Colliers de Perles RPI; IMGT classes for amino acid characteristics [72]; IMGT Colliers de Perles reference profiles [72]; 3D representations (1) |
| (1) Cover of the Nucleic Acids Research 1999 database issue (http://imgt.cines.fr/textes/IMGTinformation/Couv_NAR99.jpg) |
Structural data are compiled and annotated in IMGT/3Dstructure-DB, the IMGT 3D structure database, on the Web since November 2001 [70]. IMGT/3Dstructure-DB comprises IG, TR, MHC and RPI with known 3D structures. Coordinate files extracted from the Protein Data Bank (PDB) [71] are renumbered according to the standardized IMGT unique numbering [25, 26]. The IMGT/3Dstructure-DB cards provide IMGT annotations (assignment of IMGT genes and alleles, IMGT chain and domain labels, IMGT Colliers de Perles on one layer and two layers), downloadable renumbered IMGT/3Dstructure-DB flat files, visualization tools and external links. IMGT/3Dstructure-DB residue cards provide detailed information on the inter- and intra-domain contacts of each residue position (IMGT/3Dstructure-DB Documentation).
The IMGT/StructuralQuery tool [70] analyses the intramolecular interactions for the V-DOMAINs. The contacts are described per domain (intra- and inter-domain contacts) and annotated in terms of IMGT labels (chains, domain), positions (IMGT unique numbering), and backbone or side-chain implication. IMGT/StructuralQuery retrieves the IMGT/3Dstructure-DB entries based on specific structural characteristics: phi and psi angles, accessible surface area (ASA), amino acid type, distance in angstrom between amino acids, and CDR-IMGT lengths [25].
In order to appropriately analyse the amino acid resemblances and differences between IG, TR, MHC and RPI chains, eleven IMGT classes were defined for the "chemical characteristics" amino acid properties and used to set up IMGT Colliers de Perles reference profiles [72]. The IMGT Colliers de Perles reference profiles make it easy to compare amino acid properties at each position whatever the domain, the chain, the receptor or the species. The IG and TR variable and constant domains represent a privileged situation for the analysis of amino acid properties in relation with 3D structures, by the conservation of their 3D structure despite divergent amino acid sequences, and by the considerable amount of genomic (IMGT Repertoire), structural (IMGT/3Dstructure-DB) and functional data available. These data are not only useful to study mutations and allele polymorphisms, but are also needed to establish correlations between amino acids in the protein sequences and 3D structures and to determine amino acids potentially involved in the immunogenicity.
IMGT tool diamonds
In order to enhance the interoperability between the IMGT components, IMGT tools were analysed for input and output parameters, performed tasks and accompanying databases (IMGT reference directories). Graphical diamond-shaped representations, designated as "IMGT tool diamonds" (Fig. 2), were developed to obtain tool profiles and to compare the state of the art of each tool in relation with the IMGT ontological concepts. Each IMGT tool diamond is composed of 16 modules (Fig. 2A) and each module comprises 4 facets: input parameters, task, IMGT reference directory and output parameters (Fig. 2B). For a given module (that is, a given concept), each facet acts as a Boolean switch and indicates whether input parameters are necessary or not, whether a task is performed or not, whether an expertised IMGT reference directory is needed or not, and whether output parameters are provided or not, respectively.
The four modules at the core of the IMGT tool diamond (red) correspond to the major concepts of the tool and are usually supported by specific tasks [39, 63, 64, 65, 70]. The 12 outer modules correspond to concepts usually shared with other tools: those of the west pole (blue) correspond to the gene configuration (germline, rearranged or not defined), those of the north pole (orange) to the functionality of the germline sequences (Functional (F), Open Reading Frame (ORF), Pseudogenes (P)), those of the south pole (yellow) to the functionality of the rearranged sequences (productive, unproductive) (IDENTIFICATION concept), and those of the east pole (green) include the labels (DESCRIPTION concept), IMGT unique numbering (NUMEROTATION concept) and the localization and the orientation (ORIENTATION concept) (Fig. 2A).
The IMGT tool diamonds are particularly useful for the IMGT Web service developers, as they make it possible to control and to enhance the coherence inside and between the IMGT tools in the frame of IMGT-Choreography. Indeed, the comparison of two IMGT tool diamonds identifies, among the facets which are "switched on", those relevant to both tools, so that the expertised concepts involved can then be analysed. Thus, in the example in Figure 3, three modules that are relevant to both the IMGT/V-QUEST [63] and IMGT/JunctionAnalysis [64] tools were selected for analysis: "Gene and allele name" (core, red), "IMGT numbering" and "Labels" (east pole, green). The output parameters of the IMGT/V-QUEST "Gene and allele name" module (V-GENE, J-GENE and allele names) (Table 6) are the necessary input parameters of the IMGT/JunctionAnalysis "Gene and allele name" module (Table 6, same column) that identifies the D-GENE and allele name (IMGT/JunctionAnalysis output). The output parameters of the IMGT/V-QUEST "IMGT numbering" module (identification of 2nd-CYS at position 104, and J-PHE or J-TRP at position 118) are the necessary input parameters of the IMGT/JunctionAnalysis "IMGT numbering" module that characterizes the CDR3-IMGT length and numbering (Table 6). In the same way, the output parameters of the IMGT/V-QUEST "Labels" module (3'V-REGION and 5'J-REGION) are the necessary input parameters of the IMGT/JunctionAnalysis "Labels" module that identifies the P, N and D-REGION (Table 6) [64].
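The role of the two anchor positions in the "IMGT numbering" module can be illustrated with a small worked example. In the IMGT unique numbering, CDR3-IMGT covers the positions between 2nd-CYS 104 and J-PHE or J-TRP 118, so its length is the JUNCTION length minus the two anchors. That delimitation comes from the IMGT numbering convention and is not spelled out in the text above. The Python sketch below assumes an amino acid JUNCTION string that starts at position 104 and ends at position 118; the function name and the sample sequence are hypothetical.

```python
def cdr3_imgt_length(junction_aa: str) -> int:
    """Return the CDR3-IMGT length for an amino acid JUNCTION.

    Assumes the JUNCTION starts at 2nd-CYS (position 104) and ends at
    J-PHE or J-TRP (position 118); CDR3-IMGT is what lies between them.
    """
    if len(junction_aa) < 2:
        raise ValueError("a JUNCTION needs both anchor residues")
    if junction_aa[0] != "C":
        raise ValueError("expected 2nd-CYS (C) at position 104")
    if junction_aa[-1] not in ("F", "W"):
        raise ValueError("expected J-PHE (F) or J-TRP (W) at position 118")
    # Remove the two anchors (104 and 118) to get the CDR3-IMGT length.
    return len(junction_aa) - 2


# Hypothetical 14-residue junction, used only as sample input.
junction = "CARDYYGSSYFDYW"
print(cdr3_imgt_length(junction))
```

The sketch only computes the length; assigning individual position numbers inside CDR3-IMGT is left out.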
Figure 3: IMGT tool diamond profiles of the IMGT/V-QUEST [63] and IMGT/JunctionAnalysis [64] sequence analysis tools. A. IMGT/V-QUEST. B. IMGT/JunctionAnalysis. The output parameters of the IMGT/V-QUEST "Gene and allele name" (core, red), "IMGT numbering" and "Labels" (east pole, green) modules (circled in (A)) are the necessary input parameters of the IMGT/JunctionAnalysis "Gene and allele name", "IMGT numbering" and "Labels" modules, respectively (circled in (B)). Note that, in contrast to IMGT/JunctionAnalysis, IMGT/V-QUEST does not require input parameters for these modules (empty facets).
| Table 6: | IMGT/V-QUEST output and IMGT/JunctionAnalysis input/output. |
| IMGT-ONTOLOGY concept | IMGT tool diamond module | IMGT/V-QUEST module output parameters IMGT/JunctionAnalysis module input parameters | IMGT/JunctionAnalysis module output parameters |
| CLASSIFICATION | Gene and allele name | V-GENE, J-GENE and allele name | D-GENE and allele name |
| NUMEROTATION | IMGT numbering | 2nd-CYS 104, J-PHE or J-TRP 118 | CDR3-IMGT length and numbering |
| DESCRIPTION | Labels | 3'V-REGION, 5'J-REGION | P-REGION, N-REGION, D-REGION |
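The comparison of two IMGT tool diamonds described above can be sketched as a small data structure. The Python example below is a minimal illustration, not the authors' implementation. The class `Module`, the function `chainable_modules` and the dictionary layout are assumptions. The input and output facets follow Figure 3 and Table 6, whereas the task and reference directory facets are illustrative guesses. Only the three modules discussed in the text are modelled, not all 16.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """One diamond module: four Boolean facets for a single concept."""
    needs_input: bool
    performs_task: bool
    needs_reference_directory: bool
    gives_output: bool


# Input/output facets follow Figure 3 and Table 6; the two middle facets
# (task, reference directory) are illustrative guesses for this sketch.
V_QUEST = {
    "Gene and allele name": Module(False, True, True, True),
    "IMGT numbering": Module(False, True, True, True),
    "Labels": Module(False, True, True, True),
}
JUNCTION_ANALYSIS = {
    "Gene and allele name": Module(True, True, True, True),
    "IMGT numbering": Module(True, True, False, True),
    "Labels": Module(True, True, False, True),
}


def chainable_modules(producer, consumer):
    """List modules whose output facet is on in the producer and whose
    input facet is on in the consumer."""
    return sorted(
        name
        for name, module in producer.items()
        if module.gives_output
        and name in consumer
        and consumer[name].needs_input
    )


print(chainable_modules(V_QUEST, JUNCTION_ANALYSIS))
print(chainable_modules(JUNCTION_ANALYSIS, V_QUEST))
```

The second call returns an empty list because, as noted in the Figure 3 caption, IMGT/V-QUEST does not require input parameters for these modules.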
Web services and IMGT-SOA
Web services have been chosen as the means to create dynamic interactions between IMGT databases and tools. The choice of the Web services to be developed in priority is based on the major existing or potential "conversation nodes" detected in the IMGT biological approaches or with the IMGT tool diamonds.
The Web service paradigm treats as a service any application accessible over the Internet that fulfils the requirements of interoperability, loose coupling and platform independence between applications by making extensive use of open standards, based for example on XML, and of existing networking protocols. More precisely, Service Oriented Architectures (SOA) use the Web Services Description Language (WSDL) (http://www.w3.org/TR/wsdl) for the description of new services, the Simple Object Access Protocol (SOAP) (http://www.w3.org/TR/soap/) for communication between services, and the Universal Description, Discovery and Integration (UDDI) protocol (http://www.uddi.org/about.html) to enable applications to quickly, easily, and dynamically find and use Web services over the Internet. However, this framework does not specify the underlying semantics of communications. IMGT-SOA introduces a semantic layer by requiring that the messages exchanged between service providers and consumers be encoded as valid IMGT-ML streams. IMGT-ML can be seen as a kind of Rosetta stone, since it eases the interconnection between IMGT Web services. IMGT-ML is the single language used for both service inputs and outputs. Clients and providers for these services can be written using any SOAP-capable programming language (e.g. with the SOAP::Lite (http://www.soaplite.com/) development library for Perl or webMethods Glue for Java), thus facilitating the conversion of legacy applications to services. IMGT Web services are developed using the Java programming language and deployed using the Apache Axis (http://ws.apache.org/axis/) Web services development framework. Apache Axis is an implementation of the SOAP submission to W3C.
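The idea of carrying an IMGT-ML stream inside a SOAP message can be illustrated as follows. This is a minimal Python sketch using only the standard library. It is not the IMGT-SOA code, which the text states is written in Java with Apache Axis. The SOAP 1.1 envelope namespace is the standard one, but the payload namespace `urn:example:imgt-ml`, the element names `entry` and `chainType`, and the value used are hypothetical placeholders rather than the real IMGT-ML schema.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
IMGT_NS = "urn:example:imgt-ml"  # hypothetical namespace, not the real IMGT-ML one


def wrap_in_soap(payload: ET.Element) -> bytes:
    """Place a single XML payload element inside a SOAP Envelope/Body."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    body.append(payload)
    return ET.tostring(envelope, encoding="utf-8")


def unwrap_soap(message: bytes) -> ET.Element:
    """Extract the single payload element from a SOAP message."""
    envelope = ET.fromstring(message)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    if body is None or len(body) != 1:
        raise ValueError("expected exactly one payload element in the SOAP Body")
    return body[0]


# Hypothetical payload: element names and value are placeholders.
payload = ET.Element(f"{{{IMGT_NS}}}entry")
ET.SubElement(payload, f"{{{IMGT_NS}}}chainType").text = "IG-Heavy"

message = wrap_in_soap(payload)
received = unwrap_soap(message)
print(received.find(f"{{{IMGT_NS}}}chainType").text)
```

Because both the request and the response use the same payload vocabulary, a consumer written in any SOAP-capable language only needs to understand that one format. Validation of the payload against the XML Schema is omitted here.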
The IMGT/LIGM-DB Web service is the first Web service currently developed and implemented with Axis. It includes the "queryKnowledge" and "querySeqData" services. The queryKnowledge service provides the lists of instances for the IMGT-ONTOLOGY concepts, for example the list of chain types, functionalities, specificities defined in the IDENTIFICATION concept, the lists of groups and subgroups defined in the CLASSIFICATION concept, or the list of labels defined in the DESCRIPTION concept. The querySeqData service allows the retrieval of any sequence related data, identified, classified, described according to the IMGT concepts, such as the nucleotide sequence, the description labels, the literature references, the metadata, etc. The querySeqData input has the form of an incomplete IMGT-ML data entry (Fig. 4). The given values are used as criteria to query the database. The result is then a list of data entries, in IMGT-ML format, sharing these given values. Other Web services are being developed to automatically query IMGT databases and tools.
IMGT-Choreography aims to combine and join the IMGT database queries and analysis tools. In order to keep only significant approaches, a rigorous analysis of the scientific standards [1, 2, 73 - 78], of the biologist requests [79, 80, 81, 82] and of the clinician needs [83, 84, 85] has been undertaken in the three main biological approaches: genomics, genetics and structural approaches. The detailed interactions between IMGT components [3, 86, and this paper] are currently being carefully modelled in UML [87].
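The querySeqData behaviour described above, where an incomplete entry serves as the query and the filled-in values act as criteria, is a form of query by example. The Python sketch below illustrates the matching logic on flat XML entries. It is an assumption-laden illustration rather than the service itself: the element names, the accession values, the in-memory `DATABASE` list and the function names `matches` and `query_seq_data` are all hypothetical and do not follow the real IMGT-ML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical entries; names and values are placeholders for this sketch.
DATABASE = [
    ET.fromstring(
        "<entry><accession>ACC0001</accession>"
        "<species>Homo sapiens</species><chainType>IG-Heavy</chainType></entry>"
    ),
    ET.fromstring(
        "<entry><accession>ACC0002</accession>"
        "<species>Mus musculus</species><chainType>IG-Heavy</chainType></entry>"
    ),
    ET.fromstring(
        "<entry><accession>ACC0003</accession>"
        "<species>Homo sapiens</species><chainType>TR-Beta</chainType></entry>"
    ),
]


def matches(query: ET.Element, entry: ET.Element) -> bool:
    """True if every filled-in field of the query has the same text in the entry."""
    for field in query:
        wanted = (field.text or "").strip()
        if not wanted:
            continue  # empty fields are not used as criteria
        found = entry.find(field.tag)
        if found is None or (found.text or "").strip() != wanted:
            return False
    return True


def query_seq_data(query_xml: str, database=DATABASE):
    """Return the entries sharing all values given in the incomplete query entry."""
    query = ET.fromstring(query_xml)
    return [entry for entry in database if matches(query, entry)]


hits = query_seq_data(
    "<entry><accession/><species>Homo sapiens</species><chainType/></entry>"
)
print([hit.findtext("accession") for hit in hits])
```

Only one level of child elements is compared here; a real IMGT-ML entry is nested, so an actual implementation would have to walk the tree and translate the criteria into database queries.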
The design of IMGT-Choreography and the creation of dynamic interactions between the IMGT databases and tools, using the Web services and IMGT-ML, represent novel and major developments of IMGT, the international reference in immunogenetics and immunoinformatics. IMGT-Choreography enhances the dynamic interactions between the IMGT components to answer complex biological and clinical requests.
Since July 1995, IMGT has been available on the Web at http://imgt.cines.fr. IMGT has met with an exceptional response, with more than 140,000 requests a month. The information is of great value to clinicians and biological scientists in general. IMGT databases, tools and Web resources are extensively queried and used by scientists from both academic and industrial laboratories, from very diverse research domains: (i) fundamental and medical research (repertoire analysis of the IG antibody sites and of the TR recognition sites in normal and pathological situations such as autoimmune diseases, infectious diseases, AIDS, leukemias, lymphomas, myelomas), (ii) veterinary research (IG and TR repertoires in farm and wild life species), (iii) genome diversity and genome evolution studies of the adaptive immune responses, (iv) structural evolution of the IgSF and MhcSF proteins, (v) biotechnology related to antibody engineering (single chain Fragment variable (scFv), phage displays, combinatorial libraries, chimeric, humanized and human antibodies), (vi) diagnostics (clonalities, detection and follow up of residual diseases) and (vii) therapeutical approaches (grafts, immunotherapy, vaccinology). By its high quality and its data distribution based on IMGT-ONTOLOGY, IMGT has an important role to play in the development of immunogenetics Web services.
If you use IMGT databases, tools and/or Web resources, please cite [4] and this paper as references, and quote the IMGT Home page URL address, http://imgt.cines.fr.
We are grateful to Mehdi Yousfi Monod, Joumana Jabado-Michaloud and Vincent Nègre for helpful discussion. We thank our "2004" students Nabil Belkebir, Alain Carnec, Nathalie Clavert, Laurent Douchy, Fadhel Fattoum, Valérie Garelle, Guillaume Gauby, Sandra Ghayad, Bertrand Monnier, Elise Parrod, Erwan Rondeau, Fabrice Sarniguet, Thomas Spiesser and Guillaume Tourneur for their motivation. E.D. is the recipient of a doctoral grant from the Ministère de l’Education Nationale, de l'Enseignement Supérieur et de la Recherche (MENESR). K.Q. received a doctoral grant from the MENESR and is currently supported by the Association pour la Recherche sur le Cancer (ARC). O.C. is supported in the frame of the BIOSTIC programme. The ORIEL (Online Research Information Environment for the Life Sciences) project is funded by the European Union IST programme (ST-2001-32688). IMGT is funded by the Centre National de la Recherche Scientifique (CNRS), the MENESR (Université Montpellier II Plan Pluri-Formation, BIOSTIC-LR2004 Région Languedoc-Roussillon and ACI-IMPBIO IMP82-2004).