Designing GENESYS - a software system for mining chip data
Änne Glass and Lothar Gierl
University of Rostock, Faculty of Medicine,
Institute for Medical Informatics and Biometry
Rembrandt-Str. 16/17
D - 18055 Rostock
Phone.: ++49.(0)381.494.7310
Fax: ++49.(0)381.494.7203
Email: aenne.glass|lothar.gierl@medizin.uni-rostock.de
Most diseases are caused by a set of gene defects that occur in a complex association. The association scheme of expressed genes can be modelled by genetic networks. Genetic networks are an efficient means of understanding the dynamics of pathogenic processes by modelling the molecular reality of cell conditions. In this sense, a genetic network consists of, first, a set of genes of specified cells, tissues or species and, second, causal functional relations between these genes that determine the functional condition of the biological system, i.e. under disease. A functional relation between two genes exists if both are directly or indirectly associated with the disease [Oliver, 2000]. Our goal is to characterize diseases by genetic networks generated by a computer system. We want to introduce this practice as a bioinformatic approach for finding drug targets.
We are working on the computer system GENESYS, which will allow the import and analysis of gene expression data for generating and presenting genetic networks. In this paper we mainly address the design of GENESYS and, in addition, the problem of generating genetic networks by applying methods of artificial intelligence (AI).
The four main components of our system are (1) an import tool for expression data, (2) a parser engine for automatically mining gene function information from internet databases, (3) an expression data analysis tool using AI methods and (4) a visualization tool for presenting genetic networks [Glass, 2000].
- Gene expression data from microarray experiments (cDNA chip technology) are provided by the research project "BMBF-Leitprojektverbund Proteom-Analyse des Menschen". A filter tool validates and standardizes them for import into GENESYS.
- A parser engine mines gene function information from internet databases (e.g. GeNet, PIR, DIP). It consists of three components that work in sequence. First, a database adapter connects to an internet database, queries the data and stores all query results locally. Once the adapter has finished, a parser tool analyses the locally stored information for well-defined data. In the last step, a filter tool validates and standardizes the resulting data for import into GENESYS.
- An artificial neural network is used to classify diseases into specific diagnostic categories based on their gene expression signatures. We chose a neural network based on adaptive resonance theory (ART). An ART net works like a self-organizing neural pattern recognition machine. The five major properties of the ART system are plasticity, stability, sensitivity to novelty, attentional mechanisms and complexity. The ART1 network architecture self-organizes and self-stabilizes its recognition codes and categorizes arbitrarily many and arbitrarily complex binary input patterns [Carpenter and Grossberg, 1987]. We obtain the input patterns for ART1 by binary coding of the raw gene expression data from different samples of the same disease. As the result of the ART1 analysis we get a specific pattern of co-expressed genes, which is taken to be typical of the disease under consideration.
In addition to ART1 we apply the AI method of case-based reasoning. Because case-based reasoning has been applied successfully in several domains such as diagnostics, prediction, control and planning [Heindl et al., 1997], [Schmidt et al., 1997], we want to use this technique for the incremental modelling of genetic networks. Each genetic network is considered as a case within the human genome. Similar cases represent similar genetic networks. Each identified case stored in the case base facilitates the retrieval of further cases, i.e. genetic networks. The individual cases have to be organized appropriately so that similar cases can be retrieved very fast and new cases can be integrated into the case base. Inconsistency and incompleteness are characteristic features of genetic networks, because knowledge about the human proteome grows steadily and incrementally. As a result, the revise phase is particularly important within the retrieve-reuse-revise-retain loop of case-based reasoning systems, in order to check and revise the case base continually. For this task a set of practicable techniques is available, both from our previous work [Gierl et al., 1998] and from the international state of research (e.g. the contrast model by Tversky) [Aamodt and Plaza, 1994], [Tversky, 1977]. In this way we obtain a similarity tree [Steffen et al., 2000] of prototypes of genetic networks of different diseases.
- These networks are presented as 3D structures by a visualization tool, which is being developed in Inprise Delphi using OpenGL. Genes are presented as spheres, labelled optionally with expression values or with identifiers from different internet databases. Genes are linked if they interact functionally. In the future we will develop interactive components that let users choose a set of interacting genes and zoom into the genetic network.
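The ART1 step of the third component can be illustrated with a minimal sketch. It assumes a simple thresholding rule for the binary coding (the abstract does not state which coding is used) and the common fast-learning simplification of ART1, in which a category prototype is the logical AND of all patterns assigned to it. The function names, the threshold, the vigilance value and the choice parameter `beta` are illustrative assumptions, not part of GENESYS.

```python
from typing import List, Sequence, Tuple

Pattern = Tuple[int, ...]


def binarize(expression: Sequence[float], threshold: float) -> Pattern:
    """Assumed binary coding: 1 if the raw expression value reaches the threshold."""
    return tuple(1 if value >= threshold else 0 for value in expression)


def overlap(a: Pattern, b: Pattern) -> int:
    """Number of genes that are 1 in both patterns, i.e. |a AND b|."""
    return sum(x & y for x, y in zip(a, b))


def art1_cluster(patterns: Sequence[Pattern], vigilance: float = 0.6,
                 beta: float = 0.5, max_epochs: int = 10
                 ) -> Tuple[List[Pattern], List[int]]:
    """Simplified fast-learning ART1 (illustrative, not the authors' code).

    Returns the category prototypes and, for each input pattern, the index
    of its category (-1 for patterns without any expressed gene).
    """
    prototypes: List[Pattern] = []
    assignments: List[int] = []
    for _ in range(max_epochs):
        current: List[int] = []
        for p in patterns:
            size = sum(p)
            if size == 0:            # nothing expressed: cannot be matched
                current.append(-1)
                continue
            # Rank existing categories by the choice function |p AND w| / (beta + |w|).
            ranked = sorted(
                range(len(prototypes)),
                key=lambda j: overlap(p, prototypes[j]) / (beta + sum(prototypes[j])),
                reverse=True,
            )
            chosen = -1
            for j in ranked:
                # Vigilance test: the prototype must cover enough of the input.
                if overlap(p, prototypes[j]) / size >= vigilance:
                    chosen = j
                    break
            if chosen == -1:
                # Novel pattern: open a new category (plasticity).
                prototypes.append(p)
                chosen = len(prototypes) - 1
            else:
                # Fast learning: keep only genes shared with the prototype.
                prototypes[chosen] = tuple(
                    x & y for x, y in zip(p, prototypes[chosen]))
            current.append(chosen)
        if current == assignments:   # categories no longer change (stability)
            break
        assignments = current
    return prototypes, assignments


if __name__ == "__main__":
    # Hypothetical raw expression values for five genes in three samples.
    raw = [[5.1, 0.2, 4.8, 0.1, 3.9],
           [4.7, 0.3, 5.2, 2.5, 4.1],
           [0.1, 3.3, 0.2, 4.0, 0.3]]
    coded = [binarize(sample, threshold=1.0) for sample in raw]
    print(art1_cluster(coded))
```

Under these assumptions, each resulting prototype marks the genes that are expressed in every sample of its category, which corresponds to the pattern of co-expressed genes described above. A higher vigilance yields more and narrower categories.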
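The retrieval of similar cases with Tversky's contrast model, mentioned for the case-based reasoning part, can be sketched as follows. The contrast model scores two feature sets A and B as θ·f(A∩B) − α·f(A−B) − β·f(B−A). The sketch assumes that a genetic network is represented as a set of gene pairs and that f is the set size; this representation, the weights, and the gene and disease names are hypothetical and only serve the illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Assumed case representation: a genetic network as a set of related gene pairs.
Network = Set[Tuple[str, str]]


def tversky_contrast(a: Network, b: Network, theta: float = 1.0,
                     alpha: float = 0.5, beta: float = 0.5) -> float:
    """Contrast model with f = set size: common features minus distinctive ones."""
    common = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    return theta * common - alpha * only_a - beta * only_b


def retrieve(query: Network, case_base: Dict[str, Network]) -> List[Tuple[str, float]]:
    """Rank all stored cases by their similarity to the query, best first."""
    scored = [(name, tversky_contrast(query, network))
              for name, network in case_base.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical case base with two stored genetic networks.
    case_base = {
        "disease_A": {("g1", "g2"), ("g2", "g3"), ("g3", "g4")},
        "disease_B": {("g5", "g6"), ("g2", "g3")},
    }
    query = {("g1", "g2"), ("g2", "g3")}
    for name, score in retrieve(query, case_base):
        print(name, score)
```

Because α and β weight the distinctive features of query and stored case separately, the measure can be made asymmetric, for example to penalize relations missing from a stored network less than relations missing from the query.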
First results from using several components of GENESYS separately are available. As an integrated software architecture, the system will support decisions on diagnosis and therapy on the basis of genomic knowledge, as well as the discovery of drug targets. Conventional clustering methods that exclude biological background knowledge do not suffice for that purpose.
REFERENCES
- Aamodt A, Plaza E (1994) Case-based reasoning: Foundational issues, methodological variations and system approaches. AICOM 7(1): 39-59
- Carpenter GA, Grossberg S (1987) A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing 37: 54-115
- Gierl L, Bull M, Schmidt R (1998) CBR in Medicine. In: Bartsch-Spörl B, Wess S, Burkhard H-D, Lenz M (Eds.) Case-Based Reasoning Technology - from Foundation to Applications. Springer-Verlag Berlin: 273-297
- Glass Ä (2000) A bioinformatic approach for generating genetic networks. Biosystems and Medical Technology 2: 52
- Heindl B, Schmidt R, Schmid G, Haller M, Pfaller P, Gierl L, Pollwein B (1997) A Case-Based Consiliarius for Therapy Recommendation (ICONS): Computer-Based Advice for Calculated Antibiotic Therapy in Intensive Care Medicine. Computer Methods and Programs in Biomedicine 52: 117-127
- Oliver S (2000) Guilt-by-association goes global. Nature 403: 601-603
- Schmidt R, Heindl B, Pollwein B, Gierl L (1997) Multiparametric Time Course Prognoses by Means of Case-based Reasoning and Abstractions of Data and Time. Medical Informatics 22: 237-250
- Steffen D, Ihracky D, Gierl L (2000) Similarity Trees for Zinc Finger Protein Design. In: Miyano S, Shamir R, Takagi T (Eds.) Currents in Computational Molecular Biology. Universal Academy Press Tokyo: 196-197
- Tversky A (1977) Features of Similarity. Psychological Review 84: 327-352