Impact of integrating clinical and genetic information

M. Dugas1*, C. Schoch2, S. Schnittger2, W. Kern2, T. Haferlach2, D. Messerer1 and K. Überla1




1Department of Medical Informatics, Biometrics and Epidemiology (IBE)
University of Munich
Marchioninistr. 15
D-81377 Munich, Germany
Tel: +49-89-7095-4497
Fax: +49-89-7095-7491
email: dug@ibe.med.uni-muenchen.de
http://www.med.uni-muenchen.de/ibe/mitarbeiter/dugas.html
2Department of Internal Medicine III
University Hospital of Munich, Germany
Marchioninistr. 15
D-81377 Munich, Germany

*To whom correspondence should be addressed







ABSTRACT

To identify molecular markers that are relevant to medicine, it is important to integrate clinical and genetic information in the broadest sense. Analysing a few patients is not sufficient; for reliable assessment of parameters relevant to diagnostics and therapy, large patient cohorts must be characterized with respect to both phenotype and genotype. Matching genetic data, such as molecular genetics and cytogenetics, with clinical data, such as follow-up, morphological findings and diagnoses, requires the integration of complex databases.

In the context of a nationwide leukemia research network in Germany, we designed an integrated database covering both genetic and clinical data of patients. The system contains follow-up data and the relevant laboratory modalities, i.e. cytomorphology, cytogenetics, molecular genetics, FISH, immunophenotyping and gene expression profiling.

So far, 10714 cases from 6546 patients treated by 1026 physicians are documented. The data structure consists of up to 888 variables per case. In our experience, integration of clinical and genetic information requires significant effort, including attention to data protection, but it is feasible and improves data quality, leading to more reliable research results for the benefit of the patients.

Key words: clinical integration, cytogenetics, molecular genetics, leukemia



INTRODUCTION

Genetic methods have the potential to change medicine fundamentally; however, "Obtaining the sequence of the human genome is the end of the beginning" [1].

Leukemia is a model disease for cancer, and more and more of its pathogenesis is being understood at the molecular level. Special attention is therefore given to genetic parameters: detection of gene rearrangements or mutations by polymerase chain reaction (PCR) and Southern blot [5] (molecular genetics), analysis of chromosomes by determination of the karyotype and its aberrations (cytogenetics), fluorescence in-situ hybridization (FISH [6]) and gene expression profiling.

From a physician's perspective, it is very important to distinguish between surrogate markers and prognostically relevant parameters, which are associated with important medical outcomes such as survival time and quality of life.

For statistical analysis, clinical and genetic information concerning a particular patient must be linked. With large data sets and many different data sources, this integration is a challenging and time-consuming task. Because of methodological problems, especially the complexity and dynamics of the underlying data models, there are no comprehensive software products on the market.

In the context of a nationwide German leukemia research project, we designed a database that combines clinical and genetic aspects of this disease and integrated it into the routine workflow of a large clinical laboratory serving as a national reference center.



LABORATORY METHODS

The leukemia laboratory at the University of Munich performs a wide variety of analyses: cytomorphology (microscopic analysis of blood and bone marrow cells), cytogenetics, molecular genetics, fluorescence in-situ hybridization, multi-parameter immunophenotyping (immunologic analysis of the cell surface) and gene expression profiling. The general goal of these methods is a precise characterization of leukemic cells in terms of phenotype and genotype, in order to support the diagnosis and therapy of leukemia.



COMPUTER SYSTEM

To build an integrated clinical and research database, we applied established Internet tools [2]. A Linux computer (http://www.suse.de) provides an Apache web server (http://www.apache.org) and a PostgreSQL database (http://www.postgresql.org/), which is accessed by server-side Perl (http://www.perl.com) programs through a standard web browser on the client side (e.g. Netscape Communicator™ or Internet Explorer™).

To ensure patient data security, the system was protected by a firewall. We applied an iterative software engineering approach to specify the detailed data structure. Regular user meetings were held; after approximately 20 iteration cycles, a suitable database structure was defined. A dedicated web tool [3, 4] was applied for rapid implementation of ergonomic, highly adaptive web forms; all programs were generated from templates, i.e. no line of code was programmed manually.

To embed the system into the routine workflow of the laboratory, written reports and adhesive labels for samples are generated directly from the database as Microsoft Word™ documents, using Microsoft Word™ templates that are completed with the appropriate item values.
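The template-completion step can be illustrated with a minimal Python sketch. It is not the authors' implementation, which filled Microsoft Word templates; the plain-text template, the field names and the function render_report are assumptions made for illustration only.

```python
from string import Template

# Hypothetical plain-text report template; the real system used Word templates.
REPORT_TEMPLATE = Template(
    "Laboratory report\n"
    "Patient: $surname, $first_name (born $date_of_birth)\n"
    "Laboratory number: $lab_number\n"
    "Karyotype: $karyotype\n"
)


def render_report(record: dict) -> str:
    """Fill the template with item values taken from one database record.

    substitute() raises KeyError if a placeholder has no value, so an
    incomplete record cannot silently produce an incomplete report.
    """
    return REPORT_TEMPLATE.substitute(record)


if __name__ == "__main__":
    example = {
        "surname": "Doe",
        "first_name": "Jane",
        "date_of_birth": "1950-01-02",
        "lab_number": "01-0001",
        "karyotype": "46,XX,t(8;21)(q22;q22)",
    }
    print(render_report(example))
```

Failing loudly on a missing value is a deliberate choice in this sketch: a report that goes to a physician should not contain unfilled placeholders.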



DATA MANAGEMENT

A very difficult task was the integration of preexisting records from all laboratory modalities and of clinical information covering approximately five years of operation.

Cytogenetics, cytomorphology and FISH data were exported from a Windows™-based desktop database application (Cybase® from MetaSystems; built with Paradox™) into dBase/xBase file format (http://www.e-bachmann.dk/docs/xbase.htm). The data were adapted to the new schema by a Perl program. Records from other modalities (molecular genetics, immunophenotyping) as well as clinical information (follow-up) were provided as Microsoft Excel™ files, which were converted into tab-separated text and transferred to the new database by Perl programs.
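The conversion of tab-separated exports into the new schema can be sketched as follows. This is an illustrative Python example, not the Perl programs used in the project; the legacy column names in COLUMN_MAP and the target field names are hypothetical.

```python
import csv
import io

# Hypothetical mapping from legacy column names to fields of the new schema.
COLUMN_MAP = {
    "Name": "surname",
    "Vorname": "first_name",
    "GebDat": "date_of_birth",
    "LabNr": "lab_number",
}


def convert_rows(tsv_text: str, column_map: dict) -> list:
    """Read tab-separated text and return rows keyed by the new field names.

    Values are stripped of surrounding whitespace; columns missing from the
    export become empty strings so that every row has the same fields.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        rows.append(
            {new: (row.get(old) or "").strip() for old, new in column_map.items()}
        )
    return rows


if __name__ == "__main__":
    export = "Name\tVorname\tGebDat\tLabNr\nDoe \tJane\t1950-01-02\t01-0001\n"
    for converted in convert_rows(export, COLUMN_MAP):
        print(converted)
```

The converted rows could then be written to the database with parameterized INSERT statements; that step is omitted here because the actual table layout is not given in the text.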

The problems associated with matching data for statistical purposes were one of the main incentives for building the integrated database. To enable patient-specific evaluations, surname, first name, date of birth and laboratory number were used as matching criteria.
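The matching step can be illustrated with a minimal Python sketch. It is not the authors' code; the field names, the normalization rules (case folding, removal of diacritics and surplus whitespace) and the requirement that all four criteria agree are assumptions made for illustration.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Case-fold a name, drop diacritics and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def match_key(record: dict) -> tuple:
    """Build the matching key from the four criteria named in the text."""
    return (
        normalize_name(record["surname"]),
        normalize_name(record["first_name"]),
        record["date_of_birth"].strip(),
        record["lab_number"].strip(),
    )


def link_records(clinical: list, genetic: list) -> tuple:
    """Pair each clinical record with the genetic record sharing its key.

    Returns (matched pairs, unmatched clinical records) so that the loss
    caused by inconsistent demographic data can be inspected.
    """
    index = {match_key(rec): rec for rec in genetic}
    matched, unmatched = [], []
    for rec in clinical:
        partner = index.get(match_key(rec))
        if partner is None:
            unmatched.append(rec)
        else:
            matched.append((rec, partner))
    return matched, unmatched
```

Returning the unmatched records explicitly makes the proportion of cases lost through demographic mismatches measurable instead of hidden.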



RESULTS

A multi-user database with a web front end was implemented, consisting of the following modules: patient demographics and follow-up, cytomorphology, cytogenetics, FISH, molecular genetics, immunophenotyping, cDNA arrays and summary report. As of May 2001, the system contains information on 10714 cases from 6546 different patients. Because the leukemia laboratory of the University of Munich acts as a nationwide reference center, patient data from 1026 physicians located at 244 hospitals are available online.

The data structure, including administrative items, consists of 15 tables with a total of 888 variables. For each sample, 15 cytogenetic items, 10 PCR markers, 10 FISH probes, 8 MRD (minimal residual disease) markers, 72 immunophenotype measurements and a gene expression profile can be handled; most parameters can be customized by the user.



DATA ANALYSIS

Integrating the follow-up data from the AML-CG study [7] confirmed the prognostic relevance of specific cytogenetic or molecular genetic anomalies [5, 8, 9, 10].
The detection of new chromosomal aberration patterns is supported by a specific program, which parses the karyotype to determine the breakpoints.
Because the karyotypes are integrated with clinical data, a frequency distribution of chromosome alterations ordered by disease can be generated.
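
The karyotype parsing and the frequency distribution by disease can be sketched in Python as follows. This is not the program used in the project: it handles only a simplified subset of karyotype notation (structural aberrations written as type, chromosomes and bands, e.g. t(8;21)(q22;q22) or inv(16)(p13q22)), and the function names and example data are assumptions made for illustration.

```python
import re
from collections import Counter

# A structural aberration: type, chromosome list and band list in parentheses.
ABERRATION = re.compile(r"([a-z]+)\(([^()]+)\)\(([^()]+)\)")
# A band such as q22 or p13.1.
BAND = re.compile(r"[pq]\d+(?:\.\d+)?")


def breakpoints(karyotype: str) -> list:
    """Return breakpoints such as '8q22' found in a karyotype string."""
    found = []
    for _kind, chroms, bands in ABERRATION.findall(karyotype):
        chrom_list = chroms.split(";")
        band_groups = bands.split(";")
        if len(chrom_list) != len(band_groups):
            continue  # notation outside the simplified subset; skip it
        for chrom, group in zip(chrom_list, band_groups):
            for band in BAND.findall(group):
                found.append(f"{chrom}{band}")
    return found


def breakpoint_frequencies(cases: list) -> dict:
    """Count, per diagnosis, in how many cases each breakpoint occurs.

    `cases` is a list of (diagnosis, karyotype) pairs; a breakpoint is
    counted once per case even if it appears several times.
    """
    table = {}
    for diagnosis, karyotype in cases:
        table.setdefault(diagnosis, Counter()).update(set(breakpoints(karyotype)))
    return table


if __name__ == "__main__":
    example_cases = [
        ("AML", "46,XX,t(8;21)(q22;q22)"),
        ("AML", "45,X,-Y,t(8;21)(q22;q22)"),
        ("AML", "46,XY,inv(16)(p13q22)"),
        ("CML", "46,XY,t(9;22)(q34;q11)"),
    ]
    for diagnosis, counts in breakpoint_frequencies(example_cases).items():
        print(diagnosis, counts.most_common())
```

Numerical changes such as +8 or -Y carry no breakpoint and are ignored by this sketch; a complete parser would have to cover the full nomenclature.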



DISCUSSION

The rapid progress in the field of molecular biology is a major driving force in the development of bioinformatics. For the time being, however, there is a substantial gap between genomics and proteomics on the one hand and progress in clinical medicine on the other. To identify genetic patterns, in the broadest sense, that are relevant to patients in general, genetic data must be linked with clinical data for a substantial number of patients; data protection and ethical considerations are important issues.

The integration of clinical and scientific documentation is necessary to determine the prognostic relevance of new diagnostic parameters from molecular biology. This process is difficult because, from a computer science perspective, both clinical medicine and genetics are characterized by complex data models with many variables. Close cooperation between medical informatics and bioinformatics is important, as stated by Kohane [11], Altman [12] and Miller [13].

Key success factors are interdisciplinary collaboration, intensive fine-tuning to build workflow-integrated systems, and data models that are both precise and flexible.

The integration of clinical and genetic data generates new scientific results [5, 8, 9]. Before the integrated database was available, automatically combining several data sources lost up to 50% of cases because of mismatched patient demographic data and other inconsistencies.

Integration of clinical and genetic information requires significant effort, but it is feasible and improves data quality, leading to more reliable research results for the benefit of the patients.



ACKNOWLEDGEMENTS

Supported by a grant from the German Ministry of Education and Research (BMBF), Kompetenznetz: Akute und Chronische Leukämien - 01 GI 9980/6 and by a grant from 'Deutsche José Carreras Stiftung e.V.'


REFERENCES

  1. Collins, F. S. and McKusick, V. A. Implications of the Human Genome Project for Medical Science. JAMA. 2001; 285:540-544
  2. Marshall, W. W. and Haley, R. W. Use of a Secure Internet Web Site for Collaborative Medical Research. JAMA. 2000; 284:1843-1849
  3. Dugas, M. Clinical applications of Intranet-Technology. In: New Technologies in Hospital Information Systems Vol. 45 (ed. Dudeck J. et al.), IOS Press 1997, pp. 115-118
  4. Dugas, M., Bosch, R., Paulus, R. and Lenz, T. Intranet-based multi-purpose medical records in Orthopedics. Medical Informatics. 1999; 24: 269-275
  5. Schnittger, S., Kinkelin, U., Schoch, C. et al. Screening for MLL tandem duplication in 387 unselected patients with AML identify a prognostically unfavorable subset of AML. Leukemia. 2000; 14:796-804
  6. Haferlach, T., Winkemann, M., Loffler, H. et al. The abnormal eosinophils are part of the leukemic cell population in acute myelomonocytic leukemia with abnormal eosinophils (AML M4Eo) and carry the pericentric inversion 16: a combination of May-Grunwald-Giemsa staining and fluorescence in situ hybridization. Blood. 1996; 87:2459-63
  7. Büchner, T., Hiddemann, W. et al. Double Induction Strategy for Acute Myeloid Leukemia: The Effect of High-Dose Cytarabine With Daunorubicin and 6-Thioguanine: A Randomized Trial by the German AML Cooperative Group. Blood. 1999; 93: 4116-4124
  8. Kern, W., Schoch, C., Haferlach, T. et al. Multivariate analysis of prognostic factors in patients with refractory and relapsed acute myeloid leukemia undergoing sequential high-dose cytosine arabinoside and mitoxantrone (S-HAM) salvage therapy: relevance of cytogenetic abnormalities. Leukemia. 2000; 14:226-31
  9. Schoch, C., Haas, D., Haferlach, T. et al. Fifty-one patients with acute myeloid leukemia and translocation t(8;21)(q22;q22): an additional deletion in 9q is an adverse prognostic factor. Leukemia. 1996; 10:1288-95
  10. Schoch, C., Haferlach, T., Haase, D. et al. Patients with de novo acute myeloid leukaemia and complex karyotype aberrations show a poor prognosis despite intensive treatment: a study of 90 patients. Br J Haematol. 2001; 112:118-126
  11. Kohane, I. S. Bioinformatics and Clinical Informatics - The Imperative to Collaborate. JAMIA. 2000; 7:512-516
  12. Altman, R. B. The Interactions between Clinical Informatics and Bioinformatics. JAMIA. 2000; 7:439-443
  13. Miller, P. L. Opportunities at the Intersection of Bioinformatics and Health Informatics: A Case Study. JAMIA. 2000; 7:431-438