Visualization and analysis of gene expression data using freeware and GPL-Software.

E. W. Wolski1, C. Wierling1, T. Kreitler2, S. Kielbasa3, U. Schneider1, J. Gobom1, H. Lehrach1 and H. Eickhoff1




1 MPI of Molecular Genetics
Ihnestr. 73
D-14195 Berlin
2 Max Delbrück Center for Molecular Medicine
Robert-Rössle-Str. 10
13092 Berlin
3 Innovationskolleg Theoretische Biologie Humboldt-Universität zu Berlin
Invalidenstr. 43
D-10115 Berlin, Germany







INTRODUCTION

The analysis of gene expression profiles generated by series of complex hybridizations is an important tool in molecular genetics [1]. Although hybridizing complex probes to DNA arrays is now established in a variety of laboratories, the reliability of the data produced and the reproducibility of the experiments remain major challenges for the future of this highly parallel analysis method.

The experiment design involves a compromise between detection sensitivity and quantitative reproducibility. On the one hand, the analysis should be feasible with a small amount of RNA sample. On the other hand, the data obtained should be reproducible enough that changes smaller than a twofold up- or downregulation of a particular gene of interest are statistically significant [2, 3]. The experiment design is furthermore influenced by the libraries available for spotting, the technology of RNA labeling and the type of matrix.

Here we present an experiment design, optimized for hybridization experiments on nylon membranes containing normalized cDNA libraries, that fulfills the above criteria. Array experiments generally have at least four dimensions (x-location, y-location, intensity, and experiment). We also present a data analysis and visualization toolbox constructed from the freeware Visual-Grid [4] and Cluster [5] and from free software such as Perl, MySQL and R [6].

Keywords: Complex Hybridization, Data Visualization, Experiment Design.


METHOD

In our experiments, the Human Unigene Library no. 952 (HUL 952) with 33k clones, obtained from "Deutsches Ressourcenzentrum für Genomforschung GmbH" [7], was spotted onto nylon membranes. The library was printed in four parts on Hybond N+ nylon membranes of size 22.4 x 11.2 cm. On each membrane, 8448 PCR products were deposited in triplicate with the maximum possible independence between spots. The membranes were hybridized with 150 ng of poly A+ RNA labeled with 33P.

The amount of bound RNA was determined by exposing the membranes to phosphor screens (Fuji) with varying exposure times. For image analysis of the set of 60 hybridization images, the Visual Grid (TM) image analysis software was used. The data were stored in an in-house developed MySQL database with Perl and Perl-CGI interfaces. The tasks performed on the database were calculating the median and mean of the three replicate spots on one membrane, subtracting the background from the spot intensities, selecting clones with significant intensities, and linking these intensities with gene annotations and experimental information.
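
The per-clone summary step described above can be sketched as follows. This is a minimal illustration in Python, not the in-house database code: the function name, the single background value per membrane, and the threshold `min_signal` are assumptions made for the example, since the text does not specify how background or significance were determined.

```python
from statistics import mean, median


def summarize_clone(spots, background, min_signal):
    """Summarize the replicate spot intensities of one clone on one membrane.

    spots       -- raw intensities of the replicate spots (three in the text)
    background  -- hypothetical background estimate for the membrane
    min_signal  -- hypothetical cutoff for calling the clone significant
    """
    # Subtract the background from every replicate before summarizing.
    corrected = [s - background for s in spots]
    summary = {"median": median(corrected), "mean": mean(corrected)}
    # The median is used for the cutoff because it ignores a single outlier spot.
    summary["significant"] = summary["median"] >= min_signal
    return summary


# One replicate (410.0) is an outlier; it shifts the mean but not the median.
strong = summarize_clone([120.0, 135.0, 410.0], background=20.0, min_signal=50.0)
weak = summarize_clone([25.0, 30.0, 28.0], background=20.0, min_signal=50.0)
```

Comparing the median with the mean of the replicates is one simple way to spot a membrane position affected by a hybridization artifact.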

Using built-in R functions, newly written functions and the R-MySQL interface, we covered the main part of the data analysis, i.e., normalizing all intensities by the mean intensity of the membrane, calculating the mean of two exposures, normalizing by the mean of the rank, normalizing by a virtual control sample, selecting differentially expressed clones, and estimating the reproducibility and sensitivity of the experiment. In addition, hierarchical clustering was performed using the clustering software Cluster [5].
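
Three of these steps (normalization by the membrane mean, averaging two exposures, and selecting clones by their ratio to a control) can be sketched as below. The sketch is written in Python rather than R and is only an illustration under stated assumptions: all function names are hypothetical, the control profile is an arbitrary reference, the fold-change cutoff is a free parameter, and all intensities are assumed to be positive.

```python
from math import log2
from statistics import mean


def normalize_by_membrane_mean(intensities):
    """Divide every clone intensity by the mean intensity of its membrane."""
    membrane_mean = mean(intensities.values())
    return {clone: value / membrane_mean for clone, value in intensities.items()}


def average_exposures(first, second):
    """Average two normalized exposures, keeping clones present in both."""
    return {c: (first[c] + second[c]) / 2 for c in first.keys() & second.keys()}


def select_changed(sample, control, min_fold):
    """Return the sample/control ratio of clones changed by at least min_fold."""
    selected = {}
    for clone in sample.keys() & control.keys():
        ratio = sample[clone] / control[clone]
        # Work on the log scale so up- and downregulation are treated alike.
        if abs(log2(ratio)) >= log2(min_fold):
            selected[clone] = ratio
    return selected


short_exposure = normalize_by_membrane_mean({"c1": 10.0, "c2": 20.0, "c3": 30.0})
long_exposure = normalize_by_membrane_mean({"c1": 40.0, "c2": 80.0, "c3": 120.0})
sample = average_exposures(short_exposure, long_exposure)
control = {"c1": 0.5, "c2": 0.25, "c3": 1.5}  # hypothetical reference profile
changed = select_changed(sample, control, min_fold=2.0)
```

Dividing by the membrane mean puts the two exposures on a common scale, so they can be averaged even though their raw intensities differ fourfold.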

Methods and tools were developed for expressive data visualization. Correlation coefficients can be color-coded and plotted on a 2D image as a quality control. Color-coded intensities of one membrane, or intensity ratios from two experiments, sorted by membrane position and microtiter plate, can help determine reproducibility and reveal sources of error. The effects of normalization and selection procedures can be visualized with color-coded scatter plots and 2D maps that emphasize the changes.
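
The correlation-based quality control can be sketched as a pairwise correlation matrix between experiments. The example below assumes the Pearson coefficient, which the text does not specify, and uses hypothetical experiment names; color-coding the resulting matrix is left to whatever plotting tool is at hand.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long intensity vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def correlation_matrix(experiments):
    """Map every ordered pair of experiment names to its correlation."""
    names = sorted(experiments)
    return {(p, q): pearson(experiments[p], experiments[q])
            for p in names for q in names}


# Hypothetical intensities of four clones in three experiments.
experiments = {
    "rep_a": [1.0, 2.0, 3.0, 4.0],
    "rep_b": [2.0, 4.0, 6.0, 8.0],
    "other": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(experiments)
```

Replicate hybridizations are expected to show coefficients close to 1, so a low value in the matrix flags an experiment worth inspecting.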


RESULT

The developed strategy was applied to the expression analysis of six RNA samples from human cell lines and evaluated with respect to detection sensitivity and the reproducibility of expression ratio determination [8]. The experimental set-up, from filter design to statistical data analysis, proved robust to experimental variations, e.g., changes in spot diameter, hybridization conditions, and hybridization artifacts. Further information can be found at: http://www.molgen.mpg.de/~wolski/expression.


REFERENCES

  1. Eickhoff, H., Schuchhardt, J., Ivanov, I., Meier-Ewert, S., O'Brien, J., Malik, A., Tandon, N., Wolski, E. W., Rohlfs, E., Nyarsik, L., Reinhardt, R., Nietfeld, W. and Lehrach, H. Genome Res. 2000 Aug;10(8):1230-40.
  2. Schuchhardt, J., Beule, D., Malik, A., Wolski, E., Eickhoff, H., Lehrach, H. and Herzel, H. Nucleic Acids Res. 2000 May 15;28(10):E47.
  3. Bertucci, F., Houlgatte, R., Benziane, A., Granjeaud, S., Adelaide, J., Tagett, R., Loriod, B., Jacquemier, J., Viens, P., Jordan, B., Birnbaum, D. and Nguyen, C. Hum Mol Genet. 2000 Dec 12;9(20):2981-91.
  4. www.GPC-AG.com
  5. Eisen, M. B., Spellman, P. T., Brown, P. O. and Botstein, D. Proc Natl Acad Sci U S A. 1998 Dec 8;95(25):14863-8.
  6. Ihaka, R. and Gentleman, R. Journal of Computational and Graphical Statistics. 1996;5(3):299-314.
  7. www.rzpd.de
  8. Hummel, M. (in preparation).