In Silico Biology 8, 0029 (2008); ©2008, Bioinformation Systems e.V.  


MaXlab: A novel application for the cross comparison and integration of biological signatures from microarray studies


Sabah Khalid1, 3, Mohsin Khan1, Chandrasekhar Babu Gorle1, Karl Fraser2, Ping Wang4, Xiaohui Liu2 and Suling Li1*




1 Molecular Immunology/Bioinformatics Group, Microarray Facility, Division of BioSciences, Brunel University, Uxbridge, UB8 3PH, UK
2 Intelligent Data Analysis Group, Department of Information Systems and Computing, Brunel University, Uxbridge, UB8 3PH, UK
3 Medical Oncology Unit, Institute of Cancer, Barts and London, Queen Mary School of Medicine, London, UK
4 Immunology Group, Institute of Cell and Molecular Sciences, Barts and London, Queen Mary School of Medicine, London, UK




* Corresponding author

  Email: Su-ling.li@brunel.ac.uk





Edited by E. Wingender; received December 30, 2007; revised May 15, 2008; accepted July 13, 2008; published July 22, 2008



Abstract

Microarray gene expression datasets are continually being deposited in public repositories. As a result, one of the most important emerging challenges is enabling researchers to take full advantage of such previously accumulated data to discover or validate common genes in similar biological systems. In light of this, we have designed the MaXlab software not only to cross-compare available array data from different laboratories but also to extract further knowledge from gene expression patterns embedded within published data. More importantly, MaXlab offers a flexible and automated solution applicable to microarray technologies including cDNA arrays and Affymetrix gene chips, generating expression profiles for common genes with biological significance. We have identified several sets of genes previously unknown to be commonly expressed across studies investigating related biological questions. Among them are 17 genes involved in the dysregulation of immune tolerance, including the crucial transcription factor Egr2. In addition, we have identified 175 genes commonly expressed in basal and luminal breast tumours in response to the chemotherapeutic drug doxorubicin. The common expression and characterisation of these genes identified through MaXlab suggest that they may play a common role in the mechanism of disease and hence provide an incentive for further investigation to identify potential therapeutic targets. Overall, MaXlab is an attractive application for molecular biologists: it extracts the intersection between microarray datasets together with the gene expression profiles, from which biologists are able to infer further biological insights.

The software together with file formats and additional material is freely available at http://www.immuno-software.org.

Keywords: microarray, cross-comparison, meta-analysis, multi-platform, data-fusion



Introduction

Since the full potential of microarrays has been recognised, the statement "microarray technology allows one to study the mRNA expression of all the genes within a genome simultaneously" [Brown and Botstein, 1999; Lockhart and Winzeler, 2000; Schena et al., 2005] has become the introductory sentence of almost every article focussing on gene expression microarrays. Combined with the use of specialised microarray data analysis tools for functional annotation [Khatri et al., 2002; Doniger et al., 2003; Zeeberg et al., 2003; Khalid et al., 2006a, 2006b], such studies expose potentially novel relationships between genes and disease phenotypes, which could be of paramount importance for medical advancement. Since the development and extensive use of this powerful technology, a large number of gene expression studies have been performed and their results deposited in public repositories such as the Gene Expression Omnibus (GEO) and ArrayExpress. This has given rise to one of the most challenging tasks in the field: the development of a methodology to compare, integrate and extract information from multiple datasets in related biological systems. Such combinatorial studies address the hypothesis that selected sets of differential expression signatures share a significant intersection of genes, thus implying a biological relatedness with respect to the molecular dysregulation underlying the disease.

Whilst recent meta-analysis studies have attempted to correlate Affymetrix and cDNA gene expression datasets using statistical techniques [Rhodes et al., 2002, 2004; Choi et al., 2003, 2004; Ghosh et al., 2003; Lee et al., 2004; Wang et al., 2004; Jiang et al., 2004] or labour-intensive manual literature mining methods [Wahl et al., 1997; Crow and Wohlgemuth, 2003; Crow et al., 2003; Qing and Putterman, 2004; Oertelt et al., 2005], there remains a computational limitation in terms of automating this process. Manual comparisons of gene lists or arrays from multiple experiments to identify a common gene signature are impractical and inefficient. In addition, it is not feasible for one research laboratory to perform microarray experiments of every nature relating to the biological questions that are of interest to it. To this end, we have developed the first multi-functional software, called MaXlab, which provides a user-friendly automated solution for the molecular biologist to overcome the painstaking task of comparing array studies. MaXlab (microarray data comparison across laboratories) employs the meta-analytic principle but, more importantly, offers effective exploration through the combined comparison of global gene expression datasets and relative gene expression analysis for autonomous microarray studies. More specifically, MaXlab can: compare several biologically significant gene lists (pre-defined by the authors of each study); find the intersection between entire arrays based on a single user-defined expression threshold across all experiments or unique thresholds for each experiment; and compare time series array experiments, presenting genes expressed above or below user-defined thresholds across multiple time-points. The resulting commonalities in the expression of genes across related studies can increase the confidence that genes identified as having a significant role within a disease or in response to a treatment were not identified by chance alone. This in turn provides more reliable biological insight into the genes and pathways that may be shared in the underlying molecular dysregulation and, ultimately, into common drug targets among related disease states.



Implementation

The MaXlab software has been designed and implemented using Visual Basic.Net, MySQL and ActivePerl. Processed data from MaXlab are presented on a multiple-panel graphical user interface (GUI) that displays the results obtained from each function together with a graphical output. The GUI is user-friendly, interacting with users via menus, mouse clicks and user-input dialogs, and can be utilised not only by researchers actively involved in microarray research but also by those working in biological research in general.


System architecture

MaXlab offers two routes for meta-profiling - one for MySQL users and one that is more user-friendly for biologists with little programming knowledge. There are four main functions offered by MaXlab. The first is to compare biologically meaningful gene lists without any threshold selection. The second is to compare multiple arrays using a single threshold value across all experiments. The third function compares multiple arrays using multiple threshold values - one for each individual experiment, and the final function compares time series experiments in their entirety or based on a chosen threshold. Data processed using MaXlab is presented on the GUI in a format that provides the user with a flexible and intuitive view of their manipulated data that is easy to interpret (Fig. 1).



Figure 1: Functionality of the MaXlab software. The MaXlab software comprises several stages during the comparison of microarray gene expression data. The software has the flexibility to allow users to import interesting gene lists as provided by the authors or provide entire microarray gene sets from which the software can generate gene lists of interest to compare, based on user defined thresholds using the defined procedures. The results are subsequently displayed on the graphical user interface and also exported to Excel for visualisation.


Data collection, processing, and storage

Microarray datasets used for this study (Tab. 1) were downloaded from public web sites (Gene Expression Omnibus, http://www.ncbi.nlm.nih.gov/geo/, or ArrayExpress, http://www.ebi.ac.uk/arrayexpress/) or provided by the authors upon request. The data consisted of two general types: two-channel ratio data (for cDNA arrays) and single-channel intensity data (for Affymetrix arrays), and were usually provided in a single composite file format. All available gene identifiers were included in the analysis, ensuring that the gene identifiers were of the same type prior to computing the similarity. Significantly expressed gene datasets were obtained by searching the appropriate literature and extracting pre-processed gene expression sets considered important to the researcher's biological query.


Table 1: Information of the microarray datasets used in this study
Reference | Array Name | Genes in Gene Set | Array ID | Interesting Genes | Methodology
Baechler et al., 2003 | Affymetrix U95A Human GeneChip | 10260 | GPL91 (GEO) | 286 | Affymetrix
Der et al., 1998 | Hu6800 GeneChip aka HuGene FL Genome Array | 6325 | GPL80 (GEO) | 171 | Affymetrix
Greenberg et al., 2005 | Affymetrix U133A Human GeneChip | 16000 | GPL96 (GEO) | 26 | Affymetrix
Tezak et al., 2002 | Affymetrix HuFL GeneChip | 5600 | GPL80 (GEO) | 125 | Affymetrix
Anderson et al., 2006 | RFCGR HGMPMouse MmSGC Av1 | 9216 | A-MEXP-165 (ArrayExpress) | 2345 | cDNA
Safford et al., 2005 | Affymetrix mouse GeneChip, MU74A, MU74B, MU74C | 33773 | GPL81, GPL82, GPL83 (GEO) | 60 | Affymetrix
Anderson et al., 2006 | RFCGR HGMPMouse MmSGC Av1 | 9216 | A-MEXP-165 (ArrayExpress) | User defined >1.0 | cDNA
Safford et al., 2005 | Affymetrix mouse GeneChip, MU74A | 12422 | GPL81 (GEO) | User defined >1.0 | Affymetrix
Troester et al., 2004 | UNC compugen oligo array for toxgenomics study | 20163 | GPL550 (GEO) | Basal cell line, User defined >1.0 | cDNA
Troester et al., 2004 | UNC compugen oligo array for toxgenomics study | 20163 | GPL550 (GEO) | Luminal cell line, User defined >1.0 | cDNA


Implementation of the meta-profiling procedures

The output from each function provided by MaXlab generates an intersection of genes common to all datasets together with their gene expression profiles in a graphical format. Each function adopts a different methodology for identifying the common genes.


The comparison of interesting genes from microarray experiments retrieved from published literature

Most often, following a microarray experiment, the genes that are significantly expressed and of most interest to the researcher's underlying biological question will be published in the full text or supplementary information of the corresponding article. This aspect of the software is therefore important for comparing the genes that each researcher considers biologically significant. More importantly, this functionality aims to identify the common genes that are of increased biological relevance to each researcher's investigation. Because the most interesting gene lists are taken from the full text of an article, the data will already have been pre-processed and the gene expression levels averaged for duplicate genes. Following the import of the significantly expressed genes, the procedure below is executed to analyse the commonality between the gene lists. The algorithm continues until all the gene lists have been compared, presenting one final dataset together with the gene expression values corresponding to each gene list provided. The pseudo-code for the method is as follows:


Inputs

Int gene set 1, Int gene set 2 (Maximum of 10)

Outputs

Gene list B

Algorithm

Function: Interesting gene similarity (Int gene set 1, Int gene set 2, Int gene set 3: Gene list A, Gene list B)
	For each item (n) in Int gene set 1
		If Int gene set 1 (n) matches any item (j) in Int gene set 2
			Add Int gene set 1 (n) to Gene list A
		End if
	Next item

	For each item (n) in Gene list A
		If Gene list A (n) matches any item (j) in Int gene set 3
			Add Gene list A (n) to Gene list B
		End if
	Next item

	Repeat the second loop for each remaining interesting gene set imported, using the previous result as Gene list A
End Function
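
The pseudo-code above can be sketched in Python as a successive intersection of gene lists. This is a minimal illustration under stated assumptions, not the MaXlab implementation (which is written in Visual Basic.Net): each gene list is assumed to be a mapping from gene identifier to expression value, the function name `common_genes` is hypothetical, and the expression values in the usage note are invented for illustration.

```python
def common_genes(gene_lists):
    """Return the genes shared by every list, with one expression value per study.

    gene_lists: list of dicts mapping gene identifier -> expression value
                (an assumed input format, chosen for this sketch only).
    """
    if not gene_lists:
        return {}
    # Start from the first list and narrow it with each further list,
    # mirroring the "Gene list A" -> "Gene list B" steps of the pseudo-code.
    shared = set(gene_lists[0])
    for study in gene_lists[1:]:
        shared.intersection_update(study)
    # Keep the expression value reported by each study for every shared gene.
    return {gene: [study[gene] for study in gene_lists] for gene in sorted(shared)}
```

For example, calling `common_genes` on three lists that all contain STAT1 and MX1 returns only those two genes, each paired with its three expression values in the order the lists were supplied.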


Multiple experiment comparisons based on user defined thresholds

In addition to comparing significantly expressed genes from the literature, the software is able to compare array chips from different laboratories where the interesting genes are defined based on the user's threshold. Via the automated generation of interesting genes from the entire array gene set, potentially new genes that are common can be extracted from the microarray experiments. MaXlab prompts the user for a single threshold for all experiments or a unique threshold for each experiment above which genes are compared to generate a final set of common genes across all related biological experiments. Such threshold values can be based on those published within the corresponding literature.
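
The threshold-based comparison described above might look as follows in Python. This is a hedged sketch: the dictionary input format and the names `genes_above` and `compare_arrays` are assumptions made for illustration, and "above the threshold" is interpreted as strictly greater than, in line with the ">1.0" notation of Table 1.

```python
def genes_above(array, threshold):
    """Return the genes of one array whose expression value exceeds the threshold."""
    return {gene for gene, value in array.items() if value > threshold}


def compare_arrays(arrays, thresholds):
    """Intersect the above-threshold genes of several arrays.

    arrays:     list of dicts mapping gene identifier -> expression value.
    thresholds: either one number applied to every array, or a list
                holding one threshold per array.
    """
    if isinstance(thresholds, (int, float)):
        thresholds = [thresholds] * len(arrays)
    if len(thresholds) != len(arrays):
        raise ValueError("one threshold is required per array")
    selected = [genes_above(a, t) for a, t in zip(arrays, thresholds)]
    # An empty input yields an empty result rather than an error.
    return set.intersection(*selected) if selected else set()
```

Passing a single number reproduces the "single threshold across all experiments" mode, while passing a list reproduces the "unique threshold for each experiment" mode.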


Combining array chip enrichment with interesting gene list comparison

Through the comparison of the interesting gene sets alone, there may be very few or even a lack of common genes. Thus one may ask if this is a result of the use of entirely different array chips or due to the difference in the cut-off threshold for selecting significantly expressed genes. Therefore we provide users with the option to compare the array gene sets used by each laboratory to identify the consensus genes. Genes in the first dataset are compared with this consensus to find the matching genes (output 1). Genes in the second dataset are also compared with the consensus to identify the matching genes (output 2). The outputs 1 and 2 are matched to find an intersection that is based on the commonality between the array platforms provided.
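
The enrichment step described above can be illustrated with a short Python sketch. It assumes that each array platform and each interesting gene list is available as a collection of gene identifiers of the same type; the function name `enriched_intersection` and the returned dictionary keys are hypothetical.

```python
def enriched_intersection(interesting_1, interesting_2, array_genes_1, array_genes_2):
    """Restrict two interesting gene lists to the genes present on both platforms.

    Returns the platform consensus, the two restricted lists (outputs 1 and 2)
    and their intersection.
    """
    # Genes that are represented on both array platforms.
    consensus = set(array_genes_1) & set(array_genes_2)
    # Interesting genes of each study that could have been seen by both platforms.
    output_1 = set(interesting_1) & consensus
    output_2 = set(interesting_2) & consensus
    return {
        "consensus": consensus,
        "output_1": output_1,
        "output_2": output_2,
        "common": output_1 & output_2,
    }
```

Comparing the sizes of `output_1` and `output_2` with the sizes of the original lists indicates how much of a small overlap is explained by platform coverage rather than by the choice of cut-off threshold.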


Identifying common gene expression profiles for time series experiments

The final functionality set of the software is designed for time series microarray experiments accepting time-points or conditions for two array chips used by the same or different research laboratories. Following the import of data the underlying algorithm prompts the user to enter the number of time-points corresponding to each array chip. Using this strategy the genes above the thresholds in the chosen time point for each array will be compared to identify common genes and expression patterns. Once again, the array chips can consist of duplicate genes for which the associated expression values (e.g. median of ratio) will be averaged.
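
A minimal Python sketch of the time series comparison is given below. It assumes that each array is supplied as rows of a gene identifier plus one expression value per time-point, that duplicate genes are averaged per time-point, and that the chosen time-point is addressed by a zero-based index; the names `average_duplicates` and `common_at_timepoint` are illustrative only.

```python
from collections import defaultdict
from statistics import mean


def average_duplicates(rows):
    """Average the expression profiles of duplicate genes, time-point by time-point.

    rows: iterable of (gene identifier, list of values, one per time-point).
    """
    grouped = defaultdict(list)
    for gene, values in rows:
        grouped[gene].append(values)
    # zip(*profiles) groups the values of all duplicates by time-point.
    return {gene: [mean(column) for column in zip(*profiles)]
            for gene, profiles in grouped.items()}


def common_at_timepoint(series_1, series_2, index_1, index_2, threshold_1, threshold_2):
    """Return genes above threshold at the chosen time-point in both series,
    together with their full expression profiles from each series."""
    above_1 = {g for g, profile in series_1.items() if profile[index_1] > threshold_1}
    above_2 = {g for g, profile in series_2.items() if profile[index_2] > threshold_2}
    return {g: (series_1[g], series_2[g]) for g in sorted(above_1 & above_2)}
```

Separate indices and thresholds are accepted for each series because the two array chips need not share the same number or ordering of time-points.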

The common genes that overlap between the experiments are displayed in the common gene expression (CoGeEx) panel together with their gene descriptions and gene expression values as provided by the user. These results are automatically exported and displayed in a graphical format together with Pearson correlation coefficient, F-test and standard deviation statistics representing the correlations between the gene expression patterns. Currently, it is essential that gene identifiers from different array chips are of the same type or are converted using tools such as MatchMiner (http://discover.nci.nih.gov/matchminer/index.jsp) or the Synergizer (http://llama.med.harvard.edu/cgi/synergizer/translate), which facilitate the conversion of gene identifiers. However, we shall incorporate such a function within MaXlab in version 2.0 to facilitate the process of gene id conversions.
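
As an illustration of the correlation statistic mentioned above, the following Python sketch computes the Pearson correlation coefficient between the expression values of the common genes in two studies. It is a textbook formulation written for this example and makes no claim about how MaXlab itself calculates or reports the statistic; the function name `pearson` is an assumption.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long value lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations (numerator) and the two root sums of squares.
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant values")
    return covariance / (spread_x * spread_y)
```

A coefficient close to 1 indicates that the common genes follow a similar expression pattern in both studies, whereas a value close to -1 indicates opposite patterns.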


Generation of the interactome of the genes with common gene expression profiles

The network analysis within this study was carried out using the Ingenuity pathways knowledge base (http://www.ingenuity.com/products/pathways_knowledge.html) to further identify the interactions between the significantly differentially expressed common genes that showed similar gene expression profiles across the related biological studies.



Results and discussion

Researchers are often interested in relating the genes of significant interest from their own microarray experiments to those reported by other laboratories. Although this is possible for a few genes via literature mining methods, it is not a practical solution for genes derived via microarray methods, where the genes of interest can be numerous. MaXlab makes this possible by offering a solution for the comparative analysis of multiple studies. Of great importance in working with such data is the realisation that different experiments are typically designed to address different questions. In general, it will only make sense to combine datasets if the questions are the same or if some aspects of the experiments are sufficiently similar that one can hope to make better inference from the whole than from the experiments separately. To demonstrate the functionality of our novel MaXlab software we collected data from several microarray experiments published on ArrayExpress or GEO investigating a variety of diseases (Tab. 1).


Identification of common gene expression patterns amongst immunological disorders

Genome-wide profiling has been applied to the field of immunology to examine the perturbations and decipher key cellular or molecular pathways associated with specific diseases [Davidson and Diamond, 2001; Rus et al., 2002; Bennett et al., 2003; Matos et al., 2004; Poirot et al., 2004; Adarichev et al., 2005]. Using MaXlab we have carried out several comparisons to ascertain the similarities between related studies investigating immunological diseases.


Comparison of microarray data for autoimmune disease

We compared the interesting gene expression results provided by two research groups [Der et al., 1998; Baechler et al., 2003] investigating the molecular intricacies of the interferon (IFN) pathway underlying systemic lupus erythematosus (SLE) following interferon treatments, to assess the coherency of the findings between studies and thus ultimately identify common sets of differentially expressed genes regulated by IFN. Following the comparative evaluation of these datasets using MaXlab we identified 34 genes common to both microarray investigations with highly similar gene expression patterns (Tab. 2a, Fig. 2; see also Supplemental Table). Our results display a remarkable similarity in the gene expression profile generated from the data of both research labs strongly supporting the significance of the IFN pathway and the control of the IFN-α gene in the regulation of numerous genes involved in SLE. Amongst these, 14 genes have previously been reported to be differentially expressed in SLE and agree with our findings, one of which is the interferon-induced protein (IFIT1) possessing translation regulatory activity, which was one of the first genes to be associated with SLE [Ye et al., 2003]. Importantly, amongst these are the known IFN-α regulated genes: OAS1, MXA, MXB, STAT1 and ISGF3 [Aebi et al., 1989].


Table 2a: Summary of the results generated using MaXlab
Gene Set similarity (%)* | 5162 (31.1%) | 5285 (24.5%) | 6239 (14.5%)
No. of interesting genes common to both gene sets from each study** | Baechler et al., 2003: 198; Der et al., 1998: 113 | Greenberg et al., 2005: 11; Tezak et al., 2002: 106 | Anderson et al., 2006: 1486; Safford et al., 2005: 60
No. of significant genes commonly expressed across studies | 34 | 5 | 17
* 5162 represents the number of genes that are common to both arrays. The value 31.1% represents the number of common genes shown as a percentage of the total number of genes present on both arrays (5162/(Array 1 + Array 2)) x 100
** The interesting expression dataset consists of 198 and 113 significantly expressed genes as obtained from the literature from which 34 are common to both arrays (5162).
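
The similarity percentage defined in the first footnote of Table 2a can be reproduced from the array sizes listed in Table 1. The short Python sketch below only restates that formula; the function name `gene_set_similarity` is chosen for illustration.

```python
def gene_set_similarity(common, size_1, size_2):
    """Common genes as a percentage of the total number of genes on both arrays."""
    return 100.0 * common / (size_1 + size_2)


# Array sizes from Table 1, common gene counts from Table 2a.
sle = gene_set_similarity(5162, 10260, 6325)        # Baechler vs Der
dm_jdm = gene_set_similarity(5285, 16000, 5600)     # Greenberg vs Tezak
tolerance = gene_set_similarity(6239, 9216, 33773)  # Anderson vs Safford

print(round(sle, 1), round(dm_jdm, 1), round(tolerance, 1))  # 31.1 24.5 14.5
```

Because the denominator is the sum of both array sizes rather than the size of their union, a pair of identical arrays would score 50% under this definition, not 100%.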

Table 2b: Comparing interesting gene lists generated from the microarray gene expression sets based on user-defined thresholds.
Reference | Genes in Gene Set | Threshold | Common genes
Anderson et al., 2006 | 9216 | User defined >1.0 | 240
Safford et al., 2005 | 12422 | User defined >1.0 | 240
Troester et al., 2004 | 20163 | Basal cell line, User defined >1.0 at 36 hours | 175
Troester et al., 2004 | 20163 | Luminal cell line, User defined >1.0 at 36 hours | 175



Figure 2: Gene expression profiles of 34 genes significantly expressed in response to interferon-α in association with systemic lupus erythematosus (SLE). Genes regulated by interferon-alpha, which are involved within the interferon pathway, play an important role within the pathogenesis of SLE. The expression profiles of 34 co-expressed genes from the genes considered meaningful to both studies were highly similar in response to interferon-α stimulation including the genes MXA, MXB and STAT1.

Furthermore the common genes identified through our comparative analysis were also among those reported in the Bennett and colleagues study in 2003 (IFI44, MX1, MX2, PLSCR1 and TAP1) and by Crow et al., 2003 (IFI44, MX1, G1P3, PLSCR1 and G1P2) thus confirming the importance of these genes within the SLE signature and strongly suggesting that IFN-α is crucial in disease progression. Intriguingly, our results identify the genes NMI (N-MYC and STAT interactor) and SP110 whose roles have not previously been clarified in SLE to be commonly over expressed in response to IFN-α in both datasets. This in turn strongly suggests potential genes for further investigation, especially since they interact with STAT1 and IL6, respectively (Fig. 3). Similarly, MaXlab identified several other common genes between the studies including IRF2, PML, PMAIP1 and FAS that have not previously been associated with SLE and thus their roles within SLE have not been elucidated. However, the common over expression of the genes IRF2 and PML involved in the negative regulation of transcription and cell proliferation and PMAIP1 and FAS playing a functional role in the induction of apoptosis (http://www.geneontology.org), suggest potential target genes for further examination to clarify their roles in the pathogenesis of SLE (Supplemental Table).



Figure 3: Significantly expressed common gene interactions after interferon-α stimulation in association with systemic lupus erythematosus (SLE). From the genes highly expressed and common to both studies investigating SLE the over expressed known interferon-α regulated STAT1 gene appears to play a key role in the regulation of several other genes. Furthermore IRF2 and NMI have not been reported to have association with SLE in previous expression profiling studies and thus can be regarded as potential genes for further investigation to discover their potential role within the pathogenesis of SLE.

Other disorders that have been modelled as autoimmune diseases, and whose pathophysiology is not fully understood, are dermatomyositis (DM) and juvenile dermatomyositis (JDM). To examine the correlation between significantly expressed genes in both DM and JDM, we carried out a comparative analysis of two related expression studies [Tezak et al., 2002; Greenberg et al., 2005] to gain further insight into the mechanisms involved in the pathogenesis of DM and JDM. We identified a set of genes including the interferon-α (type 1) inducible genes MXA, MXB, IFI27 and IFI44 and the interferon regulatory factor gene IRF7, thus confirming their importance within the biological pathways in both DM and JDM [Der et al., 1998; Sato et al., 1998; Greenberg et al., 2005] (Fig. 4; see also Supplemental Table, part B). What is striking about the genes identified through our software is that they are analogous to those commonly identified from comparing the SLE studies (Supplemental Table). We therefore compared the SLE, DM and JDM studies [Der et al., 1998; Baechler et al., 2003; Greenberg et al., 2005], which interestingly revealed the common over-expression of G1P2, G1P3, PLSCR1 and OAS1. The common expression of these significant genes across the autoimmune diseases DM, JDM and SLE, identified using MaXlab, combined with their known involvement within the IFN pathway [Bennett et al., 2003; Crow et al., 2003; Ishii et al., 2005], suggests that these diseases share a common pathophysiology.



Figure 4: Biologically meaningful genes with related gene expression patterns expressed in the autoimmune disease dermatomyositis. The comparison of two autoimmune diseases, dermatomyositis and juvenile dermatomyositis, revealed the common expression profile of 5 significantly expressed genes. The genes identified are involved in the interferon pathway within SLE and are furthermore up-regulated in response to interferon-α stimulation.


Comparing immuno-tolerance microarray studies

Numerous studies have been conducted in which anergic cell states have been induced to identify the mechanisms that lead to the dysregulation of tolerance [Ibrahim et al., 2001; Lock et al., 2002; Matejuk et al., 2003; Zhang et al., 2003]. To identify the genes and pathways that promote the induction of T-cell anergy, Safford et al., 2005, carried out a microarray analysis on T-cells activated in conditions that either promote or inhibit anergy induction. In addition, a similar study by Anderson et al., 2006, exploited cDNA microarray technology to demonstrate a balanced transcription program regulated by different transcription factors for T-cell activation and/or tolerance during antigen induced T-cell responses. When assessing the commonality our software was able to reveal the role for 17 genes common to both tolerant conditions, including that of zinc finger transcription factor early growth response gene 2 (Egr2), a gene required for the full induction of T-cell anergy [Harris et al., 2004], alongside the genes of transcription factors Irf4, Jarid2 and Nfatc1 as well as the chemokines or cytokines Tnfsf11, Tnfsf9 and Ccl1 (Tab. 2a, Fig. 5; see also Supplemental Table, part C). The identification of the principal Egr2 gene and its role as a negative regulator of T-cell function and thus anergy induction has been further supported by several studies using high-dimension genomic analysis to examine the genes upregulated during both T and B-cell anergy [Glynne et al., 2000; Macián et al., 2000; Lechner et al., 2001].



Figure 5: The gene expression pattern of 17 statistically valuable genes with potential functionalities in the molecular pathways underlying T cell anergy. In response to anergy induction, the two studies compared shared 17 commonly expressed genes that were considered biologically meaningful to the underlying experimental question. These genes included the zinc finger transcription factor early growth response gene 2 (Egr2), which acts as a negative regulator of T cell function and is required for the full induction of T cell anergy, alongside the chemokine or cytokine genes Tnfsf11, Tnfsf9 and Ccl1.


To demonstrate the comparability function of MaXlab for whole microarrays based on user-defined thresholds we exploited the entire microarray gene sets used by Safford et al., 2005, and Anderson et al., 2006, and a cut-off expression threshold of 1.0 fold. As a result, MaXlab revealed a total of 240 genes common to both studies (Tab. 2b; see also Supplemental Table, part D and Supplemental Figure 1). Several genes were found with a potential for further investigation, including Irf8 (a negative regulator of cell proliferation), Tgfb1 and JunB (possessing transcriptional regulatory activity), Prkca (a negative regulator of protein kinase activity) and Ptprv (an inducer of apoptosis) (http://www.geneontology.org). More importantly, Jak2 involved in the Jak-Stat cascade known as a negative regulator of cell proliferation, Casp3 that is more specifically a negative regulator of activated T cell proliferation, Cdkn1a (cyclin dependent protein kinase inhibitor activity) and Cdkn2b (regulator of transcription) have also been commonly identified. These genes have not been shown to have an involvement in tolerance. However their common expression in both studies and investigation into their functional activities suggest them to be prospective targets for further investigation to elucidate their potential involvement in initiating or maintaining T cell tolerance.


Comparing time series based microarray data for breast cancer

Microarray experiments are often based on time series. Here we have used MaXlab to further explore a gene expression microarray experiment carried out by Troester et al., 2004, investigating the response of basal and luminal breast tumours to the drugs doxorubicin and 5-fluorouracil. One aspect of potential importance is identifying the genes that are commonly and significantly expressed in cells from various cancers in response to a particular drug. Alternatively, it may be equally valuable to know the common genes that are expressed in one particular type of cancer that has been treated with several drugs. To demonstrate the microarray time series functionality of MaXlab, we chose as an example to identify genes commonly expressed in both basal and luminal cell lines following treatment with doxorubicin above a gene expression threshold of 1.0 at 36 hours. This revealed 175 genes that are expressed in both cell lines derived from basal and luminal epithelium in response to doxorubicin, thus revealing potentially common targets for this drug (Tab. 2b; see also Supplemental Table, part E and Supplemental Figure 2). Amongst these were genes discussed by Troester et al., 2004, including the p53 regulated gene TP53I3 (tumour protein p53 inducible protein 3) involved in the induction of apoptosis, Cdkn1a involved in the negative regulation of cell proliferation and induction of apoptosis, FDXR (ferredoxin reductase) and also glutathione-S-transferase π (GST-π), which were induced in both cell lines, although less dramatically in the luminal cell line. Other commonly expressed genes of potential interest included CTSO (cathepsin) involved in proteolysis, S100A9 involved in leukocyte chemotaxis, BBC3 involved in caspase activation and positive regulation of apoptosis (although much higher in the luminal cell line) and LRDD involved in death receptor binding (http://www.geneontology.org).

Through the cross comparison of multiple studies, MaXlab can provide researchers, in an automated fashion and via several flexible functionalities, with an insight into the genes playing a potential common role in related diseases, and allows them to view potentially important disease signatures.




Conclusion

In conclusion, we believe that the MaXlab software is an attractive and powerful application for the scientific community involved in microarray research, allowing researchers to gain knowledge from existing datasets, the majority of which sit stagnant and disjointed following publication. Following the systematic collection of public microarray data, we have demonstrated the explorative functionality of MaXlab for the comparative meta-profiling of biologically relevant datasets generated by independent research labs. By integrating related gene expression matrices, we identified several sets of common genes from related studies that are significantly expressed and, more importantly, possess similar expression profiles. More interestingly, our software has also been able to determine several commonly expressed genes of high significance, based on expression or gene function across related biological conditions, that have not been associated with the disease before. The common expression and characterisation of these genes suggest that they may play a common role in the mechanism of disease; they are hence worthy of further investigation and could serve as potential therapeutic targets.



Availability and requirements

Project name: MaXlab
Project Home Page: Databases including the software executable can be accessed from http://www.immuno-software.org.
Operating system: Tested on Windows 2000 Workstation (SP4) and Windows XP (SP24)
Programming language: Microsoft Visual Basic.Net and MySQL and ActivePerl
Other requirements: Microsoft .NET Framework version 2.0 Software Development Kit (SDK) min, MySQL database server no later than 4.1, MySQL Connector/ODBC 3.51 and Microsoft Office 2000



Acknowledgements

This study was partly supported by grants from the UK Medical Research Council (MRC) (Grant number: G0300520) and the Brunel University Studentship. We thank Ingenuity Systems for allowing us to use their Ingenuity pathways knowledge base.




References