In Silico Biology 9, 0006 (2009); ©2009, Bioinformation Systems e.V.  


Gene set enrichment analyses revealed differences in gene expression patterns between males and females


Wei Zhang1, R. Stephanie Huang1, Shiwei Duan1 and M. Eileen Dolan1,2,3*




1 Department of Medicine, The University of Chicago, Chicago, IL 60637, USA
2 Committee on Clinical Pharmacology and Pharmacogenomics, The University of Chicago, Chicago, IL 60637, USA
3 Cancer Research Center, The University of Chicago, Chicago, IL 60637, USA



* Corresponding author

   Section of Hematology/Oncology, Department of Medicine, 5841 S Maryland Ave. MC 2115, The University of Chicago, Chicago, IL 60637, USA
   Email: edolan@medicine.bsd.uchicago.edu
   Phone: +1-773-702 4441;
   Fax: +1-773-702 0963





Edited by E. Wingender; received July 17, 2008; revised November 06, 2008; accepted December 30, 2008; published January 03, 2009



Abstract

Men and women differ not only in their physical attributes and reproductive functions but also in many other characteristics, including the risks for some diseases as well as response to certain therapeutic treatments. Though genetically-identical for autosomal chromosomes, males and females could have gender-specific transcriptional or translational regulation, leading to differential mRNAs or protein products for some genes. To illustrate the gender-specific differences in mRNA-level expression, we compared gene expression patterns between males and females using a whole-genome microarray dataset on the unrelated HapMap lymphoblastoid cell lines derived from individuals of European (58 individuals) and African (59 individuals) ancestry. We applied the Gene Set Enrichment Analysis to identify any overrepresented predefined gene sets in either men or women. Distinct patterns of upregulation and downregulation of certain chromosomal regions and other gene sets such as targets for certain microRNAs and transcription factors were identified in males or females, suggesting their potential roles in defining the gender-specific phenotypes. Gender-specific patterns of gene expression also appeared to be different between these two populations.

Keywords: gene expression, gender difference, HapMap, lymphoblastoid cell lines, gene set enrichment



Introduction

Men and women are different physically and physiologically in their reproductive systems, hormones, muscle mass and bone structure [1]. When it comes to health issues, gender has also been shown to play an important role. For example, besides gender-specific diseases (e.g. prostate cancer or ovarian cancer), males and females have different susceptibility to many common diseases such as diabetes, heart attack and certain cancers [2 - 4]. Males and females may also respond differently to the same therapeutic treatment [5].

In the past few years, gene expression as a quantitative phenotype has been shown to be varied within the same population [6, 7] and among geographically-separated populations [8]. For males and females, although the genetic information carried in autosomes is identical, there could be gender-specific transcription or translation of genes through mechanisms not based on DNA sequence variation. Differences in gene expression, in turn could contribute to the health disparities between males and females. For example, women have more CYP3A [9], a key phase I metabolism enzyme, which modulates the effectiveness of drugs in women. Using a public microarray expression dataset (~4,000 expressed genes), we previously identified a handful of genes (<0.5%) with gender-specific expression in a panel of lymphoblastoid cell lines (LCLs) derived from individuals of European ancestry from Utah, USA, collected by Centre d' Etude du Polymorphisme Humain (CEPH) [10], suggesting that gender differences in gene expression may be modest for the majority of genes at the mRNA level. Therefore, in addition to using a conventional statistical approach (e.g. t-test) to identify individual gender-specific genes, in the present work, we applied the Gene Set Enrichment Analysis (GSEA) [11, 12], which is a knowledge-based approach for interpreting genome-wide expression profiles [13]. GSEA allowed us to identify gender-specific gene sets using the Affymetrix Human Exon 1.0ST array (exon array) dataset that we previously generated on the HapMap (http://www.hapmap.org/) [14] CEU (CEPH people from Utah, USA) and YRI (Yoruba people from Ibadan, Nigeria) samples [15]. We also compared the gender-specific gene sets between the two populations.



Methods


Gene expression data

We previously generated a dataset of gene expression using the Affymetrix GeneChip Human Exon 1.0 ST array on 87 CEU (CEPH people from Utah, USA) and 89 YRI (Yoruba people from Ibadan, Nigeria) samples (Gene Expression Omnibus, GEO accession: GSE7851, http://www.ncbi.nlm.nih.gov/projects/geo/). The details for cell line preparation, RNA isolation, chip hybridization and signal normalization/summarization were described in Zhang et al. [15]. The same set of 9,156 transcript clusters deemed to be reliably expressed [15] was used in this study. To avoid ambiguity and chip cross-hybridization, our analysis set was further limited to the 8,443 transcript clusters with unique gene identity annotations retrieved from the Affymetrix NetAffx website (http://www.affymetrix.com/analysis/index.affx) (NCBI build 36). The unrelated HapMap samples (58 CEU parents: 29 males and 29 females; 59 YRI parents: 29 males and 30 females) were used in further analyses.


Identifying gender-specific genes individually

Two-tail t-test assuming unequal variance was used to compare log2-transformed expression levels of each gene between males and females in each population separately. The nominal P-values were adjusted by the Benjamini-Hochberg (B-H) correction for false discovery rate (FDR) [16] (Padjusted < 0.05).


Gene set enrichment analyses

The GSEA (GSEA Molecular Signatures Database, released December 14, 2007; http://www.broad.mit.edu/gsea/) [11, 12] was used to evaluate the differences in expression patterns between males and females in the unrelated CEU and YRI samples, separately. The GSEA method focuses on gene sets, which are defined as groups of genes that share common biological function, chromosomal location or regulation. The goal of GSEA is to determine whether the members of a gene set S are randomly distributed throughout the entire reference gene list L or primarily found at the top or bottom [11]. We expect that gender-specific gene sets will tend to show a non-random distribution. The GSEA software features more than 3,300 predefined genes sets under four categories: positional gene sets (C1), motif gene sets (C2), curated gene sets (C3) and computational gene sets (C4) [11]. In this study, we tested three categories: C1, C2 and C3. Particularly, only sets of human genes (min. size = 10, max. size = 200) were included in our analyses (permutation number = 1,000). The QVALUE-estimated [17] FDR of 25% was then used to select overrepresented gene sets.



Results

Identifying individual gender-specific genes

At Padjusted < 0.05, eight chromosome X-linked genes were differentially expressed between males and females in the CEU samples. In contrast, 12 gender-specific genes (11 chromosome X-linked and 1 on chromosome 16) were identified in the YRI samples (Fig. 1). Using a looser cutoff (Padjusted < 0.50), three more chromosome X-linked genes (EIF2S3, UBE1, RPS4X) were identified in the CEU samples, while four more chromosome X-linked (PCTK1, ARD1A, FUNDC1, STS) and two more autosomal genes (NUCB2, LOC401127) could be identified in the YRI samples (Supplemental Table 1).



Click on the thumbnail to enlarge the picture
Figure 1: Differentially expressed genes between males and females (Padjusted < 0.05). Eight chromosome X-linked genes were differentially expressed in both populations. Four more genes (3 chromosome X-linked and 1 on chromosome 16) were also differentially expressed in the YRI samples. CEU: CEPH people form Utah, USA; YRI: Yoruba people form Ibadan, Nigeria.


Gene set enrichment analyses

Tab. 1 lists the summary of enriched gene sets. Among the 206 positional gene sets (C1), CHRXP11 and CHRXP22 of chromosome X were enriched with higher expression in both the CEU and YRI females (FDR < 25%). In addition, the CEU females had three more enriched cytobands on autosomes. No enriched cytobands were found in the YRI males, while 16 cytobands were enriched in the CEU males with higher expression. Supplemental Table 2 lists the enriched gender-specific cytobands. Among the 754 motif gene sets (C2), no enriched motifs were found in the CEU males or females. In contrast, two motifs including one for microRNA mir-524 and mir-525 targets were enriched in the YRI males; 15 motifs including transcription factor binding sites and two microRNA targets were enriched in the YRI females. Supplemental Table 3 lists the enriched gender-specific motifs in the YRI samples. Among the 731 curated gene sets (C3), no enriched curated gene sets were found in the male samples. One gene set, "DISTECHE_XINACTIVATED_GENES", was commonly enriched in the females of both populations. In addition, 16 more curated gene sets were enriched in the YRI females. Supplemental Table 4 lists the enriched curated gene sets in females.


Table 1: Summary of enriched gender-specific gene sets (FDR<25%).
Males Females
Population C1 C2 C3 C1 C2 C3
CEU 16 0 0 5 0 1
YRI 0 2 0 2 15 17
C1: Positional gene sets
C2: Motif gene sets
C3: Curated gene sets



Discussion

Gene-expression variation may play a significant role in gender-specific health disparities, probably through upregulating or downregulating genes within physiological pathways. Using a conventional approach (e.g. t-test), the proportion of gender-specific genes appears to be small (<0.5%) in some recent whole-genome studies on LCLs [10] and peripheral leukocytes [18], suggesting that the transcriptional differences could be modest for the majority of genes between genders. Our analysis using the exon array data [15] also supported this observation. Of the >8,400 genes, eight and twelve genes were gender-specific in the CEU and YRI samples, separately (Padjusted < 0.05) (Fig. 1). Among them, only MT1E (metallothionein 1E) in the YRI samples is on an autosome, while the remaining genes are all chromosome X-linked, probably due to the escape of X-inactivation [19]. The overexpression of MT1E in the YRI females (P = 3.81 × 10-5, Padjusted = 0.029) could be an interesting finding, because it has been shown that metallothionein (MT) can be down-regulated by estrogen [20]. Furthermore, MT protein overexpression is frequently encountered in a number of human primary tumors including renal cell carcinoma (RCC) [21] and has been shown to be correlated with disease progression. Therefore, the better survival rate found in female RCC patients could have a molecular basis due to the differences in gene expression.

Still, one drawback associated with individual-gene analysis is that the stringent statistical cutoff for adjusting thousands of multiple tests may result in missing biologically meaningful genes. Since a biological process often affects sets of genes simultaneously, the contribution of an individual gene can be modest, but the combined effect may be synergistic or additive. Therefore, we applied the GSEA method [11] which focuses on the predefined gene sets to evaluate the gender-specific expression patterns.

The positional category C1 is comprised of gene sets corresponding to each cytoband that has at least one gene. These gene sets are helpful in identifying effects related to chromosomal deletions or amplifications, dosage compensation, epigenetic silencing and other regional effects. At a more stringent significance level (Padjusted < 0.10), in both the CEU and YRI samples, two X-linked cytobands: CHRXP11 and CHRXP22 were found overexpressed in females (Supplemental Table 2). Though the enrichment of chromosome X-linked genes suggests the validity of the approach and dataset, it could be due to the escape of X-inactivation, which has been shown to be extensive (~35% X-linked genes) in humans [19]. In addition, three autosomal cytobands: CHR1P36, CHR2Q37 and CHR7P13 were found downregulated in the CEU females. If clustered by gene families, the common gene family among the three cytobands was transcription factors (21 genes on CHR1P36, 4 genes on CHR2Q37 and 1 gene on CHR7P13), suggesting their roles in regulating gender-specific gene expression.

The motif category C2 is comprised of gene sets that represent known or likely regulatory elements in promoters and 3'-UTRs (untranslated regions) [22]. These gene sets make it possible to link changes in a microarray experiment to a conserved, putative cis-regulatory element [11]. For the CEU samples, there were no enriched C2 gene sets in either males or females (Supplemental Table 3). For the YRI males, the genes of the gene set "CDPCR3", defined by genes with promoter regions containing the motif that matches the annotation for CUTL1 (cut-like homeobox 1), a transcription factor, were overexpressed in males. Another overexpressed gene set in the YRI males is comprised of genes targeted by mir-524 and mir-525. The majority of the genes in these two gene sets have transcription factor activities (e.g. 21 of 57 genes in the "CDPCR3" set), indicating the regulatory roles of these particular transcription factors in the overall gender-specific phenotypic differences. In contrast, some gene sets targeted by different transcription factors and microRNAs were also overexpressed in the YRI females, though further functional validation of these genes are necessary to clarify their biological functions.

The GSEA curated gene sets were collected from various sources such as Gene Ontology [23], and knowledge of domain experts [11]. For both populations, no gene sets were overrepresented in males (Supplemental Table 4). Not surprisingly, the set containing genes that escape X-inactivation "DISTECHE_XINACTIVATED_GENES" was overrepresented in the female samples. In addition, many cancer or disease-related gene sets were enriched in the YRI females. The most significant overrepresented gene set "CANCER_NEOPLASTIC_META_UP" (Padjusted = 0.03) is comprised of 67 genes commonly upregulated in cancer [24]. Our observation that gender-specific expression patterns might also differ between the two populations could provide new insights into the relationship between gene expression and ancestry-related health disparities [25]. Though our findings suggest the possible mechanisms responsible for phenotypic gender differences (e.g. through certain transcription factors and microRNAs) and provide a list of overrepresented or underrepresented gene sets, a systems biology approach will be necessary to illustrate the complete picture of the interplay of these gene sets or pathways in the future.

One limitation of this study is that the LCLs represent only one tissue type, while many pharmacology- or health-related genes have distinct tissue-specific expression patterns (e.g. UGT1A isoforms [26]). Therefore, comparing expression patterns in other tissues could provide a more comprehensive view of gender differences and could be more relevant to specific diseases. Notably, because the CEU samples were collected and transformed decades earlier than the YRI samples, the bias due to cell line collection time could be a potential confounding factor when interpreting the results of population differences. In addition, there are some limitations of the GSEA algorithm such as those proposed in Dinu et al. [27]. We expect that with the advances of the GSEA tools and our knowledge of gene function, novel enriched gene sets between genders may be discovered in the near future.



Acknowledgements

The authors wish to thank Dr. Nancy Cox for helpful discussion. This Pharmacogenetics of Anticancer Agents Research Group study was supported by NIH/NIGMS grant GM61393 and NIH/NCI Breast SPORE P50 CA125183. Data deposits associated with this research are supported by GM61374. The authors are grateful to T. A. Clark, T. X. Chen, A. C. Schweitzer and J. E. Blume (Expression Research, Affymetrix Laboratory, Affymetrix Inc., Santa Clara, CA 94706) for their contribution in generating the Exon Array data.



References


  1. Federman, D. D. (2006). The biology of human sex differences. N. Engl. J. Med. 354, 1507-1514.

  2. Kaminsky, Z., Wang, S. C. and Petronis, A. (2006). Complex disease, gender and epigenetics. Ann. Med. 38, 530-544.

  3. Rivera, M. P. and Stover, D. E. (2004). Gender and lung cancer. Clin. Chest Med. 25, 391-400.

  4. Hensler, P. (2006). Viewpoint: Gender differences in heart disease. Circulation 114, f170-f171.

  5. Huang, R. S., Kistner, E. O., Bleibel, W. K., Shukla, S. J. and Dolan, M. E. (2007). Effect of population and gender on chemotherapeutic agent-induced cytotoxicity. Mol. Cancer Ther. 6, 31-36.

  6. Morley, M., Molony, C. M., Weber, T. M., Devlin, J. L., Ewens, K. G., Spielman, R. S. and Cheung, V. G. (2004). Genetic analysis of genome-wide variation in human gene expression. Nature 430, 743-747.

  7. Duan, S., Huang, R. S., Zhang, W., Bleibel, W. K., Roe, C. A., Clark, T. A., Chen, T. X., Schweitzer, A. C., Blume, J. E., Cox, N. J. and Dolan, M. E. (2008). Genetic architecture of transcript-level variation in humans. Am. J. Hum. Genet. 82, 1101-1113.

  8. Zhang, W., Ratain, M. J. and Dolan, M. E. (2008). The HapMap resource is providing new insights into ourselves and its application to pharmacogenomics. Bioinform. Biol. Insights 2, 15-23.

  9. Paine, M. F., Ludington, S. S., Chen, M.-L., Stewart, P. W., Huang, S.-M. and Watkins, P. B. (2005). Do men and women differ in proximal small intestinal CYP3A or P-glycoprotein expression? Drug Metab. Dispos. 33, 426-433.

  10. Zhang, W., Bleibel, W. K., Roe, C. A., Cox, N. J. and Dolan, M. E. (2007). Gender-specific differences in expression in human lymphoblastoid cell lines. Pharmacogenet. Genomics 17, 447-450.

  11. Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich, A., Pomeroy, S. L., Golub, T. R., Lander, E. S. and Mesirov, J. P. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545-15550.

  12. Mootha, V. K., Lindgren, C. M., Eriksson, K. F., Subramanian, A., Sihag, S., Lehar, J., Puigserver, P., Carlsson, E., Ridderstråle, M., Laurila, E., Houstis, N., Daly, M. J., Patterson, N., Mesirov, J. P., Golub, T. R., Tamayo, P., Spiegelman, B., Lander, E. S., Hirschhorn, J. N., Altshuler, D. and Groop, L. C. (2003). PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet. 34, 267-273.

  13. Nam, D. and Kim, S.-Y. (2008). Gene-set approach for expression pattern analysis. Brief. Bioinform. 9, 189-197.

  14. Hui, J., Hung, L.-H., Heiner, M., Schreiner, S., Neumüller, N., Reither, G., Haas, S. A. and Bindereif, A. (2005). Intronic CA-repeat and CA-rich elements: a new class of regulators of mammalian alternative splicing. EMBO J. 24, 1988-1998.

  15. Zhang, W., Duan, S., Kistner, E. O., Bleibel, W. K., Huang, R. S., Clark, T. A., Chen, T. X., Schweitzer, A. C., Blume, J. E., Cox, N. J. and Dolan, M. E. (2008). Evaluation of genetic variation contributing to differences in gene expression between populations. Am. J. Hum. Genet. 82, 631-640.

  16. Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B 57, 289-300.

  17. Storey, J. D. and Tibshirani, R. (2003). Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. USA 100, 9440-9445.

  18. Idaghdour, Y., Storey, J. D., Jadallah, S. J. and Gibson, G. (2008). A genome-wide gene expression signature of environmental geography in leukocytes of Moroccan Amazighs. PLoS Genet. 4, e1000052.

  19. Carrel, L. and Willard, H. F. (2005). X-inactivation profile reveals extensive variability in X-linked gene expression in females. Nature 434, 400-404.

  20. Calaf, G. M. and Roy, D. (2007). Human drug metabolism genes in parathion-and estrogen-treated breast cells. Int. J. Mol. Med. 20, 875-881.

  21. Nguyen, A., Jing, Z., Mahoney, P. S., Davis, R., Sikka, S. C., Agrawal, K. C. and Abdel-Mageed, A. B. (2000). In vivo gene expression profile analysis of metallothionein in renal cell carcinoma. Cancer Lett. 160, 133-140.

  22. Xie, X., Lu, J., Kulbokas, E. J., Golub, T. R., Mootha, V., Lindblad-Toh, K., Lander, E. S. and Kellis, M. (2005). Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature 434, 338-345.

  23. Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., Dolinski, K., Dwight, S. S., Eppig, J. T., Harris, M. A., Hill, D. P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J. C., Richardson, J. E., Ringwald, M., Rubin, G. M. and Sherlock, G. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25-29.

  24. Rhodes, D. R., Yu, J., Shanker, K., Deshpande, N., Varambally, R., Ghosh, D., Barrette, T., Pandey, A. and Chinnaiyan, A. M. (2004). Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc. Natl. Acad. Sci. USA 101, 9309-9314.

  25. Zhang, W. and Dolan, M. E. (2008). Ancestry-related differences in gene expression: findings may enhance understanding of health disparities between populations. Pharmacogenomics 9, 489-492.

  26. Zhang, W., Liu, W., Innocenti, F. and Ratain, M. J. (2007). Searching for tissue-specific expression pattern-linked nucleotides of UGT1A isoforms. PLoS ONE 2, e396.

  27. Dinu, I., Potter, J. D., Mueller, T., Liu, Q., Adewale, A. J., Jhangri, G. S., Einecke, G., Famulski, K. S., Halloran, P. and Yasui, Y. (2007). Improving gene set analysis of microarray data by SAM-GS. BMC Bioinformatics 8, 242.