Genome-wide prediction and analysis of function-specific transcription factor binding sites
Fan Long1*, Hong Liu1, Chang Hahn1, Pavel Sumazin2, Michael Q. Zhang3 and Asher Zilberstein1
1 Aventis Pharmaceuticals Inc., Route 202/206, Bridgewater, NJ 08807 USA
DNA-binding transcription factors play a central role in transcription regulation, and annotating transcription-factor binding sites in the upstream regions of human genes is essential for building a genome-wide regulatory network. We describe a methodology for accurately predicting transcription-factor binding sites in the proximal-promoter regions of function-specific genes. To increase the accuracy of binding-site prediction, we draw on recent genome sequence data, known transcription-factor binding-site matrices, and a gene classification based on Gene Ontology biological function. Using TRANSFAC position-frequency matrices, we detected individual and cooperating transcription-factor binding sites in the proximal promoters of ENSEMBL-annotated human genes. To control specificity, we used the over-representation of detected binding sites in proximal promoters relative to second exons. We confirmed the majority of transcription-factor binding sites predicted in the proximal promoters of immune-response genes with evidence from the existing literature. We validated the predicted cooperation between the transcription factors NF-κB and IRF in regulating gene expression using microarray transcript-profiling data and a literature-derived protein-protein interaction network. We also identified over-represented individual transcription-factor binding sites, and over-represented pairs of binding sites, in the proximal promoters of each Gene Ontology biological-process gene group. Our tools and analysis provide a new resource for deciphering transcription regulation in different biological paradigms.
Key words: transcription regulation, transcription factor binding site, Gene Ontology
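The core steps summarized in the abstract can be sketched in code. These steps are scanning sequences with a position-frequency matrix and then asking whether sites are over-represented in proximal promoters relative to second exons. The abstract does not state the scoring function, threshold, or statistical test. This sketch therefore assumes a log2-odds weight matrix with a pseudocount and a uniform background, a relative score cutoff of (score - min)/(max - min), and a one-sided Fisher's exact (hypergeometric) test on counts of sequences with and without a site. All function names are hypothetical. The matrix built from the consensus GGGACTTTCC is a toy example, not an actual TRANSFAC matrix.

```python
import math

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def pfm_to_pwm(pfm, pseudocount=0.5, background=None):
    """Turn a position-frequency matrix (one {base: count} dict per position)
    into a log2-odds position weight matrix against a background model."""
    background = background or {b: 0.25 for b in BASES}
    pwm = []
    for column in pfm:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background[b])
                    for b in BASES})
    return pwm


def reverse_complement(seq):
    # Non-ACGT characters (e.g. N) become N so the scanner can skip them.
    return "".join(COMPLEMENT.get(b, "N") for b in reversed(seq.upper()))


def has_site(seq, pwm, rel_threshold=0.85):
    """True if any window on either strand reaches the (assumed) relative
    score (score - min) / (max - min) >= rel_threshold."""
    width = len(pwm)
    lo = sum(min(col.values()) for col in pwm)
    hi = sum(max(col.values()) for col in pwm)
    span = (hi - lo) or 1.0  # guard against a completely flat matrix
    for strand in (seq.upper(), reverse_complement(seq)):
        for start in range(len(strand) - width + 1):
            window = strand[start:start + width]
            if any(b not in COMPLEMENT for b in window):
                continue  # skip windows containing ambiguous bases
            score = sum(pwm[i][b] for i, b in enumerate(window))
            if (score - lo) / span >= rel_threshold:
                return True
    return False


def over_representation_p(prom_hit, prom_miss, exon_hit, exon_miss):
    """One-sided Fisher's exact test (hypergeometric upper tail): probability of
    seeing at least prom_hit promoters with a site if sites were distributed
    independently of region type (promoter vs. second exon)."""
    n_total = prom_hit + prom_miss + exon_hit + exon_miss
    n_with_site = prom_hit + exon_hit
    n_promoters = prom_hit + prom_miss
    upper = min(n_with_site, n_promoters)
    # Sum exact integer counts first, then divide once.
    tail = sum(math.comb(n_with_site, k)
               * math.comb(n_total - n_with_site, n_promoters - k)
               for k in range(prom_hit, upper + 1))
    return tail / math.comb(n_total, n_promoters)


def consensus_pfm(consensus, count=10):
    """Hypothetical helper: a toy PFM with all counts on the consensus base."""
    return [{b: (count if b == c else 0) for b in BASES} for c in consensus]


if __name__ == "__main__":
    pwm = pfm_to_pwm(consensus_pfm("GGGACTTTCC"))  # toy NF-kB-like motif
    print(has_site("TTGGAAAGTCCCTT", pwm))  # True: site lies on the reverse strand
    # e.g. 8 of 10 promoters vs. 1 of 10 second exons carry a site
    print(over_representation_p(8, 2, 1, 9))
```

Counting sequences that carry at least one site, rather than total hits, keeps each promoter or exon as one observation in the 2×2 table. Longer promoters would otherwise dominate the comparison. A real analysis would also need to account for the length difference between promoter and second-exon regions and correct for testing many matrices.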